Journal articles on the topic "Document Intelligence"

To see the other types of publications on this topic, follow the link: Document Intelligence.

Format your source in APA, MLA, Chicago, Harvard, and other citation styles.

Consult the top 50 journal articles for your research on the topic "Document Intelligence".

Next to every work in the list of references there is an "Add to bibliography" button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the scholarly publication as a .pdf file and read its abstract online, whenever these are available in the metadata.

Browse journal articles from many disciplines and compile your bibliography correctly.

1

Askarifard, Hadis. "Types of classifier in artificial intelligence." INTERNATIONAL JOURNAL OF COMPUTERS & TECHNOLOGY 15, no. 1 (October 23, 2015): 6436–43. http://dx.doi.org/10.24297/ijct.v15i1.1716.

Abstract:
Artificial intelligence, or machine intelligence, can be viewed as a vast domain at the junction of many fields of knowledge, sciences, and techniques old and new. Today, document classification is widely adopted in information retrieval for organizing documents. In supervised document classification, correct information about previously classified documents is available to us, and we classify new documents on the basis of that information. We therefore examine methods such as expert systems, artificial neural networks, genetic algorithms, fuzzy logic, and so on. In this project we examine documents thematically and then, using existing algorithms, predict a theme for a new document.
2

Bhatt, Ajay. "Document Automation Using Artificial Intelligence." International Journal for Research in Applied Science and Engineering Technology 10, no. 9 (September 30, 2022): 1365–69. http://dx.doi.org/10.22214/ijraset.2022.46839.

Abstract:
Documents such as invoices, receipts, bills, ID cards, and passports are basic and very common documents around the world, used for accounting, tax records, payments, financial history, data analytics, digitizing a company's records, and so on. All these documents come in different formats: images, PDFs, and hard copies (paper). To retrieve the data from such documents, a person must manually write or type it into a table, which is time-consuming and tedious. We provide a simple, unique, and easily implementable end-to-end approach that uses AI and deep learning models to automate these tasks: the user uploads a document in any format (image, PDF, or Docs), and the software extracts the data and saves it in the required structured format. Our approach eliminates the need to enter data manually into an Excel sheet or database record and imposes no limit on the amount of work; while companies struggle with a limited workforce and limited working hours for manual data entry, our software can run 24x7.
3

Shi, Zhongzhi, Qing He, Ziyan Jia, and Jiayou Li. "Intelligence Chinese Document Semantic Indexing System." International Journal of Information Technology & Decision Making 02, no. 03 (September 2003): 407–24. http://dx.doi.org/10.1142/s0219622003000732.

Abstract:
With the rapid growth of the Internet, retrieving information from this huge information space has become an ever more important problem. In this paper, an Intelligence Chinese Document Semantic Indexing System (ICDSIS) is proposed. Several new technologies are integrated in ICDSIS to obtain good performance. ICDSIS is composed of four key procedures: a parallel, distributed, and configurable spider is used for information gathering; a multi-hierarchy document classification approach incorporating information gain initially processes the gathered web documents; a swarm-intelligence-based document clustering method is used for information organization; and a concept-based retrieval interface is applied for interactive user retrieval. ICDSIS is an all-round solution for information retrieval on the Internet.
4

Stoyanova, Miglena. "Document Process Automation with Artificial Intelligence for Logistics Sector." Izvestia Journal of the Union of Scientists - Varna. Economic Sciences Series 12, no. 1 (October 1, 2023): 190–97. http://dx.doi.org/10.56065/ijusv-ess/2023.12.1.190.

Abstract:
The logistics sector serves as the backbone of global commerce, facilitating the movement of goods across vast networks. The efficient management of documents is key to operational success in this industry. Document process automation powered by artificial intelligence offers a transformative solution to the challenges inherent in document-intensive workflows. The current study clarifies the essential role of AI-driven document process automation in optimizing document-related processes in the logistics domain. Through a systematic analysis, it highlights the imperative need for document process automation integration, its operational benefits, and the underlying considerations for successful implementation.
5

CROSSNO, PATRICIA J., ANDREW T. WILSON, TIMOTHY M. SHEAD, WARREN L. DAVIS, and DANIEL M. DUNLAVY. "TOPICVIEW: VISUAL ANALYSIS OF TOPIC MODELS AND THEIR IMPACT ON DOCUMENT CLUSTERING." International Journal on Artificial Intelligence Tools 22, no. 05 (October 2013): 1360008. http://dx.doi.org/10.1142/s0218213013600087.

Abstract:
We present a new approach for analyzing topic models using visual analytics. We have developed TopicView, an application for visually comparing and exploring multiple models of text corpora, as a prototype for this type of analysis tool. TopicView uses multiple linked views to visually analyze conceptual and topical content, document relationships identified by models, and the impact of models on the results of document clustering. As case studies, we examine models created using two standard approaches: Latent Semantic Analysis (LSA) and Latent Dirichlet Allocation (LDA). Conceptual content is compared through the combination of (i) a bipartite graph matching LSA concepts with LDA topics based on the cosine similarities of model factors and (ii) a table containing the terms for each LSA concept and LDA topic listed in decreasing order of importance. Document relationships are examined through the combination of (i) side-by-side document similarity graphs, (ii) a table listing the weights for each document's contribution to each concept/topic, and (iii) a full text reader for documents selected in either of the graphs or the table. The impact of LSA and LDA models on document clustering applications is explored through similar means, using proximities between documents and cluster exemplars for graph layout edge weighting and table entries. We demonstrate the utility of TopicView's visual approach to model assessment by comparing LSA and LDA models of several example corpora.
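The concept-to-topic matching view described in this abstract boils down to comparing the term vectors of two models by cosine similarity. A minimal sketch of that step, assuming scikit-learn and a toy four-document corpus (this is not the authors' TopicView code, and the corpus and parameters are illustrative):

```python
# Sketch: match LSA concepts to LDA topics by cosine similarity of their
# term vectors (illustrative; not the TopicView implementation).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import TruncatedSVD, LatentDirichletAllocation
from sklearn.metrics.pairwise import cosine_similarity

corpus = [
    "stocks fell as markets reacted to interest rates",
    "the central bank raised interest rates again",
    "the team won the championship after a late goal",
    "players and fans celebrated the championship win",
]

counts = CountVectorizer(stop_words="english").fit_transform(corpus)

n_topics = 2
lsa = TruncatedSVD(n_components=n_topics, random_state=0).fit(counts)
lda = LatentDirichletAllocation(n_components=n_topics, random_state=0).fit(counts)

# Rows are concept/topic term vectors; similarity[i, j] compares LSA concept i
# with LDA topic j, as in a bipartite concept-topic matching view.
similarity = cosine_similarity(lsa.components_, lda.components_)
print(similarity.round(3))
```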
6

HAO, XIAOLONG, JASON T. L. WANG, MICHAEL P. BIEBER, and PETER A. NG. "HEURISTIC CLASSIFICATION OF OFFICE DOCUMENTS." International Journal on Artificial Intelligence Tools 03, no. 02 (June 1994): 233–65. http://dx.doi.org/10.1142/s0218213094000121.

Abstract:
Document Processing Systems (DPSs) support office workers in managing information. Document classification is a major function of DPSs. By analyzing a document’s layout and conceptual structures, we present in this paper a sample-based approach to document classification. We represent a document’s layout structure by an ordered labeled tree through a procedure known as nested segmentation, and we represent the document’s conceptual structure by a set of attribute-type pairs. The layout similarities between the document to be classified and sample documents are determined by a previously developed approximate tree matching toolkit. The conceptual similarities between the documents are determined by analyzing their contents and by calculating the degree of conceptual closeness. The document type is identified by computing both the layout and conceptual similarities between the document to be classified and the samples in the document sample base. Some experimental results are presented, which demonstrate the effectiveness of the proposed techniques.
7

Belov, Ilya I. "Automation of Electronic Document Management Systems Functions by Means of Artificial Intelligence Technologies." Herald of an archivist, no. 3 (2022): 772–83. http://dx.doi.org/10.28995/2073-0101-2022-3-772-783.

Abstract:
The article discusses the use of artificial intelligence technologies in performance of documentation processes in electronic document management systems in order to improve existing practices and to develop methods of using such technologies and implementing some functions automatically without or with minimal human intervention. The scientific novelty of the research is due to an attempt to generalize practical experience of using artificial intelligence technologies to automate electronic document management systems functions by direct development and modernization of such systems and also by introduction of outsourced software for simplification of some tasks for specialists in the field of document management. The study is to analyze the functionality of electronic document management systems implemented with use of intelligent solutions. It draws on works of national and foreign experts in the field of document management, archival science, and information technology. The article reviews electronic document management systems based on artificial intelligence technologies, as well as software solutions integrated into information systems for intellectualization of various processes pertaining to working with documents. It analyzes specific software products of Russian developers involved in various areas of the economy and in direct creation of software in the field of electronic document management, as well as foreign experience in development and use of software for solving practical problems in the field of document management. The article describes the main functions for registering, indexing, routing, and searching for documents, as well as technical support for users that are already implemented within the framework of electronic document management systems using artificial intelligence technologies, which may indicate sufficient practical benefits of using intelligent solutions in document management. The author focuses not only on obvious benefits of using artificial intelligence technologies to improve efficiency and to expand functionality of electronic document management systems, but also on possible risks pertaining to ensuring information security while working with such systems, which indicates a need for further study of this area by specialists in records management and archival science.
8

A., Lukman, Emmanuel R., and Amos David. "Integrating Document Usage with Document Index in Competitive Intelligence Process." International Journal of Computer Applications 132, no. 13 (December 17, 2015): 37–43. http://dx.doi.org/10.5120/ijca2015907630.

9

Khudyak Kozorovitsky, A., and O. Kurland. "From 'Identical' to 'Similar': Fusing Retrieved Lists Based on Inter-Document Similarities." Journal of Artificial Intelligence Research 41 (June 21, 2011): 267–96. http://dx.doi.org/10.1613/jair.3214.

Abstract:
Methods for fusing document lists that were retrieved in response to a query often utilize the retrieval scores and/or ranks of documents in the lists. We present a novel fusion approach that is based on using, in addition, information induced from inter-document similarities. Specifically, our methods let similar documents from different lists provide relevance-status support to each other. We use a graph-based method to model relevance-status propagation between documents. The propagation is governed by inter-document-similarities and by retrieval scores of documents in the lists. Empirical evaluation demonstrates the effectiveness of our methods in fusing TREC runs. The performance of our most effective methods transcends that of effective fusion methods that utilize only retrieval scores or ranks.
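The core idea here is that a document's fused score gets extra support from similar documents in the other lists. A toy sketch of that intuition, assuming made-up document IDs, scores, and similarities (the paper itself uses a graph-based relevance-propagation model, which is not reproduced here):

```python
# Toy sketch of similarity-aware fusion of two retrieved lists: a document's
# fused score is its CombSUM score plus support from similar documents.
list_a = {"d1": 0.9, "d2": 0.7, "d3": 0.4}   # doc id -> retrieval score
list_b = {"d2": 0.8, "d4": 0.6, "d5": 0.3}

# Hypothetical inter-document similarities (symmetric, in [0, 1]).
sim = {("d1", "d4"): 0.7, ("d3", "d5"): 0.5}
def similarity(x, y):
    return sim.get((x, y), sim.get((y, x), 0.0))

docs = set(list_a) | set(list_b)
fused = {}
for d in docs:
    base = list_a.get(d, 0.0) + list_b.get(d, 0.0)            # CombSUM
    support = sum(similarity(d, o) * (list_a.get(o, 0.0) + list_b.get(o, 0.0))
                  for o in docs if o != d)                      # relevance-status support
    fused[d] = base + 0.3 * support                             # 0.3: arbitrary weight

for d, s in sorted(fused.items(), key=lambda kv: -kv[1]):
    print(d, round(s, 3))
```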
10

Hayama, Tessai, Takashi Kanai, and Susumu Kunifuji. "Document Skimming Support Environment for Surveying Documents in Creative Activities." Transactions of the Japanese Society for Artificial Intelligence 19 (2004): 113–25. http://dx.doi.org/10.1527/tjsai.19.113.

11

Zhutyaeva, S. A., and T. A. Lysova. "Trends in the Development of Electronic Document Management: Prospects, Problems, Opportunities." Economics and Management 27, no. 12 (December 26, 2021): 963–70. http://dx.doi.org/10.35854/1998-1627-2021-12-963-970.

Abstract:
Aim. The presented study aims to determine the role and place of electronic document management in the corporate system of Russian enterprises, outlining the prospects for its development. Tasks. The authors examine the legislative acts of the Russian Federation on the prospects for the implementation of electronic document management; assess the impact of the pandemic on the digitalization of document management; analyze the business costs of paper document management; identify the advantages of using electronic document management and promising technologies in document processing. Methods. This study uses theoretical and empirical research methods. The dialectic method is used to determine the role, significance, and legal status of electronic document management. Through a logical approach, the essence of such concepts as 'electronic document' and 'electronic document management' is identified. Results. The study presents directions for the development of electronic document management using blockchain technology, which will improve workflows by processing, sorting, and exchanging data and documents protected from unauthorized access, and artificial intelligence, which can help organizations process documents faster by simplifying operational procedures. Obstacles that prevent companies from actively using electronic document management are identified. These include additional investment, time costs, and reorganization of management. The volume of innovative services is analyzed by type of economic activity, and the costs of creating, storing, and processing paper documents are considered. Conclusions. Recent trends in legislation indicate the government's firm commitment to the speedy introduction of electronic document management in Russia. Its use frees up a lot of resources, including time, labor, and finances. The 2020 pandemic has emphasized the importance of digitalizing business processes to ensure their continuity in unforeseen situations. Integrated into the automation of work processes, blockchain technology will ensure the protection of information from unauthorized tampering. Artificial intelligence will open up new opportunities for processing electronic documents.
12

Chen, H., S. R. K. Branavan, R. Barzilay, and D. R. Karger. "Content Modeling Using Latent Permutations." Journal of Artificial Intelligence Research 36 (October 28, 2009): 129–63. http://dx.doi.org/10.1613/jair.2830.

Abstract:
We present a novel Bayesian topic model for learning discourse-level document structure. Our model leverages insights from discourse theory to constrain latent topic assignments in a way that reflects the underlying organization of document topics. We propose a global model in which both topic selection and ordering are biased to be similar across a collection of related documents. We show that this space of orderings can be effectively represented using a distribution over permutations called the Generalized Mallows Model. We apply our method to three complementary discourse-level tasks: cross-document alignment, document segmentation, and information ordering. Our experiments show that incorporating our permutation-based model in these applications yields substantial improvements in performance over previously proposed methods.
13

Kolandaisamy, Raenu, Heshalini Rajagopal, Indraah K., and Glaret Shirley Sinnappan. "The Smart Document Processing with Artificial Intelligence." Proceedings of International Conference on Artificial Life and Robotics 29 (February 22, 2024): 534–40. http://dx.doi.org/10.5954/icarob.2024.os18-8.

14

PERA, MARIA SOLEDAD, and YIU-KAI NG. "A NAÏVE BAYES CLASSIFIER FOR WEB DOCUMENT SUMMARIES CREATED BY USING WORD SIMILARITY AND SIGNIFICANT FACTORS." International Journal on Artificial Intelligence Tools 19, no. 04 (August 2010): 465–86. http://dx.doi.org/10.1142/s0218213010000285.

Abstract:
Text classification categorizes web documents in large collections into predefined classes based on their contents. Unfortunately, the classification process can be time-consuming, and users are still required to spend a considerable amount of time scanning through the classified web documents to identify the ones whose contents satisfy their information needs. In solving this problem, we first introduce CorSum, an extractive single-document summarization approach, which is simple and effective in performing the summarization task, since it relies only on word similarity to generate high-quality summaries. We further enhance CorSum by considering the significance factor of sentences in documents, in addition to word-correlation factors, for document summarization. We denote the enhanced approach CorSum-SF and use the summaries generated by CorSum-SF to train a Multinomial Naïve Bayes classifier for categorizing web document summaries into predefined classes. Experimental results on the DUC-2002 and 20 Newsgroups datasets show that CorSum-SF outperforms other extractive summarization methods, and that classification time is significantly reduced (while accuracy remains comparable) using CorSum-SF-generated summaries compared with using the entire documents. More importantly, browsing summaries, instead of entire documents, assigned to predefined categories facilitates the information search process on the Web.
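The classification half of this pipeline, training a Multinomial Naive Bayes classifier on short summaries rather than full documents, is straightforward to sketch with scikit-learn. The summaries and labels below are illustrative stand-ins; CorSum-SF itself is not reproduced:

```python
# Sketch: classify documents from short summaries with Multinomial Naive Bayes.
# The summaries here stand in for CorSum-SF output; in the paper they come from
# word-correlation and sentence-significance factors.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

summaries = [
    "central bank raises interest rates to curb inflation",
    "stock markets fall after earnings warnings",
    "team wins championship in overtime thriller",
    "star striker signs record transfer deal",
]
labels = ["finance", "finance", "sports", "sports"]

clf = make_pipeline(TfidfVectorizer(), MultinomialNB())
clf.fit(summaries, labels)

print(clf.predict(["bank cuts rates amid market turmoil"]))  # -> ['finance']
```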
15

Guo, Shun, and Nianmin Yao. "Generating word and document matrix representations for document classification." Neural Computing and Applications 32, no. 14 (October 28, 2019): 10087–108. http://dx.doi.org/10.1007/s00521-019-04541-x.

16

Alothman, Abdulaziz Fahad, and Abdul Rahaman Wahab Sait. "Managing and Retrieving Bilingual Documents Using Artificial Intelligence-Based Ontological Framework." Computational Intelligence and Neuroscience 2022 (August 25, 2022): 1–15. http://dx.doi.org/10.1155/2022/4636931.

Abstract:
In recent times, artificial intelligence (AI) methods have been applied in document and content management to make decisions and improve the organization’s functionalities. However, the lack of semantics and restricted metadata hinders the current document management technique from achieving a better outcome. E-Government activities demand a sophisticated approach to handle a large corpus of data and produce valuable insights. There is a lack of methods to manage and retrieve bilingual (Arabic and English) documents. Therefore, the study aims to develop an ontology-based AI framework for managing documents. A testbed is employed to simulate the existing and proposed framework for the performance evaluation. Initially, a data extraction methodology is utilized to extract Arabic and English content from 77 documents. Researchers developed a bilingual dictionary to teach the proposed information retrieval technique. A classifier based on the Naïve Bayes approach is designed to identify the documents’ relations. Finally, a ranking approach based on link analysis is used for ranking the documents according to the users’ queries. The benchmark evaluation metrics are applied to measure the performance of the proposed ontological framework. The findings suggest that the proposed framework offers supreme results and outperforms the existing framework.
17

KHOCHA, Nadiia, and Nataliia MOSKAL. "Transition to electronic document workflow: perspectives and challenges." Economics. Finances. Law 10, no. - (October 31, 2023): 17–20. http://dx.doi.org/10.37634/efp.2023.10.3.

Abstract:
Introduction. The importance of this research lies in the fact that electronic document management can enhance accounting, management and assist businesses in making informed decisions. The purpose of the paper is to investigate the advantages and challenges that arise when transitioning to electronic document management in a company's accounting, as well as to identify possible ways to optimize this process. Results. The transition to electronic document management in accounting encompasses numerous advantages that can significantly improve a company's functioning and enhance its competitiveness. These advantages include reducing bureaucracy and paper costs, increasing productivity and document processing speed, elevating information security and data protection, and supporting sustainable development and environmental conservation. The shift to electronic document management in accounting is accompanied by several significant challenges, including technical issues and infrastructure, legal and regulatory considerations, issues with corporate culture and management transformation, and associated risks and cybersecurity challenges. Electronic document management in accounting holds significant potential. Artificial intelligence can detect and correct errors in documents, automatically distribute information, and predict payments and transactions. Blockchain technology allows for creating an immutable registry and a trust chain for each document, preventing data alteration or document forgery. Cloud solutions and mobile access enable users to access documents from any device and location, expediting work processes and enhancing efficiency. Conclusion. Transitioning to electronic document management allows for reducing bureaucracy and paper costs, increasing productivity and document processing speed, enhancing information security and data protection, as well as supporting sustainable development and environmental conservation. By leveraging artificial intelligence, blockchain technologies, cloud solutions, and integration with other systems, businesses gain the capability to enhance their operational efficiency and security.
18

Akbar, Rasona Sunara, Vita Nurul Fathia, Muhammad Arif Hamdi, Seno Setya Pujang, Mila Rosmaya, and Masdar Bakhtiar. "Bibliometric Analysis: Utilization of Artificial Intelligence to Support Office Performance." Devotion : Journal of Research and Community Service 5, no. 6 (June 27, 2024): 712–23. http://dx.doi.org/10.59188/devotion.v5i6.745.

Abstract:
Artificial Intelligence (AI) has experienced rapid development in recent years. In the current era of globalization, research on the use of AI to support the performance of tasks deserves attention in the scientific realm. This article aims to explore studies on the utilization of AI in the field of organizational performance. The study uses bibliometric analysis to review all articles related to the theme in the leading Scopus database. The search was carried out on Wednesday, March 13, 2024, at 16.50 WIB, and returned 13 documents related to "artificial intelligence" and "organizational performance" over 5 years (2019-2024). The results show that the largest share of document types is articles, with 10 documents (76.9%), followed by conference papers with 2 documents (15.4%) and books with 1 document (7.7%). The languages most frequently used in artificial intelligence and performance publications are English, Spanish, and French. The research is based on the study by T. Panichayakorn and K. Jermsittiparsert entitled "Mobilizing organizational performance through robotic and artificial intelligence awareness in mediating role of supply chain agility", 2019, Volume 8, Issue 5, Pages 757-768.
19

Mannar Mannan, J., K. Sindhanai Selvan, and R. Mohemmed Yousuf. "Independent document ranking for E-learning using semantic-based document term classification." Journal of Intelligent & Fuzzy Systems 40, no. 1 (January 4, 2021): 893–905. http://dx.doi.org/10.3233/jifs-201070.

Abstract:
The massive number of digital documents on the Internet has led to the use of e-learning, which has become an emerging field of research due to the rapid growth of Internet users. E-learning requires a suitable document ranking method to avoid frequently navigating to the next Search Engine Result Page (SERP). Existing document ranking methods are unable to rank documents independently based on their conceptual contents. This paper proposes a novel method for ranking documents independently based on the different categories of terms they contain. In this approach, terms are classified into five categories: (1) direct query terms, (2) expanded terms, (3) semantically related terms, (4) supporting terms, and (5) stop words. The query is expanded using domain ontology to acquire more semantic terms for a better understanding of the user query. A semantic weight is applied independently to each category of terms in a document for ranking. The document with the highest augmented value in each category of terms is ranked first; the remaining documents are ranked in the same way and arranged in descending order. The WordNet tool is utilized as a knowledge base, and the Wu and Palmer semantic distance method is applied to measure the semantic distance between query and document terms for ranking. The experiments show that the proposed document ranking method for e-learning retrieves better documents than existing document ranking methods.
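The Wu and Palmer semantic distance mentioned in this abstract is available directly through NLTK's WordNet interface. A minimal sketch, assuming NLTK is installed and the WordNet corpus has been downloaded; the term pair and the noun-only restriction are illustrative choices:

```python
# Sketch: Wu-Palmer semantic similarity between two terms via WordNet (NLTK).
# Requires: pip install nltk; then nltk.download('wordnet') once.
from nltk.corpus import wordnet as wn

def wup(term_a, term_b):
    """Return the max Wu-Palmer similarity over noun senses of two terms."""
    scores = [a.wup_similarity(b)
              for a in wn.synsets(term_a, pos=wn.NOUN)
              for b in wn.synsets(term_b, pos=wn.NOUN)]
    scores = [s for s in scores if s is not None]
    return max(scores, default=0.0)

print(wup("course", "lecture"))   # higher = semantically closer
print(wup("course", "banana"))
```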
20

Bourbakis, Nikolaos, and Sukarno Mertoguno. "A Holistic Approach for Automatic Deep Understanding and Protection of Technical Documents." International Journal on Artificial Intelligence Tools 29, no. 06 (September 2020): 2050007. http://dx.doi.org/10.1142/s0218213020500074.

Abstract:
A Technical Document (TD) is mainly composed by a set of modalities appropriately structured and associated. These modalities could be NL-text, block diagrams, formulas, tables, graphics, pictures etc. A deep understanding of a TD will be based on the synergistic understanding and associations of these modalities. This paper offers a novel methodology for the implementation of a holistic approach for deep understanding of technical documents by understanding and associating these modalities. This approach is based on the homogeneous expression (mapping) of the technical document modalities into the same medium, which in this case is the Stochastic Petri-nets (SPN). Then, these modalities are associated to each other generating new knowledge about the technical document topic and a SPN simulator is created to offer additional information about the functional behavior of the system described in the document. Some results from our studies are provided to prove the overall concept.
21

Selvaraj, Suganya, and Eunmi Choi. "Swarm Intelligence Algorithms in Text Document Clustering with Various Benchmarks." Sensors 21, no. 9 (May 4, 2021): 3196. http://dx.doi.org/10.3390/s21093196.

Abstract:
Text document clustering refers to the unsupervised classification of textual documents into clusters based on content similarity and can be applied in applications such as search optimization and extracting hidden information from data generated by IoT sensors. Swarm intelligence (SI) algorithms use stochastic and heuristic principles that include simple and unintelligent individuals that follow some simple rules to accomplish very complex tasks. By mapping features of problems to parameters of SI algorithms, SI algorithms can achieve solutions in a flexible, robust, decentralized, and self-organized manner. Compared to traditional clustering algorithms, these solving mechanisms make swarm algorithms suitable for resolving complex document clustering problems. However, each SI algorithm shows a different performance based on its own strengths and weaknesses. In this paper, to find the best performing SI algorithm in text document clustering, we performed a comparative study for the PSO, bat, grey wolf optimization (GWO), and K-means algorithms using six data sets of various sizes, which were created from BBC Sport news and 20 newsgroups. Based on our experimental results, we discuss the features of a document clustering problem with the nature of SI algorithms and conclude that the PSO and GWO SI algorithms are better than K-means, and among those algorithms, the PSO performs best in terms of finding the optimal solution.
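The K-means baseline in this comparison is easy to reproduce on TF-IDF vectors with scikit-learn; the PSO and GWO variants optimize essentially the same clustering objective with swarm search instead of Lloyd's iterations. A minimal sketch, with an illustrative toy corpus and k:

```python
# Sketch: K-means baseline for text document clustering on TF-IDF vectors.
# PSO/GWO variants search for centroids metaheuristically instead of using
# Lloyd's algorithm; the document representation is the same.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

docs = [
    "the striker scored twice in the final",
    "fans celebrated the cup victory downtown",
    "parliament passed the new budget bill",
    "the minister defended the tax reform in a debate",
]

X = TfidfVectorizer(stop_words="english").fit_transform(docs)
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)   # cluster id per document
```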
22

Tiwari, Manisha, Padmanabhan Aital, and Padmaja Joshi. "A comprehensive survey and review of machine learning techniques in document processing: Industry applications and future directions." Journal of Information and Optimization Sciences 45, no. 4 (2024): 1177–88. http://dx.doi.org/10.47974/jios-1701.

Abstract:
The use of deep learning, machine learning, and natural-language-based approaches in the artificial intelligence (AI) field has rapidly advanced document processing efficiency across a wide range of business areas. This research provides a comprehensive survey of the approaches, models, and datasets used for document processing, especially for project proposal documents. The objective of this study is to review the literature on AI-based methods for the aforementioned application, which involves the automated processing of project proposal documents with deep learning, machine learning, and natural language processing. It also investigates a case study that discusses the application of these methodologies in an e-governance document-processing task.
23

SHIOYA, ISAMU, HIROHITO OH'UCHI, and TAKAO MIURA. "DOCUMENT RETRIEVAL BY PROJECTION BASED FREQUENCY DISTRIBUTION." International Journal on Artificial Intelligence Tools 16, no. 04 (August 2007): 647–59. http://dx.doi.org/10.1142/s0218213007003485.

Abstract:
In document retrieval tasks, random projection (RP) is a useful technique for dimension reduction. It can be computed very quickly, and no recalculation is necessary when the data change. However, in low dimensions, random projection is unstable because of the randomness inherent in it. In this investigation, we propose a new technique for dimension reduction, called skewed projection (SP), based on term frequency distributions. Our experiments show that we can take advantage of local independence and thus obtain efficient retrieval for documents that belong to a specific application area. We also examine the document size by which the term distribution can be determined.
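The RP baseline that this paper contrasts with its skewed projection is available off the shelf. A minimal sketch using scikit-learn's GaussianRandomProjection, with random data standing in for a TF-IDF matrix and illustrative dimensions (the paper's skewed projection, derived from term frequency distributions, is not shown):

```python
# Sketch: random projection (the RP baseline) for reducing document-term
# matrix dimensionality. Only the standard RP step is illustrated here.
import numpy as np
from sklearn.random_projection import GaussianRandomProjection

rng = np.random.default_rng(0)
X = rng.random((100, 5000))          # 100 documents x 5000 terms (stand-in for TF-IDF)

rp = GaussianRandomProjection(n_components=50, random_state=0)
X_low = rp.fit_transform(X)
print(X_low.shape)                   # (100, 50)
```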
24

Veisi, Hadi, and Hamed Fakour Shandi. "A Persian Medical Question Answering System." International Journal on Artificial Intelligence Tools 29, no. 06 (September 2020): 2050019. http://dx.doi.org/10.1142/s0218213020500190.

Abstract:
A question answering system is a type of information retrieval that takes a question from a user in natural language as the input and returns the best answer to it as the output. In this paper, a medical question answering system in the Persian language is designed and implemented. During this research, a dataset of diseases and drugs is collected and structured. The proposed system includes three main modules: question processing, document retrieval, and answer extraction. For the question processing module, a sequential architecture is designed which retrieves the main concept of a question by using different components. In these components, rule-based methods, natural language processing, and dictionary-based techniques are used. In the document retrieval module, the documents are indexed and searched using the Lucene library. The retrieved documents are ranked using similarity detection algorithms and the highest-ranked document is selected to be used by the answer extraction module. This module is responsible for extracting the most relevant section of the text in the retrieved document. During this research, different customized language processing tools such as part of speech tagger and lemmatizer are also developed for Persian. Evaluation results show that this system performs well for answering different questions about diseases and drugs. The accuracy of the system for 500 sample questions is 83.6%.
25

Ben Basat, Ran, Moshe Tennenholtz, and Oren Kurland. "A Game Theoretic Analysis of the Adversarial Retrieval Setting." Journal of Artificial Intelligence Research 60 (December 30, 2017): 1127–64. http://dx.doi.org/10.1613/jair.5547.

Abstract:
The main goal of search engines is ad hoc retrieval: ranking documents in a corpus by their relevance to the information need expressed by a query. The Probability Ranking Principle (PRP) --- ranking the documents by their relevance probabilities --- is the theoretical foundation of most existing ad hoc document retrieval methods. A key observation that motivates our work is that the PRP does not account for potential post-ranking effects; specifically, changes to documents that result from a given ranking. Yet, in adversarial retrieval settings such as the Web, authors may consistently try to promote their documents in rankings by changing them. We prove that, indeed, the PRP can be sub-optimal in adversarial retrieval settings. We do so by presenting a novel game theoretic analysis of the adversarial setting. The analysis is performed for different types of documents (single-topic and multi-topic) and is based on different assumptions about the writing qualities of documents' authors. We show that in some cases, introducing randomization into the document ranking function yields an overall user utility that transcends that of applying the PRP.
26

HU, TIANMING, CHEW LIM TAN, YONG TANG, SAM YUAN SUNG, HUI XIONG, and CHAO QU. "CO-CLUSTERING BIPARTITE WITH PATTERN PRESERVATION FOR TOPIC EXTRACTION." International Journal on Artificial Intelligence Tools 17, no. 01 (February 2008): 87–107. http://dx.doi.org/10.1142/s0218213008003790.

Abstract:
The duality between document and word clustering naturally leads to the consideration of storing the document dataset in a bipartite. With documents and words modeled as vertices on two sides respectively, partitioning such a graph yields a co-clustering of words and documents. The topic of each cluster can then be represented by the top words and documents that have highest within-cluster degrees. However, such claims may fail if top words and documents are selected simply because they are very general and frequent. In addition, for those words and documents across several topics, it may not be proper to assign them to a single cluster. In other words, to precisely capture the cluster topic, we need to identify those micro-sets of words/documents that are similar among themselves and as a whole, representative of their respective topics. Along this line, in this paper, we use hyperclique patterns, strongly affiliated words/documents, to define such micro-sets. We introduce a new bipartite formulation that incorporates both word hypercliques and document hypercliques as super vertices. By co-preserving hyperclique patterns during the clustering process, our experiments on real-world data sets show that better clustering results can be obtained in terms of various external clustering validation measures and the cluster topic can be more precisely identified. Also, the partitioned bipartite with co-preserved patterns naturally lends itself to different clustering-related functions in search engines. To that end, we illustrate such an application, returning clustered search results for keyword queries. We show that the topic of each cluster with respect to the current query can be identified more accurately with the words and documents from the patterns than with those top ones from the standard bipartite formulation.
27

BOURBAKIS, N., W. MENG, C. ZHANG, Z. WU, N. J. SALERNO, and S. BOREK. "RETRIEVAL OF MULTIMEDIA WEB DOCUMENTS AND REMOVAL OF REDUNDANT INFORMATION." International Journal on Artificial Intelligence Tools 08, no. 01 (March 1999): 19–42. http://dx.doi.org/10.1142/s0218213099000038.

Abstract:
This paper describes a search engine for multimedia web documents and a methodology for removing (partially or totally) redundant information from multiple documents in an effort to synthesize new documents. In this paper, a typical multimedia document contains free text and images and additionally has associating well-structured data. An SQL-like query language, WebSSQL, is proposed to retrieve this type of documents. The main differences between WebSSQL and other proposed SQL extensions for retrieving web documents are that WebSSQL is similarity-based and supports conditions on images. This paper also deals with the detection and removal of redundant information (text paragraphs and images) from multiple retrieved documents. Documents reporting the same or related events and stories may contain substantial redundant information. The removal of the redundant information and the synthesis of these documents into a single document can not only save a user's time to acquire the information but also storage space to archive the data. The methodology reported here consists of techniques for analyzing text paragraphs and images as well as a set of similarity criteria used to detect redundant paragraphs and images. Examples are provided to illustrate these techniques.
28

Lobachev, Sergey L., and Elena V. Karpycheva. "Artificial Intelligence in Archiving: Statutory Regulation and Personnel Formation." Herald of an archivist, no. 2 (2022): 623–39. http://dx.doi.org/10.28995/2073-0101-2022-2-623-639.

Abstract:
The article studies issues of digitalization of archiving and statutory regulation of procedures for using artificial intelligence technologies in this field of activity. The authors have attempted a systematic approach to the problem of using AI in the work of archives, taking into account the processes taking place within the framework of digitalization of the economy of the Russian Federation. The study is to analyze the experience of using artificial intelligence technologies in archiving and statutory and methodological regulation of this sphere. The authors identify trends in the use of artificial intelligence technologies, as reflected in the preparation of draft standards (Artificial Intelligence. Artificial intelligence technologies used in activities of federal executive authorities. Classification and general requirements). The article touches upon development of competence-based approach in training of personnel, including archivists, in line with current trends of digitalization of all documented spheres of state activity. The article assesses compliance of local regulations with requirements of current professional standards. It addresses organization of storage, acquisition, and accounting of use of archival documents in correlation with digitalization of documented spheres of organizations’ activities; organization of document storage in operational records management; formation of fonds for use of electronic copies of documents; formation of electronic archives of organizations; transfer of electronic archival documents to state permanent storage; organization of work with electronic documents in state archives; planning and construction of archival storage processes; accounting and use of documents of temporary and permanent storage; regulation and control of federal and regional state information systems, registers, and databases; creation of a unified system of archival storage of documents; introduction of an electronic archive system (acquisition, accounting, examination of value, use of archival documents). The authors have identified three tasks that need to be solved within the framework of creating archives of tomorrow, based on widespread use of AI and on determination of main standards to use in the work of archives, the solution of which requires training of personnel capable of mastering and using AI technologies in archivist’s practical work.
29

Farid, Imam, Nabilah Bulqois, Nur Alim Assidiq, Sisi Megahutami, and Mochamad Iskarim. "Fostering Multiple Intelligences in Elementary Schools: Harnessing Independent Curriculum-Based Learning in Elementary Schools." Tadibia Islamika 3, no. 2 (November 30, 2023): 62–73. http://dx.doi.org/10.28918/tadibia.v3i2.1177.

Abstract:
This study investigates multiple intelligence learning strategies within the context of independent curriculum-based learning (ICBL) in elementary schools. The case study qualitative research method was employed at SD Islam Nusantara, involving first-grade and fourth-grade teachers. Interviews and document analysis were utilized as data collection techniques. The gathered data was analyzed using Miles, Huberman, and Saldana's interactive model, encompassing data condensation, data display, and data drawing. The findings of the study reveal that ICBL has the potential to enhance students' multiple intelligences. The ICBL process fosters improvements in linguistic intelligence, mathematical-logical intelligence, and interpersonal intelligence. Additionally, the project aimed at strengthening the profile of Pancasila students (P5) demonstrates positive effects on existential intelligence, musical intelligence, interpersonal intelligence, naturalist intelligence, and visual-spatial intelligence.
30

Li, Yongfei, Yuanbo Guo, Chen Fang, Yongjin Hu, Yingze Liu, and Qingli Chen. "Feature-Enhanced Document-Level Relation Extraction in Threat Intelligence with Knowledge Distillation." Electronics 11, no. 22 (November 13, 2022): 3715. http://dx.doi.org/10.3390/electronics11223715.

Abstract:
Relation extraction in the threat intelligence domain plays an important role in mining the internal associations between crucial threat elements and constructing a knowledge graph (KG). This study designed a novel document-level relation extraction model, FEDRE-KD, integrating additional features to take full advantage of the information in documents. The study also introduced a teacher–student model, realizing knowledge distillation, to further improve performance. Additionally, a threat intelligence ontology was constructed to standardize the entities and their relationships. To address the lack of publicly available datasets for threat intelligence, manual annotation was carried out on documents collected from social blogs, vendor bulletins, and hacking forums. After training the model, we constructed a threat intelligence knowledge graph in Neo4j. Experimental results indicate the effectiveness of the additional features and knowledge distillation. Compared to the mainstream models SSAN, GAIN, and ATLOP, FEDRE-KD improved the F1 score by 22.07, 20.06, and 22.38, respectively.
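The teacher-student distillation idea mentioned here is commonly implemented as a temperature-scaled KL divergence between teacher and student logits combined with the usual supervised loss. A minimal PyTorch sketch; the tensor shapes, temperature, and loss weighting are illustrative assumptions, not the FEDRE-KD configuration:

```python
# Sketch: knowledge-distillation loss for a relation classifier (PyTorch).
# The student matches the teacher's softened relation distribution plus the
# gold labels; T and alpha are illustrative values.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

student = torch.randn(8, 12, requires_grad=True)   # 8 entity pairs, 12 relation types
teacher = torch.randn(8, 12)
labels = torch.randint(0, 12, (8,))
print(distillation_loss(student, teacher, labels).item())
```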
31

Rosady, Imron. "Humanist Learning With Multiple Intelligences Strategy On Religious Education." FALASIFA : Jurnal Studi Keislaman 13, no. 2 (October 31, 2022): 134–43. http://dx.doi.org/10.62097/falasifa.v13i2.1049.

Abstract:
The application of multiple intelligences in Islamic religious education learning is very important. The learning planning document is called a lesson plan; multiple intelligences research is carried out at the time of admission of new students to determine their intelligences, and students are grouped based on their respective intelligences. The implementation of multiple intelligences in Islamic religious education learning in schools has been carried out well. The lesson plan is prepared by the religious education teacher before the lesson is carried out. The main basis for teachers in making lesson plans for religious education learning is the linguistic intelligence of students, adjusting all planned learning activities to linguistic intelligence indicators. The implementation of the multiple intelligences linguistic strategy in religious education learning has been carried out with learning activities adapted to the linguistic intelligence of students: activities at the beginning of learning such as apperception, warmer, and scene setting; core activities presented with methods that follow linguistic intelligence; and activities at the end of the lesson.
32

Braasch, Jason L. G., Ivar Bråten, Helge I. Strømsø, and Øistein Anmarkrud. "Incremental theories of intelligence predict multiple document comprehension." Learning and Individual Differences 31 (April 2014): 11–20. http://dx.doi.org/10.1016/j.lindif.2013.12.012.

33

Alexiou, Michail S., Nikolaos Gkorgkolis, Sukarno Mertoguno, and Nikolaos G. Bourbakis. "Deep Understanding of Technical Documents: An Enhancement on Diagrams Understanding." International Journal on Artificial Intelligence Tools 30, no. 05 (August 2021): 2150027. http://dx.doi.org/10.1142/s0218213021500275.

Abstract:
Humans are capable of automatically understanding the knowledge included in technical documents by consciously combining the information presented in the document’s individual modalities. These modalities are mathematical formulas, charts, tables, diagram images, etc. In this paper, we significantly enhance a previously presented technical document understanding methodology [3] that emulates the way humans perceive information. More specifically, we make the original diagram understanding methodology adaptive to larger architectures with more complex structures and modules. The overall understanding methodology results in the generation of a Stochastic Petri-net (SPN) graph that describes the system’s functionality. Finally, we conclude with the introduction of the hierarchical association of different diagram images from the same technical document. This processing step aims to provide a holistic understanding of all illustrated diagram information.
34

IYER, SWAMI, and DAN A. SIMOVICI. "STRUCTURAL CLASSIFICATION OF XML DOCUMENTS USING MULTISETS." International Journal on Artificial Intelligence Tools 17, no. 05 (October 2008): 1003–22. http://dx.doi.org/10.1142/s0218213008004266.

Abstract:
In this paper, we investigate the problem of clustering XML documents based on their structure. We represent the paths in an XML document as a multiset and use the symmetric difference operation on multisets to define certain metrics. These metrics are then used to obtain a measure of similarity between any two documents in a collection. Our technique was successfully applied to real and synthesized XML documents yielding high-quality clusterings.
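The path-multiset idea in this abstract can be sketched with Python's Counter: represent each XML document by the multiset of its root-to-node paths and compare documents by the size of the multiset symmetric difference. The XML snippets and the normalization below are illustrative, not the exact metrics defined in the paper:

```python
# Sketch: XML structural distance from the multiset symmetric difference of
# root-to-node paths. Illustrative only.
from collections import Counter
import xml.etree.ElementTree as ET

def path_multiset(xml_text):
    root = ET.fromstring(xml_text)
    paths = Counter()
    def walk(node, prefix):
        path = prefix + "/" + node.tag
        paths[path] += 1
        for child in node:
            walk(child, path)
    walk(root, "")
    return paths

def distance(a, b):
    pa, pb = path_multiset(a), path_multiset(b)
    sym_diff = sum(((pa - pb) + (pb - pa)).values())   # multiset symmetric difference
    return sym_diff / max(1, sum(pa.values()) + sum(pb.values()))  # normalize to [0, 1]

doc1 = "<book><title/><author/><author/></book>"
doc2 = "<book><title/><author/><year/></book>"
print(distance(doc1, doc2))
```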
35

ESPOSITO, FLORIANA, DONATO MALERBA, and GIOVANNI SEMERARO. "MULTISTRATEGY LEARNING FOR DOCUMENT RECOGNITION." Applied Artificial Intelligence 8, no. 1 (January 1994): 33–84. http://dx.doi.org/10.1080/08839519408945432.

36

Le, Tuan M. V., and Hady W. Lauw. "Semantic Visualization with Neighborhood Graph Regularization." Journal of Artificial Intelligence Research 55 (April 28, 2016): 1091–133. http://dx.doi.org/10.1613/jair.4983.

Abstract:
Visualization of high-dimensional data, such as text documents, is useful to map out the similarities among various data points. In the high-dimensional space, documents are commonly represented as bags of words, with dimensionality equal to the vocabulary size. Classical approaches to document visualization directly reduce this into visualizable two or three dimensions. Recent approaches consider an intermediate representation in topic space, between word space and visualization space, which preserves the semantics by topic modeling. While aiming for a good fit between the model parameters and the observed data, previous approaches have not considered the local consistency among data instances. We consider the problem of semantic visualization by jointly modeling topics and visualization on the intrinsic document manifold, modeled using a neighborhood graph. Each document has both a topic distribution and visualization coordinate. Specifically, we propose an unsupervised probabilistic model, called SEMAFORE, which aims to preserve the manifold in the lower-dimensional spaces through a neighborhood regularization framework designed for the semantic visualization task. To validate the efficacy of SEMAFORE, our comprehensive experiments on a number of real-life text datasets of news articles and Web pages show that the proposed methods outperform the state-of-the-art baselines on objective evaluation metrics.
37

Pourvali, Mohsen, and Salvatore Orlando. "Enriching Documents by Linking Salient Entities and Lexical-Semantic Expansion." Journal of Intelligent Systems 29, no. 1 (December 4, 2018): 1109–21. http://dx.doi.org/10.1515/jisys-2018-0098.

Abstract:
This paper explores a multi-strategy technique that aims at enriching text documents for improving clustering quality. We use a combination of entity linking and document summarization in order to determine the identity of the most salient entities mentioned in texts. To effectively enrich documents without introducing noise, we limit ourselves to the text fragments mentioning the salient entities, in turn, belonging to a knowledge base like Wikipedia, while the actual enrichment of text fragments is carried out using WordNet. To feed clustering algorithms, we investigate different document representations obtained using several combinations of document enrichment and feature extraction. This allows us to exploit ensemble clustering, by combining multiple clustering results obtained using different document representations. Our experiments indicate that our novel enriching strategies, combined with ensemble clustering, can improve the quality of classical text clustering when applied to text corpora like The British Broadcasting Corporation (BBC) NEWS.
38

Lee, Sungjoo, Letizia Mortara, Clive Kerr, Robert Phaal, and David Probert. "Analysis of document-mining techniques and tools for technology intelligence: discovering knowledge from technical documents." International Journal of Technology Management 60, no. 1/2 (2012): 130. http://dx.doi.org/10.1504/ijtm.2012.049102.

39

Rashaideh, Hasan, Ahmad Sawaie, Mohammed Azmi Al-Betar, Laith Mohammad Abualigah, Mohammed M. Al-laham, Ra’ed M. Al-Khatib, and Malik Braik. "A Grey Wolf Optimizer for Text Document Clustering." Journal of Intelligent Systems 29, no. 1 (July 21, 2018): 814–30. http://dx.doi.org/10.1515/jisys-2018-0194.

Abstract:
Text clustering problem (TCP) is a leading process in many key areas such as information retrieval, text mining, and natural language processing. This presents the need for a potent document clustering algorithm that can be used effectively to navigate, summarize, and arrange information to congregate large data sets. This paper encompasses an adaptation of the grey wolf optimizer (GWO) for TCP, referred to as TCP-GWO. The TCP demands a degree of accuracy beyond that which is possible with metaheuristic swarm-based algorithms. The main issue to be addressed is how to split text documents, on the basis of GWO, into homogeneous clusters that are sufficiently precise and functional. Specifically, TCP-GWO, also referred to as the document clustering algorithm, uses the average distance of documents to the cluster centroid (ADDC) as an objective function to repeatedly optimize the distance between the clusters of the documents. The accuracy and efficiency of the proposed TCP-GWO were demonstrated on a sufficiently large number of documents of variable sizes, randomly selected from a set of six publicly available data sets. Documents of high complexity were also included in the evaluation process to assess the recall detection rate of the document clustering algorithm. The experimental results for a test set drawn from over 1300 documents showed that failure to correctly cluster a document occurred in less than 20% of cases, with a recall rate of more than 65% for a highly complex data set. The high F-measure rate and the ability to cluster documents effectively are important advances resulting from this research. The proposed TCP-GWO method was compared to other well-established text clustering methods using randomly selected data sets. Interestingly, TCP-GWO outperforms the comparative methods in terms of precision, recall, and F-measure rates. In a nutshell, the results illustrate that the proposed TCP-GWO excels compared to the other clustering methods in terms of the measurement criteria, with more than 55% of the documents correctly clustered with a high level of accuracy.
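The ADDC objective that TCP-GWO minimizes (average distance of documents to their nearest cluster centroid) is simple to write down. A small sketch of the objective only, with random data standing in for TF-IDF vectors; the wolf-update steps of GWO that would move candidate centroid sets are not shown:

```python
# Sketch: the ADDC objective — average distance of documents to the centroid
# of their assigned cluster. A GWO loop would move candidate centroid sets to
# minimize this value; only the objective is shown here.
import numpy as np

def addc(docs, centroids):
    """docs: (n_docs, n_terms); centroids: (k, n_terms). Lower is better."""
    dists = np.linalg.norm(docs[:, None, :] - centroids[None, :, :], axis=2)
    return dists.min(axis=1).mean()   # each doc contributes its nearest-centroid distance

rng = np.random.default_rng(0)
docs = rng.random((30, 100))           # 30 documents, 100-term TF-IDF stand-in
candidate = rng.random((3, 100))       # one candidate solution: 3 centroids
print(addc(docs, candidate))
```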
40

Jahanbakhsh, Farnaz, Elnaz Nouri, Robert Sim, Ryen W. White, and Adam Fourney. "Understanding Questions that Arise When Working with Business Documents." Proceedings of the ACM on Human-Computer Interaction 6, CSCW2 (November 7, 2022): 1–24. http://dx.doi.org/10.1145/3555761.

Abstract:
While digital assistants are increasingly used to help with various productivity tasks, less attention has been paid to employing them in the domain of business documents. To build an agent that can handle users' information needs in this domain, we must first understand the types of assistance that users desire when working on their documents. In this work, we present results from two user studies that characterize the information needs and queries of authors, reviewers, and readers of business documents. In the first study, we used experience sampling to collect users' questions in-situ as they were working with their documents, and in the second, we built a human-in-the-loop document Q&A system which rendered assistance with a variety of users' questions. Our results have implications for the design of document assistants that complement AI with human intelligence including whether particular skillsets or roles within the document are needed from human respondents, as well as the challenges around such systems.
41

Bhole, Pankaj Kailas, and A. J. Agrawal. "Extractive Based Single Document Text Summarization Using Clustering Approach." IAES International Journal of Artificial Intelligence (IJ-AI) 3, no. 2 (June 1, 2014): 73. http://dx.doi.org/10.11591/ijai.v3.i2.pp73-78.

Abstract:
Text summarization is an old challenge in text mining, but it is in dire need of researchers' attention in the areas of computational intelligence, machine learning, and natural language processing. We extract a set of features from each sentence that helps identify its importance in the document. Reading the full text every time is time consuming. A clustering approach is useful for deciding which type of data is present in a document. In this paper we introduce the concept of k-means clustering for natural language processing of text for word matching, and a data mining document clustering algorithm is adopted in order to extract meaningful information from a large set of offline documents.
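A minimal sketch of clustering-based extractive summarization in this spirit, assuming TF-IDF sentence vectors and scikit-learn's KMeans (an illustration of the general approach rather than the authors' exact pipeline):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

def summarize(sentences, num_clusters=3):
    """Pick one representative sentence per k-means cluster as the summary."""
    X = TfidfVectorizer(stop_words="english").fit_transform(sentences)
    km = KMeans(n_clusters=num_clusters, n_init=10, random_state=0).fit(X)
    chosen = []
    for c in range(num_clusters):
        idx = np.where(km.labels_ == c)[0]
        if idx.size == 0:
            continue
        # the member sentence nearest to the cluster centroid
        dists = np.linalg.norm(X[idx].toarray() - km.cluster_centers_[c], axis=1)
        chosen.append(idx[np.argmin(dists)])
    return [sentences[i] for i in sorted(chosen)]
```

Each cluster is taken to represent one sub-topic of the document, and the sentence closest to its centroid stands in for that sub-topic in the extractive summary.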
42

Sukatemin, Sukatemin, and Yuli Peristiowati. "Transformation Concept of Artificial Intelligence in the Early Identification of Tuberculosis Recovery Phase: A Systematic Literature Review." INDONESIAN JOURNAL OF HEALTH SCIENCES RESEARCH AND DEVELOPMENT (IJHSRD) 5, no. 1 (June 27, 2023): 30–41. http://dx.doi.org/10.36566/ijhsrd/vol5.iss1/144.

Abstract:
Background: A persistent problem in tuberculosis (TB) eradication programs, amid the widespread use of artificial intelligence (AI) in healthcare, is that the early identification of the healing phase has not yet been supported by similar technology. The purpose of this study is to determine whether the transformation concept of AI can help identify early symptoms of recovery from TB. Methods: This research used a document review with a descriptive design, and the data were processed with a PRISMA analysis. The keywords were Artificial Intelligence, convalescence phase, and tuberculosis. The data were filtered from Google searches of sources including Google Scholar, ResearchGate, PubMed, and Semantic Scholar, screened over the last 5 years (2018-2023), in English or Indonesian. The stages of document screening followed the PRISMA diagram, and the documents were analyzed descriptively. Results: The majority of AI studies discussed diagnosis (n=9, 69.2%), only 3 documents (23.1%) discussed TB treatment, and 1 document (7.7%) discussed monitoring. Early identification of the recovery phase of TB patients is supported by previous research and can be implemented in the form of an application. Conclusions: Artificial intelligence has value in TB eradication programs, especially when integrated with other health programs.
43

Eckroth, Joshua. "Evolution of a Robust AI System: A Case Study of AAAI’s AI-Alert." AI Magazine 41, no. 4 (December 28, 2020): 17–38. http://dx.doi.org/10.1609/aimag.v41i4.5309.

Abstract:
Since mid-2018, we have used a suite of artificial intelligence (AI) technologies to automatically generate the Association for the Advancement of Artificial Intelligence’s AI-Alert, a weekly email sent to all Association for the Advancement of Artificial Intelligence members and thousands of other subscribers. This alert contains ten news stories from around the web that focus on some aspect of AI, such as new AI inventions, AI’s use in various industries, and AI’s impacts on our daily lives. This alert was curated by hand for a decade before we developed AI technology for automation, which we call “NewsFinder.” Recently, we redesigned this automation and ran a six-month experiment on user engagement to ensure the new approach was successful. This article documents our design considerations and requirements, our implementation (which involves web crawling, document classification, and a genetic algorithm for story selection), and our reflections a year and a half after deploying this technology.
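The story-selection step can be pictured with a small genetic algorithm such as the sketch below; the fitness function, the diversity bonus, and all parameters are illustrative assumptions, not NewsFinder's actual implementation.

```python
import random

def fitness(selection, relevance, topics):
    # reward total relevance plus a small bonus for covering distinct topics
    diversity = len({topics[i] for i in selection})
    return sum(relevance[i] for i in selection) + 0.5 * diversity

def select_stories(relevance, topics, k=10, pop_size=50, generations=100):
    """Evolve a set of k story indices with high fitness."""
    n = len(relevance)
    population = [random.sample(range(n), k) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda s: fitness(s, relevance, topics), reverse=True)
        survivors = population[: pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = random.sample(survivors, 2)
            # crossover: half of one parent plus unseen stories from the other
            child = (a[: k // 2] + [i for i in b if i not in a[: k // 2]])[:k]
            if random.random() < 0.2:  # mutation: swap in a random story
                replacement = random.randrange(n)
                if replacement not in child:
                    child[random.randrange(k)] = replacement
            children.append(child)
        population = survivors + children
    return max(population, key=lambda s: fitness(s, relevance, topics))
```

Called with a list of per-story relevance scores and topic labels, select_stories returns the indices of ten stories that balance relevance against topical variety.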
44

Wang, Qin, Shou Ning Qu, Tao Du, and Ming Jing Zhang. "The Research and Application in Intelligent Document Retrieval Based on Text Quantification and Subject Mapping." Advanced Materials Research 605-607 (December 2012): 2561–68. http://dx.doi.org/10.4028/www.scientific.net/amr.605-607.2561.

Abstract:
Nowadays, document retrieval is an important way of academic exchange and of acquiring new knowledge. The traditional document retrieval method is to choose the corresponding database category and match the input keywords. With this method, a mass of documents is returned and it is hard for users to find the most relevant one. This paper puts forward a text quantification method: mining the features of each element in a document, including the word concept, the position-function weight, the improved characteristic weight, the text-distribution-function weight, and the text element length. A word's contribution to the document is then obtained from the combination of these five characteristics, and every document in the database is stored digitally by the contributions of its elements. A subject mapping scheme is also designed: a similarity calculation method based on contribution and association rules is defined first; according to this method, the documents in the database are clustered, and a feature extraction method is then used to find each class subject. When searching for a document, the description that the user inputs is quantified and mapped to a class automatically by subject mapping, and the document sequence is then retrieved by computing the similarity between the description and the features of the other documents in that class. Experiments show that the scheme has many merits, such as intelligence and accuracy, as well as improved retrieval speed.
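Read literally, the quantification and subject-mapping steps could be sketched roughly as follows; the feature mixing weights, the cosine measure, and the function names are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

# assumed mixing weights for the five per-word features described above:
# concept, position weight, characteristic weight, distribution weight, length
FEATURE_WEIGHTS = np.array([0.3, 0.2, 0.2, 0.2, 0.1])

def word_contributions(feature_scores):
    """feature_scores: (n_words, 5) matrix of per-word feature values."""
    return feature_scores @ FEATURE_WEIGHTS        # one contribution per word

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def map_to_subject(query_vector, class_subject_vectors):
    """Map a quantified query to the class whose subject vector is most similar."""
    similarities = [cosine(query_vector, s) for s in class_subject_vectors]
    best = int(np.argmax(similarities))
    return best, similarities[best]
```

Retrieval then proceeds inside the selected class only, ranking its documents by the same similarity against the quantified query.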
45

Kim, Sunhye, and Byungun Yoon. "Multi-document summarization for patent documents based on generative adversarial network." Expert Systems with Applications 207 (November 2022): 117983. http://dx.doi.org/10.1016/j.eswa.2022.117983.

46

Burget, Radek, and Pavel Smrz. "Extracting Visually Presented Element Relationships from Web Documents." International Journal of Cognitive Informatics and Natural Intelligence 7, no. 2 (April 2013): 13–29. http://dx.doi.org/10.4018/ijcini.2013040102.

Abstract:
Many documents in the World Wide Web present structured information that consists of multiple pieces of data with certain relationships among them. Although it is usually not difficult to identify the individual data values in the document text, their relationships are often not explicitly described in the document content. They are expressed by the visual presentation of the document content, which is expected to be interpreted by a human reader. In this paper, the authors propose a formal generic model of logical relationships in a document based on an interpretation of visual presentation patterns in the documents. The model describes the visually expressed relationships between individual parts of the contents independently of the document format and the particular way of presentation. Therefore, it can be used as an appropriate document model in many information retrieval or extraction applications. The authors formally define the model, introduce a method of extracting the relationships between the content parts based on visual presentation analysis, and discuss the expected applications. They also present a new dataset consisting of programmes of conferences and other scientific events, discuss its suitability for the task at hand, and use the dataset to evaluate the results of the implemented system.
47

Umehara, Masayuki, Koji Iwanuma, and Hidetomo Nabashima. "A Case-Based Recognition of Semantic Structures in HTML Documents Which Constitutes a Document Series." Transactions of the Japanese Society for Artificial Intelligence 17, no. 6 (2002): 690–98. http://dx.doi.org/10.1527/tjsai.17.690.

48

LIU, YONGLI, YUANXIN OUYANG, and ZHANG XIONG. "INCREMENTAL CLUSTERING USING INFORMATION BOTTLENECK THEORY." International Journal of Pattern Recognition and Artificial Intelligence 25, no. 05 (August 2011): 695–712. http://dx.doi.org/10.1142/s0218001411008622.

Abstract:
Document clustering is one of the most effective techniques for organizing documents in an unsupervised manner. In this paper, an Incremental method for document Clustering based on Information Bottleneck theory (ICIB) is presented. ICIB is designed to improve the accuracy and efficiency of document clustering and to resolve the issue that an arbitrary choice of document similarity measure can produce an inaccurate clustering result. In our approach, document similarity is calculated using information bottleneck theory and documents are grouped incrementally. A first document is selected randomly and classified as one cluster; each remaining document is then processed incrementally according to the mutual information loss introduced by merging the document with each existing cluster. If the minimum mutual information loss is below a certain threshold, the document is added to its closest cluster; otherwise it is classified as a new cluster. The incremental clustering process is low-precision and order-dependent, so it cannot guarantee accurate clustering results. Therefore, an improved sequential clustering algorithm (SIB) is proposed to adjust the intermediate clustering results. In order to test the effectiveness of the ICIB method, ten independent document subsets were constructed based on the 20NewsGroup and Reuters-21578 corpora. Experimental results show that our ICIB method achieves higher accuracy and better time performance than the K-Means, AIB and SIB algorithms.
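A compact sketch of the incremental assignment step, under the usual information bottleneck formulation in which the cost of merging a document into a cluster is the prior-weighted Jensen-Shannon divergence between their term distributions; the uniform document prior and the other details below are assumptions for illustration, not the exact ICIB algorithm.

```python
import numpy as np

def js_divergence(p, q, wp, wq):
    """Jensen-Shannon divergence of distributions p and q with weights wp, wq."""
    m = wp * p + wq * q
    def kl(a, b):
        mask = a > 0
        return float(np.sum(a[mask] * np.log(a[mask] / b[mask])))
    return wp * kl(p, m) + wq * kl(q, m)

def merge_cost(doc_dist, doc_prior, clus_dist, clus_prior):
    """Mutual-information loss incurred by merging the document into the cluster."""
    total = doc_prior + clus_prior
    return total * js_divergence(doc_dist, clus_dist, doc_prior / total, clus_prior / total)

def incremental_cluster(doc_dists, threshold):
    """doc_dists: sequence of normalized term distributions, one per document."""
    clusters = []   # list of (prior, term distribution) pairs
    labels = []
    prior = 1.0 / len(doc_dists)          # assume a uniform document prior
    for p in doc_dists:
        costs = [merge_cost(p, prior, cd, cw) for cw, cd in clusters]
        if costs and min(costs) < threshold:
            j = int(np.argmin(costs))
            cw, cd = clusters[j]          # merge into the cheapest cluster
            clusters[j] = (cw + prior, (cw * cd + prior * p) / (cw + prior))
        else:
            clusters.append((prior, p.copy()))   # start a new cluster
            j = len(clusters) - 1
        labels.append(j)
    return labels
```

The later SIB-style refinement pass would then revisit each document, temporarily remove it from its cluster, and reassign it wherever the information loss is smallest.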
49

Dutta, Subhayu, Subhrangshu Adhikary, and Ashutosh Dhar Dwivedi. "VisFormers—Combining Vision and Transformers for Enhanced Complex Document Classification." Machine Learning and Knowledge Extraction 6, no. 1 (February 16, 2024): 448–63. http://dx.doi.org/10.3390/make6010023.

Abstract:
Complex documents contain text, figures, tables, and other elements. The classification of scanned copies of different categories of complex documents, such as memos, newspapers, and letters, is essential for rapid digitization. However, this task is very challenging because most scanned complex documents look similar: all documents have similar page and letter colors, similar paper textures, and very few contrasting features. Several attempts have been made in the state of the art to classify complex documents; however, only a few of these works have addressed the classification of complex documents with similar features, and among these the performance leaves room for improvement. To overcome this, this paper presents a method that uses an optical character reader to extract the texts. It proposes a multi-headed model that combines vision-based transfer learning and natural-language-based Transformers within the same network, allowing simultaneous training with different inputs and different optimizers for specific parts of the network. A subset of the Ryerson Vision Lab Complex Document Information Processing (RVL-CDIP) dataset containing 16 different document classes was used to evaluate the performance. The proposed multi-headed VisFormers network classified the documents with up to 94.2% accuracy, while a regular natural-language-processing-based Transformer network achieved 83%, and vision-based VGG19 transfer learning achieved only up to 90% accuracy. Deploying the model can help sort scanned copies of various documents into different categories.
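A hedged PyTorch sketch of a multi-headed classifier in this spirit, pairing a VGG19 image branch with a DistilBERT branch over the OCR text and giving each part of the network its own learning rate; the backbones, dimensions, and hyperparameters below are assumptions for illustration, not the architecture released by the authors.

```python
import torch
import torch.nn as nn
from torchvision import models
from transformers import AutoModel

class MultiHeadDocClassifier(nn.Module):
    def __init__(self, num_classes=16):
        super().__init__()
        self.vision = models.vgg19(weights="DEFAULT")   # ImageNet transfer learning
        self.vision.classifier[6] = nn.Identity()       # expose 4096-d image features
        self.text = AutoModel.from_pretrained("distilbert-base-uncased")
        self.fusion = nn.Sequential(
            nn.Linear(4096 + 768, 512), nn.ReLU(), nn.Dropout(0.3),
            nn.Linear(512, num_classes),
        )

    def forward(self, pixel_values, input_ids, attention_mask):
        v = self.vision(pixel_values)                    # (B, 4096) image features
        t = self.text(input_ids=input_ids,
                      attention_mask=attention_mask).last_hidden_state[:, 0]  # (B, 768) CLS token
        return self.fusion(torch.cat([v, t], dim=1))

model = MultiHeadDocClassifier()
# different optimizer settings for specific parts of the network,
# echoing the paper's use of separate optimizers per head
optimizer = torch.optim.AdamW([
    {"params": model.vision.parameters(), "lr": 1e-4},
    {"params": model.text.parameters(),   "lr": 2e-5},
    {"params": model.fusion.parameters(), "lr": 1e-3},
])
```

The image head sees the scanned page while the text head sees the OCR transcript, so the fused features can separate visually similar classes that differ mainly in wording.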
50

DMITRIY N., PETROV, and LUTSKO ANDREY N. "AUTOMATION OF ELECTRONIC-PRINTING OFFICE WORK WITH THE DOCUMENTS IDENTIFICATION AND VERIFICATION." CASPIAN JOURNAL: Control and High Technologies 54, no. 2 (2021): 81–89. http://dx.doi.org/10.21672/2074-1707.2021.53.1.081-089.

Abstract:
The article reviews and analyzes the problem of electronic-and-paper office workflow and of document identification and verification. The technical feasibility of using handwriting recognition algorithms, artificial intelligence, and cryptography in electronic-and-paper office workflow is substantiated. The authors propose an automated method for processing a paper document and verifying the authenticity of its electronic counterpart. The algorithm of an alternative four-stage document life cycle and the method of its verification, with reduced manual operations and machine input, are presented in detail. The structure of the integrated subsystem for machine input and document verification used in the Edinaja informacionnaja sistema «Elektronniy Universitet» of an educational institution is presented. Results of automated document processing and protection were obtained by training a multilayer convolutional neural network on the example of intermediate certification sheets. The possibility of applying the described methodology to strict-reporting documents of a fixed structure with handwritten filling has been demonstrated.
