Journal articles on the topic 'Web of document'

Consult the top 50 journal articles for your research on the topic 'Web of document.'

1

Kim, Kwang-Hyun, Joung-Mi Choi, and Joon-Ho Lee. "Detecting Harmful Web Documents Based on Web Document Analyses." KIPS Transactions:PartD 12D, no. 5 (October 1, 2005): 683–88. http://dx.doi.org/10.3745/kipstd.2005.12d.5.683.

2

Chawla, Suruchi. "Application of Convolution Neural Networks in Web Search Log Mining for Effective Web Document Clustering." International Journal of Information Retrieval Research 12, no. 1 (January 2022): 1–14. http://dx.doi.org/10.4018/ijirr.300367.

Abstract:
The volume of web search data stored in search engine logs is increasing and has become big search log data. Web search logs have been a source of data for mining based on web document clustering techniques to improve the efficiency and effectiveness of information retrieval. In this paper, a deep learning model, the Convolution Neural Network (CNN), is used in big web search log data mining to learn the semantic representation of a document. These semantic document vectors are clustered using K-means to group relevant documents for effective web document clustering. An experiment was done on a data set of web search queries and associated clicked URLs to measure the quality of clusters based on the document semantic representation learned by the CNN. The cluster analysis was based on WCSS (the sum of squared distances of document samples to their closest cluster center), and the decrease in WCSS in comparison to TF-IDF keyword-based clusters confirms the effectiveness of CNN in web search log mining for effective web document clustering.
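
The WCSS measure named in this abstract can be illustrated with a minimal sketch. The function name `wcss` and the toy two-dimensional vectors are assumptions for illustration only; the paper's CNN-derived document vectors and its data set are not reproduced here.

```python
def wcss(points, centers):
    """Within-cluster sum of squares: for each point, add the squared
    Euclidean distance to its closest cluster center."""
    total = 0.0
    for p in points:
        total += min(
            sum((a - b) ** 2 for a, b in zip(p, c))
            for c in centers
        )
    return total


# Hypothetical toy "document vectors" and cluster centers.
docs = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0)]
centers = [(0.0, 1.0), (10.0, 0.0)]
print(wcss(docs, centers))
```

A lower value for the same number of clusters indicates tighter clusters, which is the sense in which the abstract compares CNN-based and TF-IDF-based clusterings.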
3

Rani Manukonda, Sumathi, and Nomula Divya. "Efficient Document Clustering for Web Search Result." International Journal of Engineering & Technology 7, no. 3.3 (June 21, 2018): 90. http://dx.doi.org/10.14419/ijet.v7i3.3.14494.

Abstract:
Document clustering is a traditional data mining approach in which closely related documents are grouped together. Document clustering contributes to the accuracy of information retrieval systems that identify the nearest neighbors of a document. A massive quantity of data is generated every day and clustered according to a particular sequence. Although different clustering methods have been introduced to improve cluster quality, many challenges remain for the improvement of document clustering. For web search, the documents in a group are arranged efficiently for result retrieval, so that users can search by query in an organized way. Hierarchical clustering is attained by document clustering. Most grouping algorithms do not concentrate on the semantic approach, which results in unsatisfactory clustering output. The automatic organization of web documents by services such as Google and Yahoo is often considered as a reference. A distinct method identifies existing groups of similar items in previously organized documents and derives an effective document classifier for new documents. This paper concentrates on hierarchical clustering and k-means algorithms, shows that k-means and its variants are more efficient than hierarchical clustering, and considers an implementation of the greedy fast k-means algorithm (GFA) for clustering documents efficiently.
4

An, Dong-Un, and In-Ho Kang. "Document Ranking of Web Document Retrieval Systems." Journal of Information Management 34, no. 2 (June 30, 2003): 55–66. http://dx.doi.org/10.1633/jim.2003.34.2.055.

5

Lee, Youngseok, and Jungwon Cho. "Web document classification using topic modeling based document ranking." International Journal of Electrical and Computer Engineering (IJECE) 11, no. 3 (June 1, 2021): 2386. http://dx.doi.org/10.11591/ijece.v11i3.pp2386-2392.

Abstract:
In this paper, we propose a web document ranking method using topic modeling for effective information collection and classification. The proposed method is applied to the document ranking technique to avoid duplicated crawling when crawling at high speed. Through the proposed document ranking technique, it is feasible to remove redundant documents, classify the documents efficiently, and confirm that the crawler service is running. The proposed method enables rapid collection of many web documents; the user can search the web pages with constant data update efficiently. In addition, the efficiency of data retrieval can be improved because new information can be automatically classified and transmitted. By expanding the scope of the method to big data based web pages and improving it for application to various websites, it is expected that more effective information retrieval will be possible.
6

Lecarpentier, Jean-Marc, Hervé Le Crosnier, Romain Brixtel, and Cyril Bazin. "Document Model and Prototyping Methods for Web Engineering." International Journal of Information System Modeling and Design 5, no. 4 (October 2014): 91–117. http://dx.doi.org/10.4018/ijismd.2014100105.

Abstract:
This paper proposes models for managing documents in a web engineering context. First, it proposes a document model to better manage multilingual composite documents. The approach, inspired by the FRBR report, is to group all versions, translations, formats, etc. of a document in a unique document tree, putting document data and metadata at the same level. Then it proposes a model for prototyping applications, using a combination of class-based inheritance and prototype programming principles. This model applies to document models, document views, and actions. Finally, it proposes a metadata management model, laying foundations for easier integration and management of information in web applications. The proposed models are implemented in the Sydonie framework, and several applications are built with the model and framework.
7

Radilova, Martina, Patrik Kamencay, Robert Hudec, Miroslav Benco, and Roman Radil. "Tool for Parsing Important Data from Web Pages." Applied Sciences 12, no. 23 (November 24, 2022): 12031. http://dx.doi.org/10.3390/app122312031.

Abstract:
This paper discusses a tool for main text and image extraction (extracting and parsing the important data) from a web document. It describes our proposed algorithm, based on the Document Object Model (DOM) and natural language processing (NLP) techniques, and other approaches for extracting information from web pages using various classification techniques such as support vector machine, decision tree techniques, naive Bayes, and K-nearest neighbor. The main aim of the developed algorithm was to identify and extract the main block of a web document that contains the text of the article and the relevant images. The algorithm was applied to a sample of 45 web documents of different types. In addition, the issue of web pages, from the structure of the document to the use of the DOM for their processing, was analyzed. The DOM was used to load and navigate the document. It also plays an important role in the correct identification of the main block of web documents. The paper also discusses the levels of natural language. These methods of automatic natural language processing help to identify the main block of the web document. In this way, all textual parts and images from the main content of the web document were extracted. The experimental results show that our method achieved a final classification accuracy of 88.18%.
8

Cheng, Wen Zhi, Yi Yang, Liao Zhang, and Lian Li. "Optimization for Web-Based Online Document Management." Advanced Materials Research 756-759 (September 2013): 1135–40. http://dx.doi.org/10.4028/www.scientific.net/amr.756-759.1135.

Abstract:
In this paper, we construct a web-based document life-cycle management model. The model manages the documents that make up the institute library, from their creation to the archive state. For an online office system, we aim at solving three issues: network delay, version storage problems, and deletion strategy. To solve network delay, we propose a synchronized editing model for both local and online documents. In addition, we combine the longest recursive chain with recursive chain time to optimize the system response time. To optimize the selection of documents to be deleted, we propose a two-step optimized method. In the performance test, the method is confirmed to be effective in solving these document management issues.
9

Chawla, Suruchi. "Application of Fuzzy C-Means Clustering and Semantic Ontology in Web Query Session Mining for Intelligent Information Retrieval." International Journal of Fuzzy System Applications 10, no. 1 (January 2021): 1–19. http://dx.doi.org/10.4018/ijfsa.2021010101.

Abstract:
Information retrieval based on keyword search retrieves irrelevant documents because of the vocabulary gap between document content and search queries. The keyword vector representation of web documents is very high dimensional, and keyword terms are unable to capture the semantics of document content. Ontologies have been built in various domains for representing the semantics of documents based on concepts relevant to the document subject. Web documents often contain multiple topics; therefore, fuzzy c-means document clustering has been used for discovering clusters with overlapping boundaries. In this paper, a method is proposed for intelligent information retrieval using a hybrid of fuzzy c-means clustering and ontology in query session mining. The use of fuzzy clusters of web query session concept vectors improves the quality of clusters for effective web search. The proposed method was evaluated experimentally, and the results show an improvement in the precision of search results.
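
To make the idea of clusters with overlapping boundaries concrete, the following minimal sketch computes the standard fuzzy c-means membership degrees of a single document from its distances to each cluster center. The function name `fcm_memberships`, the fuzzifier default `m=2.0`, and the sample distances are assumptions for illustration; the paper's ontology-based concept vectors are not modeled here.

```python
def fcm_memberships(distances, m=2.0):
    """Standard fuzzy c-means membership update for one point:
    u_i = 1 / sum_k (d_i / d_k) ** (2 / (m - 1)).
    `distances` holds the point's distance to each cluster center."""
    # If the point coincides with one or more centers,
    # split its membership evenly among them.
    zeros = [i for i, d in enumerate(distances) if d == 0]
    if zeros:
        return [1.0 / len(zeros) if i in zeros else 0.0
                for i in range(len(distances))]
    exponent = 2.0 / (m - 1.0)
    return [
        1.0 / sum((d_i / d_k) ** exponent for d_k in distances)
        for d_i in distances
    ]


# A hypothetical document three times closer to the first center than to the second.
u = fcm_memberships([1.0, 3.0])
print(u)
```

Unlike hard k-means assignment, the document keeps a nonzero degree of membership in the farther cluster, and the degrees sum to one.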
10

Kim, Taeh Wan, Ho Cheol Jeon, and Joong Min Choi. "A New Document Representation Using a Unified Graph to Document Similarity Search." Advanced Materials Research 601 (December 2012): 394–400. http://dx.doi.org/10.4028/www.scientific.net/amr.601.394.

Abstract:
Document similarity search retrieves a ranked list of documents similar to a query document in a text corpus or to a web page on the web. Most previous research on searching for similar documents, however, has focused on classifying documents based on their contents. To solve this problem, we propose a novel retrieval approach based on undirected graphs to represent each document in the corpus. In addition, this study considers a unified graph in conjunction with multiple graphs to improve the quality of searching for similar documents. Experimental results on the Reuters-21578 data demonstrate that the proposed system has better performance and success than the traditional approach.
11

Zhao, Liang, Yu Qing Lan, and Guang Hao Zhang. "A Method of Automatic Generation of Documents from WSDL to OWL-S." Applied Mechanics and Materials 668-669 (October 2014): 1202–7. http://dx.doi.org/10.4028/www.scientific.net/amm.668-669.1202.

Abstract:
A web service is a software function provided at a network address over the web or the cloud. A conventional web service is described by WSDL. Conventional web services make it difficult to discover and select the most suitable web service. Semantic web services, which are described by OWL-S, can solve these problems. In this paper, we propose a method of automatic generation of documents from WSDL to OWL-S. To accomplish this, we first translate an XML document to an OWL document through the relationship between XML Schema and the ontology. We use a mapping rule from XML Schema to Schema tree and Element tree. A WSDL document is written in XML, and an OWL-S document is written in OWL, so we translate the WSDL document to an OWL-S document based on the mapping between XML documents and OWL documents. At present, most web services are described by WSDL. With this method, we can not only translate from WSDL documents to OWL-S documents easily, but also make the generation more efficient. Thus, we provide support for the extension of web services and for service composition and discovery. Through the use of an example, the paper verifies that the method is feasible and effective.
12

TSEKOURAS, GEORGE E., and DAMIANOS GAVALAS. "AN EFFECTIVE FUZZY CLUSTERING ALGORITHM FOR WEB DOCUMENT CLASSIFICATION: A CASE STUDY IN CULTURAL CONTENT MINING." International Journal of Software Engineering and Knowledge Engineering 23, no. 06 (August 2013): 869–86. http://dx.doi.org/10.1142/s021819401350023x.

Abstract:
This article presents a novel crawling and clustering method for extracting and processing cultural data from the web in a fully automated fashion. Our architecture relies upon a focused web crawler to download web documents relevant to culture. The focused crawler is a web crawler that searches and processes only those web pages that are relevant to a particular topic. After downloading the pages, we extract from each document a number of words for each thematic cultural area, filtering the documents with non-cultural content; we then create multidimensional document vectors comprising the most frequent cultural term occurrences. We calculate the dissimilarity between the cultural-related document vectors and for each cultural theme, we use cluster analysis to partition the documents into a number of clusters. Our approach is validated via a proof-of-concept application which analyzes hundreds of web pages spanning different cultural thematic areas.
13

Burget, Radek, and Pavel Smrz. "Extracting Visually Presented Element Relationships from Web Documents." International Journal of Cognitive Informatics and Natural Intelligence 7, no. 2 (April 2013): 13–29. http://dx.doi.org/10.4018/ijcini.2013040102.

Abstract:
Many documents in the World Wide Web present structured information that consists of multiple pieces of data with certain relationships among them. Although it is usually not difficult to identify the individual data values in the document text, their relationships are often not explicitly described in the document content. They are expressed by visual presentation of the document content that is expected to be interpreted by a human reader. In this paper, the authors propose a formal generic model of logical relationships in a document based on an interpretation of visual presentation patterns in the documents. The model describes the visually expressed relationships between individual parts of the contents independently of the document format and the particular way of presentation. Therefore, it can be used as an appropriate document model in many information retrieval or extraction applications. The authors formally define the model, the authors introduce a method of extracting the relationships between the content parts based on the visual presentation analysis and the authors discuss the expected applications. The authors also present a new dataset consisting of programmes of conferences and other scientific events and the authors discuss its suitability for the task in hand. Finally, the authors use the dataset to evaluate results of the implemented system.
14

Shipman, Jean. "Document Delivery Suppliers Web Page." Journal of Interlibrary Loan, Document Delivery & Information Supply 9, no. 2 (December 16, 1998): 1–62. http://dx.doi.org/10.1300/j110v09n02_01.

15

Balasubramanian, V., and Alf Bashian. "Document management and Web technologies." Communications of the ACM 41, no. 7 (July 1998): 107–15. http://dx.doi.org/10.1145/278476.278498.

16

Prichystal, Jan, and Jiri Rybicka. "Web interface for document typesetting." Zpravodaj Československého sdružení uživatelů TeXu 14, no. 3-4 (2004): 190–95. http://dx.doi.org/10.5300/2004-3-4/190.

17

Agosti, Maristella, and Massimo Melucci. "Evaluation of web document retrieval." ACM SIGIR Forum 33, no. 1 (September 1999): 23–27. http://dx.doi.org/10.1145/331403.331409.

18

Hammouda, K. M., and M. S. Kamel. "Efficient phrase-based document indexing for Web document clustering." IEEE Transactions on Knowledge and Data Engineering 16, no. 10 (October 2004): 1279–96. http://dx.doi.org/10.1109/tkde.2004.58.

19

Bai, Yun. "A Formal Approach for Securing XML Document." International Journal of Secure Software Engineering 1, no. 1 (January 2010): 41–53. http://dx.doi.org/10.4018/jsse.2010102003.

Abstract:
With the ever increasing demand for the Web-based applications over the Internet, the related security issue has become a great concern. Web document security has been studied by many researchers and various security mechanisms have been proposed. The aim of this article is to investigate the security issue of the XML documents. We discuss a protection mechanism and investigate a formal approach to ensure the security of Web-based XML documents. Our approach starts by introducing a high level language to specify an XML document and its protection authorizations. We also discuss and investigate the syntax and semantics of the language. The flexible and powerful access control specification can effectively protect the documents from unauthorized attempts.
20

Venkata, Divya Ragatha, and Deepika Kulshreshtha. "Techniques for Refreshing Images in Web Documents." Advanced Materials Research 403-408 (November 2011): 1008–13. http://dx.doi.org/10.4028/www.scientific.net/amr.403-408.1008.

Abstract:
In this paper, we put forward a technique for keeping web pages up to date so that a search engine can later use them to serve end-user queries. A major part of the Web is dynamic; hence, a need arises to constantly update changed web documents in the search engine’s repository. We use a client-server architecture for crawling the web and propose a technique for detecting changes in a web page based on the content of any images present in the web document. Once a change is identified in an image embedded in the web document, the previous copy of the web document in the search engine’s database/repository is replaced with the changed one.
21

Chahal, Poonam, Manjeet Singh, and Suresh Kumar. "Web documents semantic similarity by extending document ontology using current trends." International Journal of Web Science 3, no. 1 (2017): 1. http://dx.doi.org/10.1504/ijws.2017.088673.

22

Chahal, Poonam, Manjeet Singh, and Suresh Kumar. "Web documents semantic similarity by extending document ontology using current trends." International Journal of Web Science 3, no. 1 (2017): 1. http://dx.doi.org/10.1504/ijws.2017.10009569.

23

I Nyoman Purnama and Ida Bagus Kresna Sudiatmika. "Sosialisasi Penerapan Sistem Informasi Dokumen Perencanaan Berbasis Web di Bappeda Gianyar." ABDIKAN: Jurnal Pengabdian Masyarakat Bidang Sains dan Teknologi 1, no. 2 (May 30, 2022): 183–88. http://dx.doi.org/10.55123/abdikan.v1i2.278.

Abstract:
Managing an institution's documents is very important; documents that are not managed properly reduce work efficiency. The Regional Development Planning Agency (Bappeda) of Gianyar is a Regional Apparatus Organization in Gianyar Regency, Bali, tasked with coordinating the preparation, control, and evaluation of the implementation of regional development plans (RPD). Planning documents are collected and recorded as softcopies that Regional Apparatus Organizations send via WhatsApp to different employees in the Program Sector, which allows the softcopies to become scattered. In addition, when a hardcopy planning document is borrowed from the Program Preparation Division, the loan is not recorded well, so it is difficult to trace. In this community service project, a Planning Document Information System was designed to serve as a forum for managing the planning documents handled by the Bappeda and Research and Development agency of Gianyar Regency. To provide an understanding of the business processes and workflows of the system that was built, training and socialization on the use of the planning document system were held. It is hoped that, after this socialization, the system will provide convenience and work efficiency for state officials in managing planning documents.
24

Singh, Nanhay, R. K. Chauhan, and Raghuraj Singh. "Inference Document Type (Dtd) From Xml Document: Web Structure Mining." International Journal of Computer Applications 7, no. 9 (October 10, 2010): 6–9. http://dx.doi.org/10.5120/1280-1645.

25

Cai, Fei, Honghui Chen, and Zhen Shu. "Web document ranking via active learning and kernel principal component analysis." International Journal of Modern Physics C 26, no. 04 (February 25, 2015): 1550041. http://dx.doi.org/10.1142/s0129183115500412.

Abstract:
Web document ranking arises in many information retrieval (IR) applications, such as search engines, recommendation systems, and online advertising. A challenging issue is how to select representative query-document pairs and informative features for better learning, and how to explore new ranking models that produce an acceptable ranked list of candidate documents for each query. In this study, we propose a ranking model based on active sampling (AS) plus kernel principal component analysis (KPCA), viz. AS-KPCA Regression, to study document ranking for a retrieval system, i.e. how to choose the representative query-document pairs and features for learning. More precisely, we gradually add documents to the training set by AS, choosing those that would incur the highest expected DCG loss if left unselected. Then, KPCA is performed by projecting the selected query-document pairs onto p principal components in the feature space to complete the regression. Hence, we can cut down the computational overhead and reduce the impact of noise simultaneously. To the best of our knowledge, we are the first to perform document ranking via dimension reduction in two dimensions simultaneously, namely the number of documents and the number of features. Our experiments demonstrate that our approach performs better than the baseline methods on the public LETOR 4.0 datasets. Our approach brings an improvement of nearly 20% over RankBoost and other baselines in terms of the MAP metric, and smaller improvements in terms of P@K and NDCG@K. Moreover, our approach is particularly suitable for document ranking on noisy datasets in practice.
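
For readers unfamiliar with the DCG and NDCG@K measures mentioned in this abstract, here is a minimal sketch of one common formulation (exponential gain with a base-2 logarithmic discount). The function names and the sample relevance labels are assumptions for illustration; the abstract does not state which DCG variant the authors used.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k ranked items.
    `relevances` are graded relevance labels in ranked order."""
    return sum(
        (2 ** rel - 1) / math.log2(rank + 1)
        for rank, rel in enumerate(relevances[:k], start=1)
    )


def ndcg_at_k(relevances, k):
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Hypothetical relevance labels for a ranked list of five documents.
ranked = [2, 0, 1, 2, 0]
print(ndcg_at_k(ranked, 3))
```

An ordering that already places the most relevant documents first scores 1.0, and any other ordering scores lower.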
26

Konecky, Joan Latta, and Carla Rosenquist-Buhler. "WEB BROWSERS: UNTANGLING THE WORLD WIDE WEB." Education Libraries 19, no. 1 (September 5, 2017): 13. http://dx.doi.org/10.26443/el.v19i1.77.

Abstract:
Not only are Internet resources expanding exponentially, but they are becoming more sophisticated, incorporating a variety of multimedia and hypertext components. Internet documents on the World Wide Web may contain elaborately formatted text, color graphics, audio, and video as well as dynamic connections to other Internet resources via hypertext links. In addition to providing user-friendly access to hypermedia resources, most Web browsers (client software) provide a rich graphical environment for authoring and displaying electronic documents locally. This article describes the World Wide Web and a sampling of the available Web browsers. It then discusses a test project developed at the University of Nebraska-Lincoln Libraries designed to explore the potential, demands, and pitfalls of Web access to the Internet, as well as to investigate hypermedia document creation in an academic library environment. The experiences with the project confirmed the importance of the World Wide Web and Web browsers to this environment, so much so that providing access to these Internet resources must be seen as mandatory to any academic or upper level educational library providing electronic information access.
27

Bebbington, Laurence W. "IT in Law and Law on the Web." Legal Information Management 1, no. 2 (2001): 66–69. http://dx.doi.org/10.1017/s1472669600000463.

Abstract:
The use of intelligent technologies in legal document preparation continues to develop so as to integrate with new software and operating system developments. At the end of May products were launched by both the West Group (WestCiteLink) and LexisNexis containing smart tag features allowing integration between Windows XP (Microsoft's new operating system to be launched in the UK in October) and legal databases. As users type, smart tags activate automatically and find and identify legal citations to statutes, cases etc. Once identified, a set of options can provide automatic link creation to the documents on Westlaw or LexisNexis, without the user having to leave the Word document. WestCiteLink is the result of a strategic alliance between West Group and Microsoft. As the cursor hangs over the citation the user is presented with options which can:
• find the document on Westlaw
• determine the KeyCite history
• identify citing references
• create a hyperlink to the document
• mark the cite for inclusion in a Table of Authorities.
28

Im, Yeong-Hui. "A Post Web Document Clustering Algorithm." KIPS Transactions:PartB 9B, no. 1 (February 1, 2002): 7–16. http://dx.doi.org/10.3745/kipstb.2002.9b.1.007.

29

Adetunji, A. B., J. P. Oguntoye, O. D. Fenwa, and N. O. Akande. "Web Document Classification Using Naïve Bayes." Journal of Advances in Mathematics and Computer Science 29, no. 6 (December 16, 2018): 1–11. http://dx.doi.org/10.9734/jamcs/2018/34128.

30

Amato, Giuseppe, Fausto Rabitti, and Pasquale Savino. "Multimedia document search on the Web." Computer Networks and ISDN Systems 30, no. 1-7 (April 1998): 604–6. http://dx.doi.org/10.1016/s0169-7552(98)00096-8.

31

He, Xiaofeng, Hongyuan Zha, Chris H.Q. Ding, and Horst D. Simon. "Web document clustering using hyperlink structures." Computational Statistics & Data Analysis 41, no. 1 (November 2002): 19–45. http://dx.doi.org/10.1016/s0167-9473(02)00070-1.

32

Yun-tao, Zhang, Gong Ling, and Wang Yong-cheng. "Hierarchical subtopic segmentation of web document." Wuhan University Journal of Natural Sciences 11, no. 1 (January 2006): 47–50. http://dx.doi.org/10.1007/bf02831702.

33

Priestley, Michael. "Managing web relationships with document structures." Markup Languages: Theory and Practice 2, no. 3 (August 1, 2000): 235–54. http://dx.doi.org/10.1162/109966200750363607.

34

Hu, Bo, Florian Lauck, and Jan Scheffczyk. "How Recent is a Web Document?" Electronic Notes in Theoretical Computer Science 157, no. 2 (May 2006): 147–66. http://dx.doi.org/10.1016/j.entcs.2005.12.052.

35

Diana, Diana, and Yozi Dwi Putra. "Sequential Search Algorithm Implementation on Web-Based EDocument Search Application." CESS (Journal of Computer Engineering, System and Science) 7, no. 1 (December 27, 2021): 106. http://dx.doi.org/10.24114/cess.v7i1.26114.

Abstract:
Technological developments have entered the era of the Industrial Revolution 4.0. The Industrial Revolution 4.0 is called the digital era. One of the keys that must be improved in this era is the ability to manage data that is safe and precise in accordance with applicable regulations. The first step taken in managing documents is converting the document into an e-document and saving it to an application. The e-document application that was built is a web-based application because of the ease in accessing e-documents, it can be done anywhere and anytime via internet access. Search is a fundamental process in processing large amounts of data. Search requires an algorithm. The sequential search algorithm is the simplest search model performed on a data set. The test results on the application show that the e-document application can run well on the web, the source code in the application and the sequential search algorithm logic path in the application are declared valid, and the data search process finds results that match the keywords.
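
The sequential (linear) search named in this abstract can be sketched as follows. The function name `sequential_search`, the case-insensitive substring match, and the sample titles are assumptions for illustration; the abstract does not describe the application's actual matching rule or data model.

```python
def sequential_search(documents, keyword):
    """Scan every document title in order and collect the positions
    of those that contain the keyword (case-insensitive)."""
    keyword = keyword.lower()
    matches = []
    for index, title in enumerate(documents):
        if keyword in title.lower():
            matches.append(index)
    return matches


# Hypothetical e-document titles.
titles = ["Budget Plan 2021", "Meeting Notes", "budget report"]
print(sequential_search(titles, "budget"))
```

Each lookup inspects every item once, so the cost grows linearly with the number of stored documents; that simplicity is why the abstract calls it the simplest search model.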
36

Kusuma, Aniek Suryanti, and Komang Sri Aryati. "SISTEM PENGARSIPAN DOKUMEN AKREDITASI BERBASIS WEB." Jurnal Teknologi Informasi dan Komputer 5, no. 1 (February 5, 2019): 139–47. http://dx.doi.org/10.36002/jutik.v5i1.647.

Abstract:
A college must be accredited, and the accreditation form (Borang) is used as a reference in the quality and feasibility assessment of a study program conducted by the National Accreditation Board of Higher Education (BAN PT). The college builds a team consisting of several divisions, each responsible for composing documents based on the accreditation standard. Unfortunately, the team faces some problems in the composing process. The main problem is the delay in collecting documents, which prevents the team leader from recapitulating them. The other problem is that the team leader is unable to monitor accreditation progress, since there is no monitoring system. This research builds a system for managing documents, so that the team leader can more easily monitor the completeness of the documents. The application is developed using HTML, CSS, JavaScript, and PHP. The archive system stores all documents in one place that can be accessed from anywhere. In addition, the system has a facility for communication between divisions. The system was tested using the black box testing method to ensure that all functions run properly. From the results of this research, it can be concluded that this document archive system can monitor the document collection process, which helps the accreditation process run smoothly.
Keywords: Document, Accreditation, Archive System
37

Biber, Douglas, Jesse Egbert, and Mark Davies. "Exploring the composition of the searchable web: a corpus-based taxonomy of web registers." Corpora 10, no. 1 (April 2015): 11–45. http://dx.doi.org/10.3366/cor.2015.0065.

Abstract:
One major challenge for Web-As-Corpus research is that a typical Web search provides little information about the register of the documents that are searched. Previous research has attempted to address this problem (e.g., through the Automatic Genre Identification initiative), but with only limited success. As a result, we currently know surprisingly little about the distribution of registers on the web. In this study, we tackle this problem through a bottom-up user-based investigation of a large, representative corpus of web documents. We base our investigation on a much larger corpus than those used in previous research (48,571 web documents), obtained through random sampling from across the full range of documents that are publicly available on the searchable web. Instead of relying on individual expert coders, we recruit typical end-users of the Web for register coding, with each document in the corpus coded by four different raters. End-users identify basic situational characteristics of each web document, coded in a hierarchical manner. Those situational characteristics lead to general register categories, which eventually lead to lists of specific sub-registers. By working through a hierarchical decision tree, users are able to identify the register category of most Internet texts with a high degree of reliability. After summarising our methodological approach, this paper documents the register composition of the searchable web. Narrative registers are found to be the most prevalent, while Opinion and Informational Description/Explanation registers are also found to be extremely common. One of the major innovations of the approach adopted here is that it permits an empirical identification of ‘hybrid’ documents, which integrate characteristics from multiple general register categories (e.g., opinionated-narrative). These patterns are described and illustrated through sample Internet documents.
38

Nair, B. J. Bipin, Gopikrishna Ashok, and N. R. Sreekumar. "Binarization of Ancient Malayalam Documents - A Novel Weight-based Denoising Approach." Webology 18, Special Issue 04 (September 30, 2021): 813–31. http://dx.doi.org/10.14704/web/v18si04/web18167.

Abstract:
Even though several studies exist on denoising degraded documents, it remains a tedious task in document image processing, because ancient documents may contain several kinds of degradation that are a barrier for the reader. Here we use old Malayalam Grantha scripts that contain useful information, such as the poem titled ‘Njana Stuthi’ and ancient literature. These historical documents are losing content due to heavy degradations such as ink bleed, fungi, brittleness, and show-through. To remove these kinds of degradation, the study proposes a novel binarization algorithm that removes noise from Grantha scripts as well as notebook images and makes the document readable. We use 500 datasets of Grantha scripts for experimentation. In the proposed method, binarization is done through a channel-based method: the image is converted to RGB, weights are added to make the image darker or brighter, a morphological opening operation is applied, and the result is passed through the RGB and HSV channels for more clarity and a clear separation of black text and white background; the remaining noise is removed using an adaptive thresholding technique. The proposed method achieves good accuracy.
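The last step named in this abstract, adaptive thresholding, can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the mean-based rule, the `window` and `offset` parameters, and the function name `adaptive_threshold` are assumptions chosen for illustration, and the channel weighting and morphological opening steps are omitted.

```python
def adaptive_threshold(gray, window=3, offset=10):
    """Binarize a grayscale image given as a list of rows of 0-255 values.

    A pixel becomes black (0) when it is darker than the mean of its
    local window minus `offset`; otherwise it becomes white (255).
    The window is clipped at the image borders.
    """
    height = len(gray)
    width = len(gray[0])
    half = window // 2
    result = []
    for y in range(height):
        row = []
        for x in range(width):
            # Collect the neighbourhood around (x, y), clipped to the image.
            neighbours = [
                gray[j][i]
                for j in range(max(0, y - half), min(height, y + half + 1))
                for i in range(max(0, x - half), min(width, x + half + 1))
            ]
            local_mean = sum(neighbours) / len(neighbours)
            row.append(0 if gray[y][x] < local_mean - offset else 255)
        result.append(row)
    return result


# Hypothetical 3x3 page: one dark ink pixel on a bright background.
page = [
    [200, 200, 200],
    [200, 40, 200],
    [200, 200, 200],
]
binary = adaptive_threshold(page)
```

Because the threshold is computed per neighbourhood rather than once for the whole page, uneven background brightness (for example, from stains) affects the result less than it would with a single global threshold.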
39

Ito, Masashi, Tomohiro Ohno, and Shigeki Matsubara. "Text-Style Conversion of Speech Transcript into Web Document for Lecture Archive." Journal of Advanced Computational Intelligence and Intelligent Informatics 13, no. 4 (July 20, 2009): 499–505. http://dx.doi.org/10.20965/jaciii.2009.p0499.

Abstract:
It is very significant to the knowledge society to accumulate spoken documents on the web. However, because of the high redundancy of spontaneous speech, the faithfully transcribed text is not readable on an Internet browser, and therefore not suitable as a web document. This paper proposes a technique for converting spoken documents into web documents for the purpose of building a speech archiving system. The technique edits automatically transcribed texts and improves their readability on the browser. The readable text can be generated by applying technology such as paraphrasing, segmentation, and structuring transcribed texts. Editing experiments using lecture data demonstrated the feasibility of the technique. A prototype system of spoken document archiving was implemented to confirm its effectiveness.
40

Su, Zhong, Qiang Yang, Hongjiang Zhang, Xiaowei Xu, Yu-Hen Hu, and Shaoping Ma. "Correlation-Based Web Document Clustering for Adaptive Web Interface Design." Knowledge and Information Systems 4, no. 2 (April 2002): 151–67. http://dx.doi.org/10.1007/s101150200002.

41

Bagban, T. I., and P. J. Kulkarni. "On Applying Document Similarity Measures for Template based Clustering of Web Documents." International Journal of Computer Sciences and Engineering 06, no. 01 (February 28, 2018): 37–42. http://dx.doi.org/10.26438/ijcse/v6si1.3742.

42

Pimentel, Maria da Graça, Dick C. A. Bulterman, and Luiz Fernando Gomes Soares. "Document engineering approaches toward scalable and structured multimedia, web and printable documents." Multimedia Tools and Applications 43, no. 3 (May 8, 2009): 195–202. http://dx.doi.org/10.1007/s11042-009-0288-6.

43

Boutaounte, Mehdi, Driss Naji, M. Fakir, B. Bouikhalene, and A. Merbouha. "Tifinaghe Document Converter." International Journal of Computer Vision and Image Processing 3, no. 3 (July 2013): 54–68. http://dx.doi.org/10.4018/ijcvip.2013070104.

Abstract:
Recognition of documents has become a basic necessity for two reasons: first, to secure existing data on paper, given its limited lifespan and the high rate of destruction by insects, fire, or humidity; second, to reduce archive space. The aim of this work is to build a converter that detects images and text within a scanned document image and applies an optical character recognition (OCR) system to obtain a web page (HTML extension) ready to be used on the same computer or on web hosts, where it is accessible to everyone.
44

Pera, Maria Soledad, and Yiu-Kai Ng. "A Naïve Bayes Classifier for Web Document Summaries Created by Using Word Similarity and Significant Factors." International Journal on Artificial Intelligence Tools 19, no. 04 (August 2010): 465–86. http://dx.doi.org/10.1142/s0218213010000285.

Abstract:
Text classification categorizes web documents in large collections into predefined classes based on their contents. Unfortunately, the classification process can be time-consuming, and users are still required to spend a considerable amount of time scanning through the classified web documents to identify the ones with contents that satisfy their information needs. To solve this problem, we first introduce CorSum, an extractive single-document summarization approach, which is simple and effective in performing the summarization task, since it relies only on word similarity to generate high-quality summaries. We further enhance CorSum by considering the significance factor of sentences in documents, in addition to using word-correlation factors, for document summarization. We denote the enhanced approach CorSum-SF and use the summaries generated by CorSum-SF to train a Multinomial Naïve Bayes classifier for categorizing web document summaries into predefined classes. Experimental results on the DUC-2002 and 20 Newsgroups datasets show that CorSum-SF outperforms other extractive summarization methods, and that classification time is significantly reduced, while accuracy remains comparable, when using CorSum-SF generated summaries instead of the entire documents. More importantly, browsing summaries, instead of entire documents, which are assigned to predefined categories, facilitates the information search process on the Web.
45

Guan, Sheng-Uei, and P. McMullen. "Organizing Information on the Next Generation Web: Design and Implementation of a New Bookmark Structure." International Journal of Information Technology & Decision Making 04, no. 01 (March 2005): 97–115. http://dx.doi.org/10.1142/s0219622005001404.

Abstract:
The next-generation Web will increase the need for a highly organized and ever evolving method to store references to Web objects. These requirements could be realized by the development of a new bookmark structure. This paper endeavors to identify the key requirements of such a bookmark, specifically in relation to Web documents, and sets out a suggested design through which these needs may be accomplished. A prototype developed offers such features as the sharing of bookmarks between users and groups of users. Bookmarks for Web documents in this prototype allow more specific information to be stored such as: URL, the document type, the document title, keywords, a summary, user annotations, date added, date last visited and date last modified. Individuals may access the service from anywhere on the Internet, as long as they have a Java-enabled Web browser.
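The bookmark fields listed in this abstract map naturally onto a record type. The sketch below is a hypothetical Python rendering, not the authors' design (their prototype is accessed through a Java-enabled browser); the class name `Bookmark`, the field names and types, and the helper `matches_keyword` are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class Bookmark:
    """One bookmark for a Web document, with the fields named in the abstract."""
    url: str
    document_type: str
    title: str
    keywords: List[str] = field(default_factory=list)
    summary: str = ""
    annotations: List[str] = field(default_factory=list)  # user annotations
    date_added: Optional[date] = None
    date_last_visited: Optional[date] = None
    date_last_modified: Optional[date] = None


def matches_keyword(bookmark: Bookmark, keyword: str) -> bool:
    """Case-insensitive keyword lookup (a hypothetical helper)."""
    return keyword.lower() in (k.lower() for k in bookmark.keywords)


# Hypothetical example record.
example = Bookmark(
    url="https://example.org/paper.html",
    document_type="text/html",
    title="A sample Web document",
    keywords=["web", "bookmark"],
    summary="Placeholder summary.",
    date_added=date(2005, 3, 1),
)
```

Keeping the three dates as separate optional fields lets a record exist before the document has ever been visited or modified.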
46

Qumsiyeh, Rani, and Yiu-Kai Ng. "Searching web documents using a summarization approach." International Journal of Web Information Systems 12, no. 1 (April 18, 2016): 83–101. http://dx.doi.org/10.1108/ijwis-11-2015-0039.

Abstract:
Purpose: The purpose of this paper is to introduce a summarization method that enhances current web-search approaches by offering a summary of each clustered set of web-search results with contents addressing the same topic, which should allow the user to quickly identify the information covered in the clustered search results. Web search engines, such as Google, Bing and Yahoo!, rank the set of documents S retrieved in response to a user query and represent each document D in S using a title and a snippet, which serves as an abstract of D. Snippets, however, are not as useful as intended, i.e. in assisting users to quickly identify results of interest. These snippets are inadequate in providing distinct information and capturing the main contents of the corresponding documents. Moreover, when the intended information need specified in a search query is ambiguous, it is very difficult, if not impossible, for a search engine to identify precisely the set of documents that satisfy the user’s intended request without requiring additional information. Furthermore, a document title is not always a good indicator of the content of the corresponding document either.
Design/methodology/approach: The authors propose a query-based summarizer, called QSum, to solve the existing problems of Web search engines, which use titles and abstracts to capture the contents of retrieved documents. QSum generates a concise/comprehensive summary for each cluster of documents retrieved in response to a user query, which saves the user’s time and effort in searching for specific information of interest by skipping the step of browsing through the retrieved documents one by one.
Findings: Experimental results show that QSum is effective and efficient in creating a high-quality summary for each cluster to enhance Web search.
Originality/value: The proposed query-based summarizer, QSum, is unique in its searching approach. QSum is also a significant contribution to the Web search community, as it handles the ambiguity of a search query by creating summaries in response to different interpretations of the search, which offer a “road map” to help users quickly identify information of interest.
47

Fadllullah, Arif, Dasrit Debora Kamudi, Muhamad Nasir, Agus Zainal Arifin, and Diana Purwitasari. "Web News Documents Clustering in Indonesian Language Using Singular Value Decomposition-Principal Component Analysis (SVDPCA) and Ant Algorithms." Jurnal Ilmu Komputer dan Informasi 9, no. 1 (February 15, 2016): 17. http://dx.doi.org/10.21609/jiki.v9i1.362.

Abstract:
Ant-based document clustering is a clustering method that measures text document similarity based on the shortest path between nodes (trial phase) and determines the optimal clusters from the sequence of document similarities (dividing phase). The trial phase of Ant algorithms takes a very long time to build document vectors because of the high-dimensional Document-Term Matrix (DTM). In this paper, we propose a document clustering method that optimizes dimension reduction using Singular Value Decomposition-Principal Component Analysis (SVDPCA) and Ant algorithms. SVDPCA reduces the size of the DTM by converting the freq-term of the conventional DTM to the score-pc of a Document-PC Matrix (DPCM). Ant algorithms then cluster the documents using the vector space model based on the dimension-reduced DPCM. Experimental results on 506 news documents in Indonesian language demonstrated that the proposed method optimized dimension reduction by up to 99.7%. It sped up the execution time of the trial phase while maintaining the best F-measure achieved in the experiments, 0.88 (88%).
48

Poonkuzhali, G., R. Kishore Kumar, R. Kripa Keshav, P. Sudhakar, and K. Sarukesi. "Correlation Based Method to Detect and Remove Redundant Web Document." Advanced Materials Research 171-172 (December 2010): 543–46. http://dx.doi.org/10.4028/www.scientific.net/amr.171-172.543.

Abstract:
The growth of the internet has flooded the WWW with abundant information and many replicas. As duplicated web pages increase indexing space and time complexity, finding and removing these pages becomes significant for search engines and similar systems, improving both the accuracy of search results and search speed. Web content mining plays a vital role in resolving these aspects. Existing algorithms for web content mining focus on applying weightage to structured documents, whereas in this research work a mathematical approach based on linear correlation is developed to detect and remove duplicates present in both structured and unstructured web documents. In the proposed work, the linear correlation between two web documents is computed. If the correlation value is 1, the documents are said to be exactly redundant and the duplicate should be eliminated; otherwise they are not redundant.
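The correlation test described in this abstract can be sketched in Python as follows. The abstract does not say how a document is turned into numbers, so representing each document as a term-frequency vector over the shared vocabulary is an assumption, as are the function names `term_vector`, `pearson`, and `is_redundant` and the tolerance `tol`.

```python
import math
from collections import Counter


def term_vector(text, vocabulary):
    """Term-frequency vector of `text` over a fixed vocabulary (assumed representation)."""
    counts = Counter(text.lower().split())
    return [counts[term] for term in vocabulary]


def pearson(x, y):
    """Pearson linear correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # Correlation is undefined for a constant vector; treated as 0 here.
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def is_redundant(doc_a, doc_b, tol=1e-9):
    """True when the correlation of the two term vectors is 1 (within `tol`)."""
    vocabulary = sorted(set(doc_a.lower().split()) | set(doc_b.lower().split()))
    r = pearson(term_vector(doc_a, vocabulary), term_vector(doc_b, vocabulary))
    return abs(r - 1.0) <= tol


# Same words and counts in a different order versus different counts.
same = is_redundant("web page web index", "index web page web")
different = is_redundant("web page web index", "page page index web")
```

One limitation of this sketch: if every term occurs equally often in a document, its vector is constant and the correlation is undefined, so such pairs are reported as not redundant.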
49

Bourbakis, N., W. Meng, C. Zhang, Z. Wu, N. J. Salerno, and S. Borek. "Retrieval of Multimedia Web Documents and Removal of Redundant Information." International Journal on Artificial Intelligence Tools 08, no. 01 (March 1999): 19–42. http://dx.doi.org/10.1142/s0218213099000038.

Abstract:
This paper describes a search engine for multimedia web documents and a methodology for removing (partially or totally) redundant information from multiple documents in an effort to synthesize new documents. In this paper, a typical multimedia document contains free text and images and additionally has associated well-structured data. An SQL-like query language, WebSSQL, is proposed to retrieve this type of document. The main differences between WebSSQL and other proposed SQL extensions for retrieving web documents are that WebSSQL is similarity-based and supports conditions on images. This paper also deals with the detection and removal of redundant information (text paragraphs and images) from multiple retrieved documents. Documents reporting the same or related events and stories may contain substantial redundant information. Removing the redundant information and synthesizing these documents into a single document can save not only the time a user needs to acquire the information but also the storage space needed to archive the data. The methodology reported here consists of techniques for analyzing text paragraphs and images as well as a set of similarity criteria used to detect redundant paragraphs and images. Examples are provided to illustrate these techniques.
50

Obidallah, Waeal J., Bijan Raahemi, and Waleed Rashideh. "Multi-Layer Web Services Discovery Using Word Embedding and Clustering Techniques." Data 7, no. 5 (May 4, 2022): 57. http://dx.doi.org/10.3390/data7050057.

Abstract:
We propose a multi-layer data mining architecture for web services discovery using word embedding and clustering techniques to improve the web service discovery process. The proposed architecture consists of five layers: web services description and data preprocessing; word embedding and representation; syntactic similarity; semantic similarity; and clustering. In the first layer, we identify the steps to parse and preprocess the web services documents. In the second layer, Bag of Words with Term Frequency–Inverse Document Frequency and three word-embedding models are employed for web services representation. In the third layer, four distance measures, namely, Cosine, Euclidean, Minkowski, and Word Mover, are considered to find the similarities between Web services documents. In layer four, WordNet and Normalized Google Distance are employed to represent and find the similarity between web services documents. Finally, in the fifth layer, three clustering algorithms, namely, affinity propagation, K-means, and hierarchical agglomerative clustering, are investigated for clustering of web services based on observed similarities in documents. We demonstrate how each component of the five layers is employed in web services clustering using randomly selected web services documents. We conduct experimental analysis to cluster web services using a collected dataset consisting of web services documents and evaluate their clustering performances. Using a ground truth for evaluation purposes, we observe that clusters built based on the word embedding models performed better than those built using the Bag of Words with Term Frequency–Inverse Document Frequency model. Among the three word embedding models, the pre-trained Word2Vec’s skip-gram model reported higher performance in clustering web services. Among the three semantic similarity measures, path-based WordNet similarity reported higher clustering performance. 
By considering the different word representations models and syntactic and semantic similarity measures, we found that the affinity propagation clustering technique performed better in discovering similarities among Web services.
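Two of the components named in this abstract, the Bag of Words representation with Term Frequency–Inverse Document Frequency and the Cosine measure, can be sketched in plain Python. This is only an illustration under stated assumptions: whitespace tokenization, the weighting tf × log(N/df), and the function names `tfidf_vectors` and `cosine_similarity` are not taken from the paper, and the service descriptions are invented.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Return one TF-IDF vector per document over the shared, sorted vocabulary."""
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({term for tokens in tokenized for term in tokens})
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero for empty documents
        vectors.append([
            (counts[term] / total) * math.log(n_docs / doc_freq[term])
            for term in vocabulary
        ])
    return vectors


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical web service descriptions.
services = [
    "weather forecast service",
    "weather report service",
    "currency exchange rates",
]
vectors = tfidf_vectors(services)
sim_related = cosine_similarity(vectors[0], vectors[1])
sim_unrelated = cosine_similarity(vectors[0], vectors[2])
```

A pairwise similarity matrix built this way is the kind of input that clustering algorithms such as those listed in the abstract operate on.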