Journal articles on the topic 'HTML documents'

Consult the top 50 journal articles for your research on the topic 'HTML documents.'

1. Bonhomme, Stéphane, and Cécile Roisin. "Interactively restructuring HTML documents." Computer Networks and ISDN Systems 28, no. 7-11 (May 1996): 1075–84. http://dx.doi.org/10.1016/0169-7552(96)00042-6.

2. Sato, S. Y. "Dynamic rewriting of HTML documents." Computer Networks and ISDN Systems 27, no. 2 (November 1994): 307–8. http://dx.doi.org/10.1016/s0169-7552(94)90147-3.

3. von Tetzchner, J. Stephenson. "Converting formatted documents to HTML." Computer Networks and ISDN Systems 27, no. 2 (November 1994): 309–10. http://dx.doi.org/10.1016/s0169-7552(94)90154-6.

4. O, Geum-Yong, and In-Jun Hwang. "Automatically Converting HTML Documents with Similar Pattern into XML Documents." KIPS Transactions: Part D 9D, no. 3 (June 1, 2002): 355–64. http://dx.doi.org/10.3745/kipstd.2002.9d.3.355.

5. Kaji, Nobuhiro, and Masaru Kitsuregawa. "Acquiring Polar Sentences from HTML Documents." Journal of Natural Language Processing 15, no. 3 (2008): 77–90. http://dx.doi.org/10.5715/jnlp.15.3_77.

6. Gupta, Suhit, Gail E. Kaiser, Peter Grimm, Michael F. Chiang, and Justin Starren. "Automating Content Extraction of HTML Documents." World Wide Web 8, no. 2 (June 2005): 179–224. http://dx.doi.org/10.1007/s11280-004-4873-3.

7. Vállez, Mari, Rafael Pedraza-Jiménez, Lluís Codina, Saúl Blanco, and Cristòfol Rovira. "A semi-automatic indexing system based on embedded information in HTML documents." Library Hi Tech 33, no. 2 (June 15, 2015): 195–210. http://dx.doi.org/10.1108/lht-12-2014-0114.

Abstract: Purpose – The purpose of this paper is to describe and evaluate the tool DigiDoc MetaEdit, which allows the semi-automatic indexing of HTML documents. The tool works by identifying and suggesting keywords from a thesaurus according to the information embedded in HTML documents. This enables the parameterization of keyword assignment based on how frequently the terms appear in the document, the relevance of their position, and the combination of both. Design/methodology/approach – In order to evaluate the efficiency of the indexing tool, the descriptors/keywords suggested by the tool are compared to the keywords indexed manually by human experts. To make this comparison, a corpus of HTML documents is randomly selected from a journal devoted to Library and Information Science. Findings – The results of the evaluation show that, first, there is close to a 50 per cent match or overlap between the two indexing systems; when related terms and narrower terms are taken into account, the matches can reach 73 per cent. Second, the first terms identified by the tool are the most relevant. Originality/value – The tool presented identifies the most important keywords in an HTML document based on the information embedded in it. Nowadays, representing the contents of documents with keywords is an essential practice in areas such as information retrieval and e-commerce.

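The weighting idea in this abstract (term frequency combined with the relevance of position) is straightforward to prototype. The following Python sketch is hypothetical and is not the DigiDoc MetaEdit tool: it scores thesaurus terms found in an HTML document with assumed per-tag weights, so terms appearing in the title or headings count more than terms in body text.

```python
from html.parser import HTMLParser

# Assumed positional weights; the paper parameterizes these differently.
TAG_WEIGHTS = {"title": 5.0, "h1": 3.0, "h2": 2.0, "p": 1.0}
VOID_TAGS = {"meta", "link", "br", "img", "hr", "input"}

class ThesaurusScorer(HTMLParser):
    """Accumulate a weighted score for every thesaurus term in a page."""
    def __init__(self, thesaurus):
        super().__init__()
        self.thesaurus = {t.lower() for t in thesaurus}
        self.open_tags = []
        self.scores = {}

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        while self.open_tags and self.open_tags.pop() != tag:
            pass

    def handle_data(self, data):
        # Weight by the innermost enclosing tag we know about.
        weight = next((TAG_WEIGHTS[t] for t in reversed(self.open_tags)
                       if t in TAG_WEIGHTS), 0.5)
        for word in data.lower().split():
            word = word.strip(".,;:()'\"")
            if word in self.thesaurus:
                self.scores[word] = self.scores.get(word, 0.0) + weight

def suggest_keywords(html_text, thesaurus, top_n=5):
    scorer = ThesaurusScorer(thesaurus)
    scorer.feed(html_text)
    return sorted(scorer.scores, key=scorer.scores.get, reverse=True)[:top_n]

page = ("<html><head><title>Indexing HTML documents</title></head>"
        "<body><h1>Indexing</h1><p>Notes on retrieval.</p></body></html>")
print(suggest_keywords(page, ["indexing", "retrieval", "metadata"]))
# ['indexing', 'retrieval'] -- 'indexing' wins via the title and h1 weights
```
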
8. Thiemann, Peter. "A typed representation for HTML and XML documents in Haskell." Journal of Functional Programming 12, no. 4-5 (July 2002): 435–68. http://dx.doi.org/10.1017/s0956796802004392.

Abstract: We define a family of embedded domain specific languages for generating HTML and XML documents. Each language is implemented as a combinator library in Haskell. The generated HTML/XML documents are guaranteed to be well-formed. In addition, each library can guarantee that the generated documents are valid XML documents to a certain extent (for HTML only a weaker guarantee is possible). On top of the libraries, Haskell serves as a meta language to define parameterized documents, to map structured documents to HTML/XML, to define conditional content, or to define entire web sites. The combinator libraries support element-transforming style, a programming style that allows programs to have a visual appearance similar to HTML/XML documents, without modifying the syntax of Haskell.

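The well-formedness guarantee described here comes from constructing documents with combinators instead of concatenating strings. The static validity checking the paper obtains from Haskell's type system cannot be reproduced in a few lines, but this hypothetical Python sketch shows the weaker core idea: documents are trees built by constructor functions, and serialization with escaping is the only way to emit markup, so no tag can ever be left unclosed.

```python
from html import escape

class Element:
    """An HTML element; children are Elements or text. Because markup is
    only ever produced by render(), every opened tag is closed and all
    text is escaped: well-formedness by construction."""
    def __init__(self, tag, *children, **attrs):
        self.tag, self.children, self.attrs = tag, children, attrs

    def render(self):
        attrs = "".join(f' {k}="{escape(str(v), quote=True)}"'
                        for k, v in self.attrs.items())
        body = "".join(c.render() if isinstance(c, Element) else escape(str(c))
                       for c in self.children)
        return f"<{self.tag}{attrs}>{body}</{self.tag}>"

# Combinator-style constructors, loosely echoing the paper's style.
def html(*c, **a): return Element("html", *c, **a)
def body(*c, **a): return Element("body", *c, **a)
def p(*c, **a):    return Element("p", *c, **a)

page = html(body(p("1 < 2 & 2 < 3", **{"class": "note"})))
print(page.render())
# <html><body><p class="note">1 &lt; 2 &amp; 2 &lt; 3</p></body></html>
```
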
9. Gupta, Shivangi, and Mukesh Rawat. "Keyword based Automatic Summarization of HTML Documents." International Journal of Computer Applications 127, no. 8 (October 15, 2015): 24–29. http://dx.doi.org/10.5120/ijca2015906421.

10. Wu, Qi, Xing-shu Chen, Kai Zhu, and Chun-hui Wang. "Relevance-based content extraction of HTML documents." Journal of Central South University 19, no. 7 (July 2012): 1921–26. http://dx.doi.org/10.1007/s11771-012-1226-8.

11. Plch, Roman, and Petra Sarmanova. "Interactive 3D Graphics in HTML and PDF Documents." Zpravodaj Československého sdružení uživatelů TeXu 18, no. 1-2 (2008): 76–92. http://dx.doi.org/10.5300/2008-1-2/76.

12. Jann, Ben. "Creating HTML or Markdown Documents from within Stata using Webdoc." Stata Journal: Promoting communications on statistics and Stata 17, no. 1 (March 2017): 3–38. http://dx.doi.org/10.1177/1536867x1701700102.

Abstract: In this article, I discuss the use of webdoc for creating HTML or Markdown documents from within Stata. The webdoc command provides a way to embed HTML or Markdown code directly in a do-file and automate the integration of results from Stata in the final document. The command can be used, for example, to create a webpage documenting your data analysis, including all Stata output and graphs. More generally, the command can be used to create and maintain a website that contains results computed by Stata.

13. Shinzato, Keiji, and Kentaro Torisawa. "Automatic acquisition of hyponymy relations from HTML documents." Journal of Natural Language Processing 12, no. 1 (2005): 125–50. http://dx.doi.org/10.5715/jnlp.12.125.

14. Pau, Gregoire, and Wolfgang Huber. "The hwriter package: Composing HTML documents with R objects." R Journal 1, no. 1 (2009): 22. http://dx.doi.org/10.32614/rj-2009-009.

15. Lim, Jong-Gyun. "Using Coollists to index HTML documents in the Web." Computer Networks and ISDN Systems 28, no. 1-2 (December 1995): 147–54. http://dx.doi.org/10.1016/0169-7552(95)00114-0.

16. Umehara, Masayuki, Koji Iwanuma, and Hirokazu Nagai. "A Case-Based Semi-automatic Transformation from HTML Documents to XML Ones — Using the Similarity between HTML Documents Constituting a Series —." Transactions of the Japanese Society for Artificial Intelligence 16, no. 5 (2001): 408–16. http://dx.doi.org/10.1527/tjsai.16.408.

17. Haghish, E. F. "Markdoc: Literate Programming in Stata." Stata Journal: Promoting communications on statistics and Stata 16, no. 4 (December 2016): 964–88. http://dx.doi.org/10.1177/1536867x1601600409.

Abstract: Rigorous documentation of the analysis plan, procedure, and computer codes enhances the comprehensibility and transparency of data analysis. Documentation is particularly critical when the codes and data are meant to be publicly shared and examined by the scientific community to evaluate the analysis or adapt the results. The popular approach for documenting computer codes is known as literate programming, which requires preparing a trilingual script file that includes a programming language for running the data analysis, a human language for documentation, and a markup language for typesetting the document. In this article, I introduce markdoc, a software package for interactive literate programming and generating dynamic-analysis documents in Stata. markdoc recognizes Markdown, LaTeX, and HTML markup languages and can export documents in several formats, such as PDF, Microsoft Office .docx, OpenOffice and LibreOffice .odt, LaTeX, HTML, ePub, and Markdown.

18. Manabe, Tomohiro, and Keishi Tajima. "Extracting logical hierarchical structure of HTML documents based on headings." Proceedings of the VLDB Endowment 8, no. 12 (August 2015): 1606–17. http://dx.doi.org/10.14778/2824032.2824058.

19. Ashraf, F., T. Ozyer, and R. Alhajj. "Employing Clustering Techniques for Automatic Information Extraction From HTML Documents." IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews) 38, no. 5 (September 2008): 660–73. http://dx.doi.org/10.1109/tsmcc.2008.923882.

20. Zhang, Lihua, and Yiu-Kai Ng. "A Query Engine for Retrieving Information from Chinese HTML Documents." International Journal of Computer Processing of Languages 17, no. 3 (September 2004): 135–64. http://dx.doi.org/10.1142/s0219427904001085.

21. Umehara, Masayuki, Koji Iwanuma, and Hidetomo Nabashima. "A Case-Based Recognition of Semantic Structures in HTML Documents Which Constitutes a Document Series." Transactions of the Japanese Society for Artificial Intelligence 17, no. 6 (2002): 690–98. http://dx.doi.org/10.1527/tjsai.17.690.

22. Andarwati, Hayu, R. Rizal Isnanto, and Ike Pertiwi Windasari. "Sistem Informasi Manajemen Surat pada Dinas Pendapatan, Pengelolaan Keuangan dan Aset Daerah Kabupaten Pati." Jurnal Teknologi dan Sistem Komputer 2, no. 3 (August 31, 2014): 195–202. http://dx.doi.org/10.14710/jtsiskom.2.3.2014.195-202.

Abstract: Document handling at Dinas Pendapatan, Pengelolaan Keuangan dan Aset Daerah Kabupaten Pati was done manually and was not computerized. A document management information system was therefore needed to automate document control at the department, including the functions of document recording, document creation, and document tracking. The research used the Framework for the Application of Systems Thinking (FAST) with eight phases: scope definition, problem analysis, requirements analysis, logical design, decision analysis, physical design and integration, construction and testing, and installation and delivery. The system was implemented using PHP, HTML, and JavaScript as programming languages and MySQL as the database, and was tested using the black-box testing method. The result of this research is a document management information system.

23. Wielemaker, Jan, Zhisheng Huang, and Lourens van der Meij. "SWI-Prolog and the web." Theory and Practice of Logic Programming 8, no. 3 (May 2008): 363–92. http://dx.doi.org/10.1017/s1471068407003237.

Abstract: Prolog is an excellent tool for representing and manipulating data written in formal languages as well as natural language. Its safe semantics and automatic memory management make it a prime candidate for programming robust Web services. Although Prolog is commonly seen as a component in a Web application that is either embedded or communicates using a proprietary protocol, we propose an architecture where Prolog communicates to other components in a Web application using the standard HTTP protocol. By avoiding embedding in external Web servers, development and deployment become much easier. To support this architecture, in addition to the transfer protocol, we must also support parsing, representing and generating the key Web document types such as HTML, XML and RDF. This article motivates the design decisions in the libraries and extensions to Prolog for handling Web documents and protocols. The design has been guided by the requirement to handle large documents efficiently. The described libraries support a wide range of Web applications ranging from HTML and XML documents to Semantic Web RDF processing. The benefits of using Prolog for Web-related tasks are illustrated using three case studies.

24. Mattheos, Nikos, Anders Nattestad, and Rolf Attström. "Local CD-ROM in interaction with HTML documents over the Internet." European Journal of Dental Education 4, no. 3 (August 2000): 124–27. http://dx.doi.org/10.1034/j.1600-0579.2000.040306.x.

25. Cabeza, Daniel, and Manuel Hermenegildo. "Distributed WWW programming using (Ciao-)Prolog and the PiLLoW library." Theory and Practice of Logic Programming 1, no. 3 (May 2001): 251–82. http://dx.doi.org/10.1017/s147106840100117x.

Abstract: We discuss from a practical point of view a number of issues involved in writing distributed Internet and WWW applications using LP/CLP systems. We describe PiLLoW, a public-domain Internet and WWW programming library for LP/CLP systems that we have designed to simplify the process of writing such applications. PiLLoW provides facilities for accessing documents and code on the WWW; parsing, manipulating and generating HTML and XML structured documents and data; producing HTML forms; writing form handlers and CGI-scripts; and processing HTML/XML templates. An important contribution of PiLLoW is to model HTML/XML code (and, thus, the content of WWW pages) as terms. The PiLLoW library has been developed in the context of the Ciao Prolog system, but it has been adapted to a number of popular LP/CLP systems, supporting most of its functionality. We also describe the use of concurrency and a high-level model of client-server interaction, Ciao Prolog's active modules, in the context of WWW programming. We propose a solution for client-side downloading and execution of Prolog code, using generic browsers. Finally, we also provide an overview of related work on the topic.

26. Brzeminski, Pawel, and Witold Pedrycz. "Textual-Based Clustering of Web Documents." International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 12, no. 6 (December 2004): 715–43. http://dx.doi.org/10.1142/s021848850400317x.

Abstract: In our study we presented an effective method for clustering of Web pages. From flat HTML files we extracted keywords, formed feature vectors as representation of Web pages and applied them to a clustering method. We took advantage of the Fuzzy C-Means clustering algorithm (FCM). We demonstrated an organized and schematic manner of data collection. Various categories of Web pages were retrieved from ODP (Open Directory Project) in order to create our datasets. The results of clustering proved that the method performs well for all datasets. Finally, we presented a comprehensive experimental study examining: the behavior of the algorithm for different input parameters, internal structure of datasets and classification experiments.

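Fuzzy c-means assigns each document a graded membership in every cluster rather than a hard label. As a rough illustration of the pipeline the abstract describes (keyword feature vectors fed to FCM), here is a minimal NumPy implementation; the feature extraction, datasets, and parameter choices in the paper are far more elaborate.

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, iters=100, seed=0):
    """Minimal fuzzy c-means. X: (n_docs, n_features) keyword-frequency
    vectors. Returns cluster centers and the membership matrix U, where
    U[i, k] is document i's degree of membership in cluster k."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)             # rows sum to 1
    for _ in range(iters):
        W = U ** m                                # fuzzified memberships
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        dist = np.linalg.norm(X[:, None, :] - centers[None], axis=2) + 1e-9
        U = dist ** (-2.0 / (m - 1.0))            # standard FCM update
        U /= U.sum(axis=1, keepdims=True)
    return centers, U

# Toy keyword-frequency vectors for six pages drawn from two topics.
X = np.array([[5, 0, 1], [4, 1, 0], [5, 1, 1],
              [0, 4, 3], [1, 5, 4], [0, 4, 5]], dtype=float)
centers, U = fuzzy_c_means(X, c=2)
print(U.round(2))   # soft membership of each page in each cluster
```
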
27. Pereira, R. A. Marques, A. Molinari, and G. Pasi. "Contextual weighted representations and indexing models for the retrieval of HTML documents." Soft Computing 9, no. 7 (November 19, 2004): 481–92. http://dx.doi.org/10.1007/s00500-004-0361-z.

28. Al-Dallal, Ammar, and Rasha S. Abdul-Wahab. "GA on IR." International Journal of Artificial Life Research 3, no. 2 (April 2012): 1–14. http://dx.doi.org/10.4018/jalr.2012040101.

Abstract: The growing number of websites has led to the challenge of helping Web users find appropriate information on the Internet with an intelligent search engine. Information retrieval (IR) is an essential and useful strategy for Web users, and different strategies and techniques have been designed for this purpose. Currently, there is increasing interest in applying Artificial Intelligence (AI) to IR. One AI area is Evolutionary Computation (EC), which is based on the design of natural selection. A traditional and important strategy in EC is the Genetic Algorithm (GA); this paper adopts the GA technique to enhance the retrieval of HTML documents. The improvement is obtained by creating a new evaluation function and applying a hybrid crossover operator. The proposed evaluation function is based on term proximity, keyword probability within the document, and HTML tag weight for the query. Experimental results are compared with two well-known evaluation functions applied in the IR domain, Okapi-BM25 and the Bayesian inference network model. The results demonstrate a good level of enhancement to recall and precision. In addition, the documents retrieved by the proposed system were more accurate and relevant to the queries than those retrieved by the other models.

29. Goto, Kento, Ryosuke Koshijima, and Motomichi Toyama. "Responsive HTML generation using SuperSQL." International Journal of Web Information Systems 13, no. 3 (August 21, 2017): 324–51. http://dx.doi.org/10.1108/ijwis-04-2017-0032.

Abstract: Purpose – With the rapid spread of smartphones and tablets, it is becoming necessary for web developers to create responsive web pages which are visually appealing on devices of various sizes. However, building responsive UIs is a very challenging task, requiring deep knowledge of HTML and CSS. This paper aims to propose an approach to generate responsive web pages using SuperSQL, an extension of SQL that can format data retrieved from a database into various kinds of structured documents. Design/methodology/approach – By incorporating the methodology of Bootstrap, a grid-based framework for front-end development, the authors have made it possible to create responsive web pages from simple SuperSQL queries. In addition, by utilizing SuperSQL's unique feature of describing the structure of the output web page, the authors have proposed and implemented a mechanism to automatically optimize the web content's size and position. Findings – In the evaluation, the authors created some actual web applications with and without the use of SuperSQL and compared the amount of code (number of lines). When using the proposed system, the amount of code was reduced to about 1/5. The authors also compared the layout generated by the proposed automatic layout generation mechanism with a responsive layout generated manually: the automatic mechanism created the same layout as the manually created one 74.8 per cent of the time, and the user satisfaction level was 85.8 per cent. Originality/value – A way to generate responsive HTML using a single SuperSQL query, and a mechanism for automatic responsive layout generation.

30. Rajagopal, Prabha, Sri Devi Ravana, Yun Sing Koh, and Vimala Balakrishnan. "Evaluating the effectiveness of information retrieval systems using effort-based relevance judgment." Aslib Journal of Information Management 71, no. 1 (January 21, 2019): 2–17. http://dx.doi.org/10.1108/ajim-04-2018-0086.

Abstract: Purpose – Effort, in addition to relevance, is a major factor in the satisfaction and utility of a document to the actual user. The purpose of this paper is to propose a method for generating relevance judgments that incorporate effort without involving human judges. The study then determines the variation in system rankings due to low-effort relevance judgments when evaluating retrieval systems at different depths of evaluation. Design/methodology/approach – Effort-based relevance judgments are generated using a proposed boxplot approach over simple document features, HTML features and readability features. The boxplot approach is a simple yet repeatable way of classifying documents' effort while ensuring outlier scores do not skew the grading of the entire set of documents. Findings – Evaluating retrieval systems with low-effort relevance judgments has a stronger influence at shallow depths of evaluation than at deeper depths. It is proved that the difference in system rankings is due to low-effort documents and not the number of relevant documents. Originality/value – It is therefore crucial to evaluate retrieval systems at shallow depths using low-effort relevance judgments.

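The boxplot idea can be illustrated in a few lines: grade each document's effort score against the corpus quartiles, which are robust statistics and so are not dragged around by outliers. This is a hypothetical sketch using document length as the only effort feature; the paper combines simple document features, HTML features, and readability features.

```python
from statistics import quantiles

def effort_grades(scores):
    """Grade each document's effort score against the corpus boxplot:
    below Q1 -> low effort, above Q3 -> high effort, else medium.
    Quartiles are robust statistics, so a few extreme documents do
    not skew the grade boundaries for the rest of the set."""
    q1, _, q3 = quantiles(scores, n=4)
    return ["low" if s < q1 else "high" if s > q3 else "medium"
            for s in scores]

# Toy effort scores (here simply word counts); 9000 is an outlier.
print(effort_grades([120, 300, 350, 400, 420, 500, 9000]))
# ['low', 'medium', 'medium', 'medium', 'medium', 'medium', 'high']
```
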
31. Zhang, Xiaoming, Pengtao Lv, Chongchong Zhao, and Jianxian Wang. "A Method for Materials Knowledge Extraction from HTML Tables Based on Sibling Comparison." International Journal of Software Engineering and Knowledge Engineering 26, no. 6 (August 2016): 897–926. http://dx.doi.org/10.1142/s0218194016500303.

Abstract: There are rich data resources residing in available materials websites, and most of these data resources are shown in the form of HTML tables. However, it is difficult to distinguish attributes from values because of the semi-structured nature of HTML tables. Identifying attributes in HTML tables is therefore the key issue for information acquisition. In this paper, a method for materials knowledge extraction from HTML tables based on sibling comparison is proposed, consisting of three steps: acquiring sibling tables, identifying the table pattern and extracting the table data. We show how to use the F-measure to find appropriate thresholds for matching tables from materials websites when acquiring sibling tables. Further, we propose a strategy named FRFC (First Row matching and First Column matching) to distinguish attributes from values, so that the table pattern is identified. The data from HTML tables is then extracted based on the corresponding table patterns and mapped to a predefined schema, which facilitates population of the materials ontology. The proposed approach is applicable to circumstances where an attribute in the table may span multiple cells and where matched attributes across sibling tables are numerous. We acquire the desired accuracy ([Formula: see text]%) using FRFC for identifying the table pattern. Extraction time does not increase significantly with the number of documents and cells in tables, so our approach is effective for processing a large number of documents. A prototype named MTES is developed and demonstrates the effectiveness of the proposed approach.

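The FRFC idea is easy to picture: sibling tables come from the same template, so cells that stay identical across siblings are attribute labels and cells that vary are values. Below is a loose, hypothetical Python rendering, not the paper's algorithm (which also handles attributes spanning multiple cells and approximate matching).

```python
def table_pattern(sibling_tables):
    """Guess where attributes live by comparing sibling tables, i.e.
    tables generated from the same template on different pages.
    Identical first rows across siblings -> attributes in the first row;
    identical first columns -> attributes in the first column."""
    first_rows = {tuple(t[0]) for t in sibling_tables}
    first_cols = {tuple(row[0] for row in t) for t in sibling_tables}
    if len(first_rows) == 1:
        return "attributes-in-first-row"
    if len(first_cols) == 1:
        return "attributes-in-first-column"
    return "unknown"

t1 = [["Material", "Density", "Melting point"], ["Cu", "8.96", "1085"]]
t2 = [["Material", "Density", "Melting point"], ["Fe", "7.87", "1538"]]
print(table_pattern([t1, t2]))   # attributes-in-first-row
```
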
32. Adefowoke Ojokoh, Bolanle, Olumide Sunday Adewale, and Samuel Oluwole Falaki. "Automated document metadata extraction." Journal of Information Science 35, no. 5 (June 11, 2009): 563–70. http://dx.doi.org/10.1177/0165551509105195.

Abstract: Web documents are available in various forms, most of which do not carry additional semantics. This paper presents a model for general document metadata extraction. The model, which combines segmentation by keywords and pattern matching techniques, was implemented using PHP, MySQL, JavaScript and HTML. The system was tested with 40 randomly selected PDF documents (mainly theses). An evaluation of the system was done using standard measures, namely precision, recall, accuracy and F-measure. The results show that the model is relatively effective for the task of metadata extraction, especially for theses and dissertations. A combination of machine learning with these rule-based methods will be explored in the future for better results.

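Pattern matching for bibliographic metadata can be sketched with a couple of regular expressions over the first page of text. A hypothetical Python fragment, far simpler than the paper's PHP implementation:

```python
import re

def extract_metadata(first_page: str) -> dict:
    """Very rough heuristics: the title is the first non-empty line,
    the author is the text following a 'by' marker, and the year is
    the first plausible four-digit number."""
    lines = [ln.strip() for ln in first_page.splitlines() if ln.strip()]
    meta = {"title": lines[0] if lines else None}
    m = re.search(r"^by\s+(.+)$", first_page, re.IGNORECASE | re.MULTILINE)
    meta["author"] = m.group(1).strip() if m else None
    y = re.search(r"\b(19|20)\d{2}\b", first_page)
    meta["year"] = y.group(0) if y else None
    return meta

page = "Automated Document Metadata Extraction\nby Jane Doe\nA thesis, 2008\n"
print(extract_metadata(page))
# {'title': 'Automated Document Metadata Extraction',
#  'author': 'Jane Doe', 'year': '2008'}
```
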
33. Peroni, Silvio, Francesco Osborne, Angelo Di Iorio, Andrea Giovanni Nuzzolese, Francesco Poggi, Fabio Vitali, and Enrico Motta. "Research Articles in Simplified HTML: a Web-first format for HTML-based scholarly articles." PeerJ Computer Science 3 (October 2, 2017): e132. http://dx.doi.org/10.7717/peerj-cs.132.

Abstract: Purpose – This paper introduces Research Articles in Simplified HTML (RASH), a Web-first format for writing HTML-based scholarly papers; it is accompanied by the RASH Framework, a set of tools for interacting with RASH-based articles. The paper also presents an evaluation that involved authors and reviewers of RASH articles submitted to the SAVE-SD 2015 and SAVE-SD 2016 workshops. Design – RASH has been developed aiming to: be easy to learn and use; share scholarly documents (and embedded semantic annotations) through the Web; support its adoption within the existing publishing workflow. Findings – The evaluation study confirmed that RASH is ready to be adopted in workshops, conferences, and journals and can be quickly learnt by researchers who are familiar with HTML. Research Limitations – The evaluation study also highlighted some issues in the adoption of RASH, and in general of HTML formats, especially by less technically savvy users. Moreover, additional tools are needed, e.g., for enabling additional conversions from/to existing formats such as OpenXML. Practical Implications – RASH (and its Framework) is another step towards enabling the definition of formal representations of the meaning of the content of an article, facilitating its automatic discovery, enabling its linking to semantically related articles, providing access to data within the article in actionable form, and allowing integration of data between papers. Social Implications – RASH addresses the intrinsic needs related to the various users of a scholarly article: researchers (focussing on its content), readers (experiencing new ways for browsing it), citizen scientists (reusing available data formally defined within it through semantic annotations), publishers (using the advantages of new technologies as envisioned by the Semantic Publishing movement). Value – RASH helps authors to focus on the organisation of their texts, supports them in the task of semantically enriching the content of articles, and leaves all the issues about validation, visualisation, conversion, and semantic data extraction to the various tools developed within its Framework.

34. HaCohen-Kerner, Yaakov, Ittay Stern, David Korkus, and Erick Fredj. "Automatic Machine Learning of Keyphrase Extraction from Short HTML Documents Written in Hebrew." Cybernetics and Systems 38, no. 1 (January 2007): 1–21. http://dx.doi.org/10.1080/01969720600998546.

35. Sanka, Anoop, Shravan Chamakura, and Sharma Chakravarthy. "A dataflow approach to efficient change detection of HTML/XML documents in WebVigiL." Computer Networks 50, no. 10 (July 2006): 1547–63. http://dx.doi.org/10.1016/j.comnet.2005.10.016.

36. Rait, Aishanou Osha, and K. S. Venkatesh. "Automatic Language-Independent Indexing of Documents Using Image Processing." Advanced Materials Research 403-408 (November 2011): 817–22. http://dx.doi.org/10.4028/www.scientific.net/amr.403-408.817.

Abstract: Image processing techniques have been used over the years to convert printed material into electronic form. In our work we exploit the fact that some applications may find such full conversion redundant and can still satisfactorily meet the demands of the end user. Using the horizontal and vertical white spaces present in any document, independent regions of text, pictures, tables, etc. can be identified. Inherent characteristic disparities are then used to distinguish pictures from text, and section headings from the explanations that follow them. A table of contents, showing each heading and the associated page number, is generated and displayed in the browser, with every heading hyperlinked to the corresponding page of the original document. HTML code is written dynamically, using file handling techniques in MATLAB, to accommodate the variable number of headings obtained for different documents and for different pages of a single document. The platform was tested on various languages, and it was verified that the method is language independent.

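The whitespace-based segmentation described here amounts to projection profiles: rows (or columns) whose ink count is near zero separate independent regions. A hypothetical sketch on a binary page image as a NumPy array; the paper's MATLAB pipeline, heading classification, and HTML generation are not reproduced.

```python
import numpy as np

def horizontal_segments(binary_img, min_gap=2):
    """Split a page into vertical bands of content separated by blank
    rows. binary_img: 2D array, 1 = ink, 0 = background."""
    ink_per_row = binary_img.sum(axis=1)
    segments, start, gap = [], None, 0
    for i, ink in enumerate(ink_per_row):
        if ink > 0:
            if start is None:
                start = i
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:           # enough blank rows: close segment
                segments.append((start, i - gap + 1))
                start, gap = None, 0
    if start is not None:
        segments.append((start, len(ink_per_row)))
    return segments

img = np.zeros((12, 8), dtype=int)
img[1:3, :] = 1    # a "heading" block
img[6:10, :] = 1   # a "paragraph" block
print(horizontal_segments(img))   # [(1, 3), (6, 10)]
```
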
37. Rau, Pei-Luen Patrick, and Sho-Hsen Chen. "A Study of Electronic Annotation on Web Documents." Proceedings of the Human Factors and Ergonomics Society Annual Meeting 46, no. 5 (September 2002): 680–84. http://dx.doi.org/10.1177/154193120204600517.

Abstract: This study develops an electronic annotation system, allowing users to annotate on hypertexts, to build up knowledge structure, and to browse instructions provided by the system administrator or the instructor electronically. The electronic annotation system is a distributed World Wide Web application based on HTTP access and allows annotations on HTML documents. The major functions of the electronic annotation system include highlighting texts, inserting and editing annotations, and organizing and presenting annotations hierarchically. The five interactive components of the electronic annotation system are Main Tool Bar, Hypertext, Annotation Editor, Hierarchy Viewer, and Instruction Viewer. A user test was conducted to investigate the effect of the location of electronic annotating (Annotation Editor) on reading performance in terms of recall and degree of satisfaction.

38. Boutaounte, Mehdi, Driss Naji, M. Fakir, B. Bouikhalene, and A. Merbouha. "Tifinaghe Document Converter." International Journal of Computer Vision and Image Processing 3, no. 3 (July 2013): 54–68. http://dx.doi.org/10.4018/ijcvip.2013070104.

Abstract: Recognition of documents has become a basic necessity for two reasons: first, to preserve existing paper records, given their limited lifespan and the high rate of destruction by insects, fire, or humidity; second, to reduce archive space. The aim of this work is to build a converter that detects images and text within a document image taken by a scanner and applies a character recognition (OCR) system in order to obtain a web page (HTML extension) ready to be used on the same computer or on web hosts, accessible to everyone.

39. Yamasaki, Takahiro, and Kin-ichiroh Tokiwa. "A Method of Readability Assessment for Web Documents Using Text Features and HTML Structures." IEEJ Transactions on Electronics, Information and Systems 132, no. 9 (2012): 1524–32. http://dx.doi.org/10.1541/ieejeiss.132.1524.

40. Vadivu, P. Shanmuga, P. Sumathy, and A. Vadivel. "Ranking images in web documents based on HTML TAGs for image retrieval from WWW." International Journal of Computational Intelligence Studies 3, no. 2/3 (2014): 176. http://dx.doi.org/10.1504/ijcistudies.2014.062730.

41. Nakano, Y. "InCom: Support System for Informal Communication in 3D Virtual Worlds Generated from HTML Documents." IEICE Transactions on Information and Systems E88-D, no. 5 (May 1, 2005): 872–79. http://dx.doi.org/10.1093/ietisy/e88-d.5.872.

42. Yamasaki, Takahiro, and Kin-Ichiroh Tokiwa. "A Method of Readability Assessment for Web Documents Using Text Features and HTML Structures." Electronics and Communications in Japan 97, no. 10 (September 8, 2014): 1–10. http://dx.doi.org/10.1002/ecj.11565.

43. Del Rosario, Marco Jr., and Julius Sareno. "Theses and Capstone Projects Plagiarism Checker using Kolmogorov Complexity Algorithm." Walailak Journal of Science and Technology (WJST) 17, no. 7 (July 1, 2020): 726–44. http://dx.doi.org/10.48048/wjst.2020.6498.

Abstract: In education, students attempt to copy previous works and rely on prepared solutions available on the Internet in order to meet their requirements. This leads to plagiarism, and reducing this growing academic dishonesty has become a concern of educational institutions. With regard to this issue, this study aims to design and develop a plagiarism checker capable of registering documents, granting access to users, and calculating the similarity between documents. The software was constructed using HTML, PHP, JavaScript, CSS, and MySQL. The developed system is composed of three main modules: Document Search, which enables users to browse documents; Document Registration, which enables the administrator to add and manage the stored documents; and Document Comparison, which serves as the system's plagiarism detection mechanism. The Normalized Compression Distance algorithm was used to measure similarity and the Boyer-Moore algorithm to highlight the suspected plagiarized document. Moreover, tests were conducted to determine whether the system functions as expected and to measure the accuracy of its output. The developed system was evaluated in terms of Product Quality using the ISO 25010 software quality model and was rated by one hundred respondents, obtaining a mean of 4.70, equivalent to "excellent" in descriptive terms. This validates that the objectives of the study were met and that the system was developed according to its desired functions and requirements.

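Normalized Compression Distance is the standard computable stand-in for Kolmogorov-complexity similarity: NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C(s) is the length of a compressor's output on s. A minimal Python sketch with zlib as the compressor; the developed system, its document registration, and its Boyer-Moore highlighting are not reproduced here.

```python
import zlib

def ncd(x: bytes, y: bytes) -> float:
    """Normalized Compression Distance, approximating Kolmogorov
    complexity with zlib's compressed length: near 0 for near-identical
    texts, near 1 for unrelated ones."""
    c = lambda b: len(zlib.compress(b, 9))
    cx, cy, cxy = c(x), c(y), c(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

original = b"The quick brown fox jumps over the lazy dog. " * 20
copied   = original.replace(b"quick", b"fast")
other    = b"Completely unrelated thesis text about databases. " * 20
print(round(ncd(original, copied), 3))  # small -> likely plagiarized
print(round(ncd(original, other), 3))   # larger -> likely independent
```
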
44. Weitz, Wolfgang. "Combining Structured Documents with High-Level Petri-Nets for Workflow Modeling in Internet-Based Commerce." International Journal of Cooperative Information Systems 7, no. 4 (December 1998): 275–96. http://dx.doi.org/10.1142/s0218843098000131.

Abstract: This article discusses the application of a new variant of high-level Petri nets, the so-called SGML nets, for modeling business processes in the area of Internet-based commerce. SGML nets are designed to capture the process of generating and manipulating structured documents based on the international standard SGML. Since the currently most relevant document standards on the Internet are HTML (an SGML application) and XML (a subset of SGML), SGML nets offer an elegant way to integrate central aspects of Electronic Commerce applications, such as the generation of online product catalogs, processing of online orders, and electronic document interchange between companies, into a unified formal workflow model. The article gives an introduction to the central concepts of SGML nets and includes an example of their application from the area of online order processing.

45. Haghish, E. F. "Rethinking Literate Programming in Statistics." Stata Journal: Promoting communications on statistics and Stata 16, no. 4 (December 2016): 938–63. http://dx.doi.org/10.1177/1536867x1601600408.

Abstract: Literate programming is becoming increasingly trendy for data analysis because it allows the generation of dynamic-analysis reports for communicating data analysis and eliminates untraceable human errors in analysis reports. Traditionally, literate programming includes separate processes for compiling the code and preparing the documentation. While this workflow might be satisfactory for software documentation, it is not ideal for writing statistical analysis reports. Instead, these processes should run in parallel. In this article, I introduce the weaver package, which examines this idea by creating a new log system in HTML or LaTeX that can be used simultaneously with the Stata log system. The new log system provides many features that the Stata log system lacks; for example, it can render mathematical notations, insert figures, create publication-ready dynamic tables, and style text, and it includes a built-in syntax highlighter. The weaver package also produces dynamic PDF documents by converting the HTML log to PDF or by typesetting the LaTeX log and thus provides a real-time preview of the document without recompiling the code. I also discuss potential applications of the weaver package.

46. Kusuma, Aniek Suryanti, and Komang Sri Aryati. "Sistem Pengarsipan Dokumen Akreditasi Berbasis Web." Jurnal Teknologi Informasi dan Komputer 5, no. 1 (February 5, 2019): 139–47. http://dx.doi.org/10.36002/jutik.v5i1.647.

Abstract: A college must be accredited; the accreditation form (Borang) is used as a reference in the quality and feasibility assessment of a study program conducted by the National Accreditation Board of Higher Education (BAN PT). The college builds a team consisting of several divisions, each responsible for composing the documents required by the accreditation standard. The team, however, faces several problems in the composing process. The main problem is the delay in collecting documents, which prevents the team leader from recapitulating them; in addition, the team leader is unable to monitor accreditation progress because there is no monitoring system. This research builds a system for managing documents so that the team leader can more easily monitor their completeness. The application was developed using the HTML, CSS, JavaScript, and PHP programming languages. The archive system stores all documents in one place that can be accessed from anywhere and provides a facility for communication between divisions. System testing used the black-box testing method to ensure all functions run properly. From the results of this research, it can be concluded that the document archive system supports the document collection process so that accreditation runs smoothly. Keywords: Document, Accreditation, Archive System.

47. Iwaniak, A., I. Kaczmarek, J. Łukowicz, M. Strzelecki, S. Coetzee, and W. Paluszyński. "Semantic Metadata for Heterogeneous Spatial Planning Documents." ISPRS Annals of Photogrammetry, Remote Sensing and Spatial Information Sciences IV-4/W1 (September 5, 2016): 27–36. http://dx.doi.org/10.5194/isprs-annals-iv-4-w1-27-2016.

Abstract: Spatial planning documents contain information about the principles and rights of land use in different zones of a local authority. They are the basis for administrative decision making in support of sustainable development. In Poland these documents are published on the Web according to a prescribed non-extendable XML schema, designed for optimum presentation to humans in HTML web pages. There is no document standard, and limited functionality exists for adding references to external resources. The text in these documents is discoverable and searchable by general-purpose web search engines, but the semantics of the content cannot be discovered or queried. The spatial information in these documents is geographically referenced but not machine-readable. Major manual efforts are required to integrate such heterogeneous spatial planning documents from various local authorities for analysis, scenario planning and decision support. This article presents results of an implementation using machine-readable semantic metadata to identify relationships among regulations in the text, spatial objects in the drawings and links to external resources. A spatial planning ontology was used to annotate different sections of spatial planning documents with semantic metadata in the Resource Description Framework in Attributes (RDFa). The semantic interpretation of the content, links between document elements and links to external resources were embedded in XHTML pages. An example and use case from the spatial planning domain in Poland is presented to evaluate its efficiency and applicability. The solution enables the automated integration of spatial planning documents from multiple local authorities to assist decision makers with understanding and interpreting spatial planning information. The approach is equally applicable to legal documents from other countries and domains, such as cultural heritage and environmental management.

48. Kitaev, Evgeny L’vovich, and Rimma Yuryevna Skornyakova. "Leveraging Semantic Markups for Incorporating External Resources Data to the Content of a Web Page." Russian Digital Libraries Journal 23, no. 3 (May 9, 2020): 494–513. http://dx.doi.org/10.26907/1562-5419-2020-23-3-494-513.

Abstract: The semantic markups of the World Wide Web have accumulated a large amount of data, and their number continues to grow. However, the potential of these data is, in our opinion, not fully utilized. The contents of semantic markups are widely used by search systems and partly by social networks, but the usual approach for application developers is to convert the data to the RDF standard and execute SPARQL queries, which requires good knowledge of this language and programming skills. In this paper, we propose to leverage the semantic markups available on the Web to automatically incorporate their contents into the content of other web pages. We also present a software tool for implementing such incorporation that does not require a web page developer to know any programming languages other than HTML and CSS. The tool does not require installation; the work is performed by JavaScript plugins. Currently, the tool supports semantic data contained in the popular markup types "microdata" and JSON-LD, in the tags of HTML documents, and in the properties of Word and PDF documents.

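The described tool is a set of JavaScript plugins working in the browser, but the extraction side of the idea can be sketched in a few lines of Python: JSON-LD ships inside <script type="application/ld+json"> blocks, so harvesting it is a matter of collecting and parsing those blocks. A hypothetical stdlib-only sketch:

```python
import json
from html.parser import HTMLParser

class JSONLDExtractor(HTMLParser):
    """Collect the contents of <script type="application/ld+json"> blocks."""
    def __init__(self):
        super().__init__()
        self.in_block = False
        self.items = []

    def handle_starttag(self, tag, attrs):
        self.in_block = (tag == "script" and
                         dict(attrs).get("type") == "application/ld+json")

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_block = False

    def handle_data(self, data):
        if self.in_block and data.strip():
            self.items.append(json.loads(data))

page = """<html><head><script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Article", "name": "Demo"}
</script></head><body></body></html>"""
extractor = JSONLDExtractor()
extractor.feed(page)
print(extractor.items[0]["name"])   # Demo
```
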
49. Denoyer, Ludovic, and Patrick Gallinari. "Un modèle de mixture de modèles génératifs pour les documents structurés multimédias. Application à la classification de documents XML et HTML." Document numérique 8, no. 3 (September 1, 2004): 35–54. http://dx.doi.org/10.3166/dn.8.3.35-54.

50. Haghish, E. F. "On the importance of syntax coloring for teaching statistics." Stata Journal: Promoting communications on statistics and Stata 19, no. 1 (March 2019): 83–86. http://dx.doi.org/10.1177/1536867x19830892.

Abstract: In this article, I underscore the importance of syntax coloring in teaching statistics. I also introduce the statax package, which includes JavaScript and LaTeX programs for highlighting Stata code in HTML and LaTeX documents. Furthermore, I provide examples showing how to implement this package for developing educational materials on the web or for a classroom handout.