Academic literature on the topic 'Web Crawler'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Web Crawler.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Journal articles on the topic "Web Crawler"

1

Feng, Guilian. "Implementation of Web Data Mining Technology Based on Python." Journal of Physics: Conference Series 2066, no. 1 (November 1, 2021): 012033. http://dx.doi.org/10.1088/1742-6596/2066/1/012033.

Full text
Abstract:
With the arrival of the era of big data, people have gradually realized the importance of data. Data is not just a resource; it is an asset. This paper studies the implementation of Web data mining technology based on Python. It analyzes the overall architecture of a distributed web crawler system, and then examines in detail the principles of the crawler's URL handling, web page fetching, web page parsing, and data storage modules. Each module of the crawler system was tested on an experimental computer, and the resulting data were summarized for comparative analysis. The main contribution of this paper is the design and implementation of a distributed web crawler system which, to a certain extent, solves the slow speed, low efficiency, and poor scalability of the traditional single-machine web crawler, and improves the speed and efficiency with which the crawler gathers information and web page data.
APA, Harvard, Vancouver, ISO, and other styles
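The module decomposition described above (URL handling, page fetching, parsing, storage) maps onto a handful of small components. A minimal single-process sketch in Python; the names and structure are illustrative assumptions, not the paper's code:

```python
import urllib.request
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkParser(HTMLParser):
    """Web page parsing module: extracts href links from HTML."""
    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(urljoin(self.base_url, value))

def crawl(seed_urls, max_pages=10):
    frontier = deque(seed_urls)   # URL module: queue of URLs to visit
    seen = set(seed_urls)         # URL module: de-duplication
    store = {}                    # data storage module (in-memory here)
    while frontier and len(store) < max_pages:
        url = frontier.popleft()
        try:                      # fetch module: download the page
            html = urllib.request.urlopen(url, timeout=5).read().decode("utf-8", "replace")
        except (OSError, ValueError):
            continue              # skip unreachable or malformed URLs
        store[url] = html         # storage module: persist the page
        parser = LinkParser(url)
        parser.feed(html)         # parsing module: extract out-links
        for link in parser.links:
            if link.startswith("http") and link not in seen:
                seen.add(link)
                frontier.append(link)
    return store
```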
2

Liu, Dong Fei, and Xian Shuang Fan. "Study and Application of Web Crawler Algorithm Based on Heritrix." Advanced Materials Research 219-220 (March 2011): 1069–72. http://dx.doi.org/10.4028/www.scientific.net/amr.219-220.1069.

Full text
Abstract:
This paper first introduces the role of the web crawler in a search engine. Based on a detailed analysis of the system architecture of the open-source web crawler Heritrix, it proposes the design of a dedicated parser that parses a particular web site to achieve targeted crawling. It then eliminates the impact of the robots.txt file on individual processors and introduces the ELFHash algorithm to achieve efficient, multi-threaded access to web crawler resources. Finally, by comparing crawl speed before and after the improvement, and by analyzing the number of pages crawled over the same period, the paper verifies that the performance of the improved web crawler is noticeably better.
APA, Harvard, Vancouver, ISO, and other styles
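ELFHash itself is a well-known string hash, so the URL partitioning step the abstract describes can be sketched concretely. The thread-assignment wrapper below is an illustrative assumption:

```python
def elf_hash(s: str) -> int:
    """Classic ELF hash over a string, yielding a 32-bit value."""
    h = 0
    for ch in s:
        h = (h << 4) + ord(ch)
        x = h & 0xF0000000
        if x:
            h ^= x >> 24
        h &= ~x
    return h & 0xFFFFFFFF

def assign_thread(url: str, n_threads: int) -> int:
    # Route each URL consistently to one crawler thread, so threads
    # never contend for (or duplicate) the same resource.
    return elf_hash(url) % n_threads

print(assign_thread("http://example.com/page1", 8))
```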
3

Yang, Juan. "Analysis on the Judicial Interpretation of the Crawler Technology Infringing on the Intellectual Property Rights of Enterprise Data." E3S Web of Conferences 251 (2021): 01038. http://dx.doi.org/10.1051/e3sconf/202125101038.

Full text
Abstract:
In the practical process of identifying web crawler infringement and related crimes, there is a theory of "weakening the infringement typology and strengthening the presumption of legal interest". This is also the basic method for identifying infringement in the commercial application of enterprise data through crawler technology. Drawing on previous work, this article summarizes the criminal risks of abusing crawler technology. The author then discusses the judicial interpretation of crawler technology infringing the intellectual property rights in enterprise data from three angles: the categories of data crawled by various crawlers, how the types of data crawled determine the applicable laws, and the refinement of the concept of trade secrets and the standards for determining them.
APA, Harvard, Vancouver, ISO, and other styles
4

Boppana, Venugopal, and Sandhya P. "Focused crawling from the basic approach to context aware notification architecture." Indonesian Journal of Electrical Engineering and Computer Science 13, no. 2 (February 1, 2019): 492. http://dx.doi.org/10.11591/ijeecs.v13.i2.pp492-498.

Full text
Abstract:
The large and wide range of information available has made it difficult for crawlers and search engines to extract related information. This paper discusses focused crawlers, also called topic-specific crawlers, and variations of focused crawlers leading to a distributed architecture, i.e., a context-aware notification architecture. To get the relevant pages from the huge amount of information available on the internet, we use the focused crawler, which can retrieve the relevant pages for a given topic with fewer searches and in a short time. Here the input to the focused crawler is a topic specified using exemplary documents rather than keywords. Focused crawlers avoid searching all web documents; instead they search over the links that are relevant within the crawler's boundary. The focused crawling mechanism saves considerable CPU time in keeping the crawl up to date.
APA, Harvard, Vancouver, ISO, and other styles
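The best-first expansion behind the basic focused crawling approach surveyed above fits in a few lines. A sketch assuming caller-supplied `fetch`, `extract_links`, and `relevance` functions and an illustrative relevance threshold:

```python
import heapq

def focused_crawl(seeds, fetch, extract_links, relevance, max_pages=100):
    """Best-first focused crawl: expand the most topic-relevant URL first.
    `relevance` scores a page against the topic (e.g., derived from
    exemplary documents); the 0.5 boundary below is an assumption."""
    frontier = [(-1.0, seed) for seed in seeds]  # max-heap via negated scores
    heapq.heapify(frontier)
    seen = set(seeds)
    collected = []
    while frontier and len(collected) < max_pages:
        neg_score, url = heapq.heappop(frontier)
        page = fetch(url)
        if page is None:
            continue
        score = relevance(page)
        if score > 0.5:              # crawl boundary: keep only relevant pages
            collected.append((url, score))
            for link in extract_links(page):
                if link not in seen:
                    seen.add(link)
                    # children inherit the parent's score as a priority estimate
                    heapq.heappush(frontier, (-score, link))
    return collected
```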
5

Mani Sekhar, S. R., G. M. Siddesh, Sunilkumar S. Manvi, and K. G. Srinivasa. "Optimized Focused Web Crawler with Natural Language Processing Based Relevance Measure in Bioinformatics Web Sources." Cybernetics and Information Technologies 19, no. 2 (June 1, 2019): 146–58. http://dx.doi.org/10.2478/cait-2019-0021.

Full text
Abstract:
With the fast growth of digital technologies, crawlers and search engines face unpredictable challenges. Focused web-crawlers are essential for mining the boundless data available on the internet. Web-crawlers face an indeterminate latency problem due to differences in their response times. The proposed work attempts to optimize the design and implementation of focused web-crawlers using a master-slave architecture for Bioinformatics web sources. Focused crawlers ideally should crawl only relevant pages, but the relevance of a page can only be estimated after crawling the genomics pages. A solution for predicting page relevance, based on Natural Language Processing, is proposed in the paper. The frequency of the keywords in the top-ranked sentences of the page determines the relevance of the pages within genomics sources. The proposed solution uses the TextRank algorithm to rank the sentences, as well as ensuring the correct classification of Bioinformatics web pages. Finally, the model is validated by comparison with a breadth-first search web-crawler. The comparison shows a significant reduction in run time for the same harvest rate.
APA, Harvard, Vancouver, ISO, and other styles
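A simplified version of the relevance measure described above: rank sentences with a word-overlap TextRank, then count topic keywords in the top-ranked sentences. The similarity function and parameters are assumptions, not the paper's exact formulation:

```python
import math
import re

def textrank_sentences(sentences, d=0.85, iters=30):
    """Simplified TextRank: rank sentences by word-overlap similarity."""
    words = [set(re.findall(r"\w+", s.lower())) for s in sentences]
    n = len(sentences)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and words[i] and words[j]:
                overlap = len(words[i] & words[j])
                denom = math.log(len(words[i]) + 1) + math.log(len(words[j]) + 1)
                sim[i][j] = overlap / denom if denom else 0.0
    scores = [1.0] * n
    for _ in range(iters):   # power iteration over the sentence graph
        scores = [
            (1 - d) + d * sum(sim[j][i] * scores[j] / (sum(sim[j]) or 1.0)
                              for j in range(n) if sim[j][i])
            for i in range(n)
        ]
    return scores

def page_relevance(sentences, keywords, top_k=5):
    """Relevance = keyword frequency within the top-ranked sentences."""
    ranked = sorted(zip(textrank_sentences(sentences), sentences), reverse=True)
    top_text = " ".join(s.lower() for _, s in ranked[:top_k])
    return sum(top_text.count(k.lower()) for k in keywords)
```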
6

Lu, Houqing, Donghui Zhan, Lei Zhou, and Dengchao He. "An Improved Focused Crawler: Using Web Page Classification and Link Priority Evaluation." Mathematical Problems in Engineering 2016 (2016): 1–10. http://dx.doi.org/10.1155/2016/6406901.

Full text
Abstract:
A focused crawler is topic-specific and aims to selectively collect web pages that are relevant to a given topic from the Internet. However, the performance of current focused crawling can easily suffer from the environments of web pages and from multiple-topic web pages. In the crawling process, a highly relevant region may be ignored owing to the low overall relevance of its page, and anchor text or link context may misguide crawlers. In order to solve these problems, this paper proposes a new focused crawler. First, we build a web page classifier based on an improved term weighting approach (ITFIDF) in order to gain highly relevant web pages. In addition, this paper introduces a link evaluation approach, link priority evaluation (LPE), which combines a web page content block partition algorithm with a joint feature evaluation (JFE) strategy, to better judge the relevance between URLs on the web page and the given topic. The experimental results demonstrate that the classifier using ITFIDF outperforms TFIDF, and that our focused crawler is superior to focused crawlers based on breadth-first, best-first, anchor text only, link-context only, and content block partition in terms of harvest rate and target recall. In conclusion, our methods are significant and effective for focused crawling.
APA, Harvard, Vancouver, ISO, and other styles
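The paper's ITFIDF weighting is not reproduced here, but the TFIDF baseline it improves on is standard. A minimal version for reference:

```python
import math
from collections import Counter

def tfidf(docs):
    """Standard TF-IDF weights, the baseline the paper's ITFIDF improves on."""
    n = len(docs)
    df = Counter()                                  # document frequencies
    tokenized = [doc.lower().split() for doc in docs]
    for tokens in tokenized:
        df.update(set(tokens))
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n / df[term])
            for term, count in tf.items()
        })
    return weights

docs = ["focused crawler collects topic pages",
        "crawler downloads pages from the web"]
print(tfidf(docs)[0]["focused"])
```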
7

Qiu, Zhao, Ceng Jun Dai, and Tao Liu. "Design of Theme Crawler for Web Forum." Applied Mechanics and Materials 548-549 (April 2014): 1330–33. http://dx.doi.org/10.4028/www.scientific.net/amm.548-549.1330.

Full text
Abstract:
As a web information extraction tool, a network crawler downloads web pages from the internet for the search engine. The implementation strategy and operating efficiency of the crawling program have a direct influence on the results of subsequent work. Addressing the shortcomings of ordinary crawlers, this paper puts forward a practical and efficient topic-specific crawling method for BBS (web forums). Tailored to BBS characteristics, the method works on web page parsing, topic relevance analysis, and the crawling strategy, using template configuration to analyze and crawl forum articles. The method is better than a general crawler in performance, accuracy, and coverage.
APA, Harvard, Vancouver, ISO, and other styles
8

Subatra Devi, S. "A Novel Approach on Focused Crawling With Anchor Text." Asian Journal of Computer Science and Technology 7, no. 1 (May 5, 2018): 7–15. http://dx.doi.org/10.51983/ajcst-2018.7.1.1849.

Full text
Abstract:
A novel approach to focused crawling for various anchor texts is discussed in this paper. Most search engines search the web with the anchor text to retrieve the relevant pages and answer the queries given by the users. The crawler usually searches the web pages and filters out the unnecessary pages, which can be done through focused crawling. A focused crawler generates its boundary to crawl the relevant pages based on the links and ignores the irrelevant pages on the web. In this paper, an effective focused crawling method is implemented to improve the quality of the search. Three learning phases are considered, namely content-based, link-based, and sibling-based learning, to improve the navigation of the search. In this approach, the crawler crawls through the relevant pages efficiently, and more relevant pages are retrieved in an effective way. It is proved experimentally that more relevant pages are retrieved for different anchor texts with the three learning phases using focused crawling.
APA, Harvard, Vancouver, ISO, and other styles
9

Ro, Inwoo, Joong Soo Han, and Eul Gyu Im. "Detection Method for Distributed Web-Crawlers: A Long-Tail Threshold Model." Security and Communication Networks 2018 (December 4, 2018): 1–7. http://dx.doi.org/10.1155/2018/9065424.

Full text
Abstract:
This paper proposes an advanced countermeasure against distributed web crawlers. We investigated other methods for crawler detection and analyzed how distributed crawlers can bypass them. Our method can detect distributed crawlers by exploiting the property that web traffic follows a power-law distribution: when web pages are sorted by the number of requests, most requests are concentrated on the most frequently requested pages. In addition, there will be some web pages that normal users do not generally request, but crawlers will request them, because their algorithms iterate over parsed pages to collect every item they encounter. Therefore, we can assume that if some IP addresses frequently request web pages located in the long-tail area of the power-distribution graph, those IP addresses can be classified as crawler nodes. Experimental results with NASA web traffic data showed that our method was effective in identifying distributed crawlers with 0.0275% false positives, where a conventional frequency-based detection method shows 2.882% false positives at an equal access threshold.
APA, Harvard, Vancouver, ISO, and other styles
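The long-tail threshold idea can be sketched directly: sort pages by request frequency, take the rarely requested tail, and flag IPs that hit it repeatedly. The `tail_fraction` and `threshold` parameters below are illustrative, not the paper's calibrated values:

```python
from collections import Counter, defaultdict

def long_tail_crawler_detection(access_log, tail_fraction=0.5, threshold=10):
    """Flag IPs that repeatedly request pages in the long tail of the
    request-frequency distribution. `access_log` is a list of (ip, page)
    pairs; the parameters are illustrative assumptions."""
    page_counts = Counter(page for _, page in access_log)
    # Sort pages by popularity; the least-requested portion is the long tail.
    ranked = [p for p, _ in page_counts.most_common()]
    tail = set(ranked[int(len(ranked) * (1 - tail_fraction)):])
    tail_hits = defaultdict(int)
    for ip, page in access_log:
        if page in tail:
            tail_hits[ip] += 1
    return {ip for ip, hits in tail_hits.items() if hits >= threshold}
```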
10

Sakunthala Prabha, K. S., C. Mahesh, and S. P. Raja. "An Enhanced Semantic Focused Web Crawler Based on Hybrid String Matching Algorithm." Cybernetics and Information Technologies 21, no. 2 (June 1, 2021): 105–20. http://dx.doi.org/10.2478/cait-2021-0022.

Full text
Abstract:
A topic-precise crawler is a special-purpose web crawler which downloads web pages relevant to a particular topic by measuring a cosine similarity or semantic similarity score. The cosine-based similarity measure yields an inaccurate relevance score if the topic term does not occur directly in the web page. The semantic-based similarity measure provides a precise relevance score even if only synonyms of the given topic occur in the web page; however, the unavailability of the topic in the ontology leads semantic focused crawlers to produce inaccurate relevance scores. This paper overcomes these glitches with a hybrid string-matching algorithm that combines the semantic similarity-based measure with a probabilistic similarity-based measure. The experimental results revealed that this algorithm increased the efficiency of focused web crawlers and achieved better Harvest Rate (HR), Precision (P), and Irrelevance Ratio (IR) than existing focused web crawlers.
APA, Harvard, Vancouver, ISO, and other styles
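One way to combine the two measures the abstract mentions is a weighted blend that falls back to the probabilistic (cosine) score when the ontology has no entry for the topic. A sketch under those assumptions; the `ontology_similarity` callback and `alpha` weight are illustrative:

```python
import math
from collections import Counter

def cosine_similarity(text_a, text_b):
    """Probabilistic side: cosine over term-frequency vectors."""
    va, vb = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(va[t] * vb[t] for t in set(va) & set(vb))
    norm = math.sqrt(sum(c * c for c in va.values())) * \
           math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0

def hybrid_relevance(page_text, topic, ontology_similarity, alpha=0.5):
    """Blend a semantic (ontology-based) score with a cosine score.
    `ontology_similarity` is assumed to return None when the topic is
    missing from the ontology; the cosine score is then used alone."""
    sem = ontology_similarity(page_text, topic)
    cos = cosine_similarity(page_text, topic)
    if sem is None:
        return cos
    return alpha * sem + (1 - alpha) * cos
```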

Dissertations / Theses on the topic "Web Crawler"

1

PAES, Vinicius de Carvalho. "Crawler de Faces na Web." Repositório Institucional da UNIFEI, 2012. http://repositorio.unifei.edu.br/xmlui/handle/123456789/1099.

Full text
Abstract:
The primary focus of this project is to define the basic structure required for the development and practical application of a face search engine, in order to guarantee searches with appropriate quality parameters.
APA, Harvard, Vancouver, ISO, and other styles
2

Nguyen, Qui V. "Enhancing a Web Crawler with Arabic Search." Thesis, Monterey, California: Naval Postgraduate School, 2012.

Find full text
Abstract:
Many advantages of the Internet (ease of access, limited regulation, vast potential audience, and fast flow of information) have turned it into the most popular way to communicate and exchange ideas. Criminal and terrorist groups also exploit these advantages, turning the Internet into a new playing and battle field for conducting their illegal and terror activities. There are millions of web sites in different languages on the Internet, but the lack of foreign-language search engines makes it impossible to analyze foreign-language web sites efficiently. This thesis enhances an open-source web crawler with Arabic search capability, thus improving an existing social networking tool to perform page correlation and analysis of Arabic web sites. A social networking tool with Arabic search capabilities could become a valuable tool for the intelligence community: its page correlation and analysis results could be used to collect open-source intelligence and build a network of web sites related to terrorist or criminal activities.
APA, Harvard, Vancouver, ISO, and other styles
3

Ali, Halil. "Effective web crawlers." RMIT University, CS&IT, 2008. http://adt.lib.rmit.edu.au/adt/public/adt-VIT20081127.164414.

Full text
Abstract:
Web crawlers are the component of a search engine that must traverse the Web, gathering documents in a local repository for indexing by a search engine so that they can be ranked by their relevance to user queries. Whenever data is replicated in an autonomously updated environment, there are issues with maintaining up-to-date copies of documents. When documents are retrieved by a crawler and have subsequently been altered on the Web, the effect is an inconsistency in user search results. While the impact depends on the type and volume of change, many existing algorithms do not take the degree of change into consideration, instead using simple measures that consider any change as significant. Furthermore, many crawler evaluation metrics do not consider index freshness or the amount of impact that crawling algorithms have on user results. Most of the existing work makes assumptions about the change rate of documents on the Web, or relies on the availability of a long history of change. Our work investigates approaches to improving index consistency: detecting meaningful change, measuring the impact of a crawl on collection freshness from a user perspective, developing a framework for evaluating crawler performance, determining the effectiveness of stateless crawl ordering schemes, and proposing and evaluating the effectiveness of a dynamic crawl approach. Our work is concerned specifically with cases where there is little or no past change statistics with which predictions can be made. Our work analyses different measures of change and introduces a novel approach to measuring the impact of recrawl schemes on search engine users. Our schemes detect important changes that affect user results. Other well-known and widely used schemes have to retrieve around twice the data to achieve the same effectiveness as our schemes. Furthermore, while many studies have assumed that the Web changes according to a model, our experimental results are based on real web documents. We analyse various stateless crawl ordering schemes that have no past change statistics with which to predict which documents will change, none of which, to our knowledge, has been tested to determine effectiveness in crawling changed documents. We empirically show that the effectiveness of these schemes depends on the topology and dynamics of the domain crawled and that no one static crawl ordering scheme can effectively maintain freshness, motivating our work on dynamic approaches. We present our novel approach to maintaining freshness, which uses the anchor text linking documents to determine the likelihood of a document changing, based on statistics gathered during the current crawl. We show that this scheme is highly effective when combined with existing stateless schemes. When we combine our scheme with PageRank, our approach allows the crawler to improve both freshness and quality of a collection. Our scheme improves freshness regardless of which stateless scheme it is used in conjunction with, since it uses both positive and negative reinforcement to determine which document to retrieve. Finally, we present the design and implementation of Lara, our own distributed crawler, which we used to develop our testbed.
APA, Harvard, Vancouver, ISO, and other styles
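The anchor-text reinforcement idea described above can be sketched as a per-term weight table updated after each recrawl. This is an illustrative reconstruction, not the thesis's actual scheme:

```python
from collections import defaultdict

class AnchorChangePredictor:
    """Illustrative sketch of anchor-text-based recrawl ordering: terms in
    anchor text pointing at a document are reinforced when the document is
    found changed and penalized when it is found unchanged."""
    def __init__(self):
        self.term_weight = defaultdict(float)

    def update(self, anchor_terms, changed):
        delta = 1.0 if changed else -1.0   # positive/negative reinforcement
        for term in anchor_terms:
            self.term_weight[term] += delta

    def change_score(self, anchor_terms):
        # Higher score => the document is more likely to have changed,
        # so it should be recrawled sooner.
        return sum(self.term_weight[t] for t in anchor_terms)

predictor = AnchorChangePredictor()
predictor.update(["news", "today"], changed=True)
predictor.update(["about", "contact"], changed=False)
print(predictor.change_score(["news", "headlines"]))
```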
4

Kayisoglu, Altug. "Lokman: A Medical Ontology Based Topical Web Crawler." Master's thesis, METU, 2005. http://etd.lib.metu.edu.tr/upload/2/12606468/index.pdf.

Full text
Abstract:
Use of ontology is an approach to overcome the "search-on-the-net" problem. An ontology-based web information retrieval system requires a topical web crawler to construct a high-quality document collection. This thesis focuses on implementing a topical web crawler with a medical domain ontology in order to find out the advantages of ontological information in web crawling. The crawler is implemented with the Best-First search algorithm, and its design is optimized for the UMLS ontology. The crawler is tested with the Harvest Rate and Target Recall metrics and compared to a non-ontology-based Best-First crawler. Test results show that using the ontology in the crawler's URL selection algorithm improved crawler performance by 76%.
APA, Harvard, Vancouver, ISO, and other styles
5

Pandya, Milan. "A Domain Based Approach to Crawl the Hidden Web." Digital Archive @ GSU, 2006. http://digitalarchive.gsu.edu/cs_theses/32.

Full text
Abstract:
There is a lot of research work being performed on indexing the Web, and more and more sophisticated web crawlers are being designed to search and index the Web faster. But all these traditional crawlers crawl only the part of the Web we call the "Surface Web"; they are unable to crawl the hidden portion. Traditional crawlers retrieve contents only from surface web pages, which are just sets of pages connected by hyperlinks, and so they ignore the tremendous amount of information hidden behind search forms in web pages. Most of the published research has been done to detect such searchable forms and make a systematic search over them. Our approach here is based on a web crawler that analyzes search forms and fills them with appropriate content to retrieve maximum relevant information from the database.
APA, Harvard, Vancouver, ISO, and other styles
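The first step of such a hidden-web crawler, detecting searchable forms, can be sketched with the standard library; the domain-based value selection the thesis describes would plug in where these forms are submitted. Class and field names are illustrative:

```python
from html.parser import HTMLParser

class SearchFormFinder(HTMLParser):
    """Detects searchable forms: <form> elements containing a text input,
    the entry point a hidden-web crawler must fill in."""
    def __init__(self):
        super().__init__()
        self.forms = []
        self._current = None

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "form":
            self._current = {"action": a.get("action", ""), "inputs": []}
        elif tag == "input" and self._current is not None:
            if a.get("type", "text") in ("text", "search"):
                self._current["inputs"].append(a.get("name", ""))

    def handle_endtag(self, tag):
        if tag == "form" and self._current is not None:
            if self._current["inputs"]:      # keep only searchable forms
                self.forms.append(self._current)
            self._current = None

finder = SearchFormFinder()
finder.feed('<form action="/search"><input type="text" name="q"></form>')
print(finder.forms)  # [{'action': '/search', 'inputs': ['q']}]
```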
6

Koron, Ronald Dean. "Developing a Semantic Web Crawler to Locate OWL Documents." Wright State University / OhioLINK, 2012. http://rave.ohiolink.edu/etdc/view?acc_num=wright1347937844.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Stivala, Giada Martina. "Perceptual Web Crawlers." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2019.

Find full text
Abstract:
Web crawlers are a fundamental component of web application scanners and are used to explore the attack surface of web applications. Crawlers work as follows: first, for each page, they extract URLs and UI elements that may lead to new pages; then they use a depth-first or breadth-first tree traversal to explore new pages. In this approach, crawlers cannot distinguish between "terminate user account" and "next page" buttons, and they will click on both without taking into account the consequences of their actions. The goal of this project is to devise a new family of crawlers that builds on client-side code analysis and extends it with inference of the semantics of UI elements using visual clues. The new crawler will be able to identify, in real time, the types and semantics of UI elements, and it will use the semantics to choose the right action. The project includes the development of a prototype and an evaluation against a selection of real-size web applications.
APA, Harvard, Vancouver, ISO, and other styles
8

Choudhary, Suryakant. "M-crawler: Crawling Rich Internet Applications Using Menu Meta-model." Thèse, Université d'Ottawa / University of Ottawa, 2012. http://hdl.handle.net/10393/23118.

Full text
Abstract:
Web applications have come a long way both in terms of adoption to provide information and services and in terms of the technologies to develop them. With the emergence of richer and more advanced technologies such as Ajax, web applications have become more interactive, responsive and user friendly. These applications, often called Rich Internet Applications (RIAs) changed the traditional web applications in two primary ways: Dynamic manipulation of client side state and Asynchronous communication with the server. At the same time, such techniques also introduce new challenges. Among these challenges, an important one is the difficulty of automatically crawling these new applications. Crawling is not only important for indexing the contents but also critical to web application assessment such as testing for security vulnerabilities or accessibility. Traditional crawlers are no longer sufficient for these newer technologies and crawling in RIAs is either inexistent or far from perfect. There is a need for an efficient crawler for web applications developed using these new technologies. Further, as more and more enterprises use these new technologies to provide their services, the requirement for a better crawler becomes inevitable. This thesis studies the problems associated with crawling RIAs. Crawling RIAs is fundamentally more difficult than crawling traditional multi-page web applications. The thesis also presents an efficient RIA crawling strategy and compares it with existing methods.
APA, Harvard, Vancouver, ISO, and other styles
9

Lee, Hsin-Tsang. "IRLbot: design and performance analysis of a large-scale web crawler." Texas A&M University, 2008. http://hdl.handle.net/1969.1/85914.

Full text
Abstract:
This thesis shares our experience in designing web crawlers that scale to billions of pages and models their performance. We show that with the quadratically increasing complexity of verifying URL uniqueness, breadth-first search (BFS) crawl order, and fixed per-host rate-limiting, current crawling algorithms cannot effectively cope with the sheer volume of URLs generated in large crawls, highly-branching spam, legitimate multi-million-page blog sites, and infinite loops created by server-side scripts. We offer a set of techniques for dealing with these issues and test their performance in an implementation we call IRLbot. In our recent experiment that lasted 41 days, IRLbot running on a single server successfully crawled 6.3 billion valid HTML pages (7.6 billion connection requests) and sustained an average download rate of 319 mb/s (1,789 pages/s). Unlike our prior experiments with algorithms proposed in related work, this version of IRLbot did not experience any bottlenecks and successfully handled content from over 117 million hosts, parsed out 394 billion links, and discovered a subset of the web graph with 41 billion unique nodes.
APA, Harvard, Vancouver, ISO, and other styles
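IRLbot's actual URL-seen structure is disk-based, but the constant-memory uniqueness check at the heart of the scaling problem can be illustrated with a Bloom filter. A sketch, intended as a stand-in rather than IRLbot's implementation:

```python
import hashlib

class BloomFilter:
    """Compact probabilistic set for URL-seen tests. IRLbot itself uses a
    disk-based structure; this in-memory Bloom filter only illustrates
    the constant-time uniqueness check the problem demands."""
    def __init__(self, size_bits=1 << 24, hashes=4):
        self.size = size_bits
        self.hashes = hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, url):
        # Derive `hashes` independent bit positions from SHA-256 digests.
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{url}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, url):
        for pos in self._positions(url):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, url):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(url))

seen = BloomFilter()
seen.add("http://example.com/")
print("http://example.com/" in seen)   # True
print("http://example.org/" in seen)   # False (with high probability)
```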
10

Karki, Rabin. "Fresh Analysis of Streaming Media Stored on the Web." Digital WPI, 2011. https://digitalcommons.wpi.edu/etd-theses/81.

Full text
Abstract:
With the steady increase in the bandwidth available to end users and Web sites hosting user generated content, there appears to be more multimedia content on the Web than ever before. Studies to quantify media stored on the Web done in 1997 and 2003 are now dated since the nature, size and number of streaming media objects on the Web have changed considerably. Although there have been more recent studies characterizing specific streaming media sites like YouTube, there are only a few studies that focus on characterizing the media stored on the Web as a whole. We build customized tools to crawl the Web, identify streaming media content and extract the characteristics of the streaming media found. We choose 16 different starting points and crawled 1.25 million Web pages from each starting point. Using the custom built tools, the media objects are identified and analyzed to determine attributes including media type, media length, codecs used for encoding, encoded bitrate, resolution, and aspect ratio. A little over half the media clips we encountered are video. MP3 and AAC are the most prevalent audio codecs whereas H.264 and FLV are the most common video codecs. The median size and encoded bitrates of stored media have increased since the last study. Information on the characteristics of stored multimedia and their trends over time can help system designers. The results can also be useful for empirical Internet measurements studies that attempt to mimic the behavior of streaming media traffic over the Internet.
APA, Harvard, Vancouver, ISO, and other styles

Books on the topic "Web Crawler"

1

Granary Books (Firm), and Press Collection (Library of Congress), eds. Night Crawlers on the Web. Charlottesville, Va.: JABBooks, 2000.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
2

Taylor, Z. W., and Joshua Childs. Measuring the Metaphysical School: Using Web Crawlers and Search Engine Optimization to Evaluate School Web Metrics. London: SAGE Publications, Ltd., 2022. http://dx.doi.org/10.4135/9781529603170.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

I wonder why spiders spin webs: And other questions about creepy crawlies. New York: Kingfisher, 1995.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
4

I wonder why spiders spin webs: And other questions about creepy-crawlies. London: Kingfisher, 2011.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
5

Drucker, Johanna. Johanna Drucker: Night Crawlers on the Web. Granary Books/JAB Books, 2001.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
6

Newman, Mark. Networks of information. Oxford University Press, 2018. http://dx.doi.org/10.1093/oso/9780198805090.003.0003.

Full text
Abstract:
A discussion of information networks and their measurement. The world wide web is discussed at length, including HTML, HTTP, and the use of crawlers to measure network structure. Citation networks are also discussed in some detail, including their history, structure, and statistics, and the use of databases of citation records to construct networks. Other networks discussed include peer-to-peer networks, recommender networks, and keyword indexes.
APA, Harvard, Vancouver, ISO, and other styles
7

Newman, Mark. Network search. Oxford University Press, 2018. http://dx.doi.org/10.1093/oso/9780198805090.003.0018.

Full text
Abstract:
This chapter gives a discussion of search processes on networks. It begins with a discussion of web search, including crawlers and web ranking algorithms such as PageRank. Search in distributed databases such as peer-to-peer networks is also discussed, including simple breadth-first search style algorithms and more advanced “supernode” approaches. Finally, network navigation is discussed at some length, motivated by consideration of Milgram's letter passing experiment. Kleinberg's variant of the small-world model is introduced and it is shown that efficient navigation is possible only for certain values of the model parameters. Similar results are also derived for the hierarchical model of Watts et al.
APA, Harvard, Vancouver, ISO, and other styles
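The PageRank algorithm this chapter discusses is standard; a compact power-iteration version over an adjacency dictionary (assuming every linked node also appears as a key):

```python
def pagerank(graph, d=0.85, iters=50):
    """Power-iteration PageRank over {node: [out-links]}; assumes every
    out-link target is also a key of `graph`."""
    nodes = list(graph)
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iters):
        new = {u: (1 - d) / n for u in nodes}
        for u in nodes:
            out = graph[u]
            if out:
                share = d * rank[u] / len(out)
                for v in out:
                    new[v] += share
            else:  # dangling node: spread its rank uniformly
                for v in nodes:
                    new[v] += d * rank[u] / n
        rank = new
    return rank

web = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
print(pagerank(web))
```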
8

I Wonder Why Spiders Spin Webs: And Other Questions About Creepy Crawlies. Tandem Library, 2003.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
9

I Wonder Why Spiders Spin Webs: And Other Questions About Creepy Crawlies (I Wonder Why). Kingfisher, 2003.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
10

I Wonder Why Spiders Spin Webs: And Other Questions About Creepy Crawlies (I Wonder Why). Kingfisher, 1995.

Find full text
APA, Harvard, Vancouver, ISO, and other styles

Book chapters on the topic "Web Crawler"

1

Najork, Marc. "Web Crawler Architecture." In Encyclopedia of Database Systems, 4608–11. New York, NY: Springer New York, 2018. http://dx.doi.org/10.1007/978-1-4614-8265-9_457.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Najork, Marc. "Web Crawler Architecture." In Encyclopedia of Database Systems, 3462–65. Boston, MA: Springer US, 2009. http://dx.doi.org/10.1007/978-0-387-39940-9_457.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Najork, Marc. "Web Crawler Architecture." In Encyclopedia of Database Systems, 1–4. New York, NY: Springer New York, 2016. http://dx.doi.org/10.1007/978-1-4899-7993-3_457-2.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Najork, Marc. "Web Crawler Architecture." In Encyclopedia of Database Systems, 1–4. New York, NY: Springer New York, 2017. http://dx.doi.org/10.1007/978-1-4899-7993-3_457-3.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Mukhopadhyay, Debajyoti, and Sukanta Sinha. "Domain-Specific Crawler Design." In Web Searching and Mining, 85–112. Singapore: Springer Singapore, 2018. http://dx.doi.org/10.1007/978-981-13-3053-7_7.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Kaur, Sawroop, and G. Geetha. "Smart Focused Web Crawler for Hidden Web." In Information and Communication Technology for Competitive Strategies, 419–27. Singapore: Springer Singapore, 2018. http://dx.doi.org/10.1007/978-981-13-0586-3_42.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Mirtaheri, Seyed M., Gregor V. Bochmann, Guy-Vincent Jourdan, and Iosif Viorel Onut. "PDist-RIA Crawler: A Peer-to-Peer Distributed Crawler for Rich Internet Applications." In Web Information Systems Engineering – WISE 2014, 365–80. Cham: Springer International Publishing, 2014. http://dx.doi.org/10.1007/978-3-319-11746-1_26.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Kuśmierczyk, Tomasz, and Marcin Sydow. "Towards a Keyword-Focused Web Crawler." In Language Processing and Intelligent Information Systems, 187–97. Berlin, Heidelberg: Springer Berlin Heidelberg, 2013. http://dx.doi.org/10.1007/978-3-642-38634-3_21.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Choi, Myung Sil, Yong Soo Park, and Kwang Seon Ahn. "Enterprise Management System with Web-Crawler." In Lecture Notes in Computer Science, 539–42. Berlin, Heidelberg: Springer Berlin Heidelberg, 2008. http://dx.doi.org/10.1007/978-3-540-88623-5_71.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Shaila, S. G., and A. Vadivel. "Intelligent Rule-Based Deep Web Crawler." In Textual and Visual Information Retrieval using Query Refinement and Pattern Analysis, 1–19. Singapore: Springer Singapore, 2018. http://dx.doi.org/10.1007/978-981-13-2559-5_1.

Full text
APA, Harvard, Vancouver, ISO, and other styles

Conference papers on the topic "Web Crawler"

1

Dahiwale, P., A. Mokhade, and M. M. Raghuwanshi. "Intelligent web crawler." In ICWET '10: International Conference and Workshop on Emerging Trends in Technology. New York, NY, USA: ACM, 2010. http://dx.doi.org/10.1145/1741906.1742046.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Anbukodi, S., and K. Muthu Manickam. "Reducing web crawler overhead using mobile crawler." In 2011 International Conference on Emerging Trends in Electrical and Computer Technology (ICETECT 2011). IEEE, 2011. http://dx.doi.org/10.1109/icetect.2011.5760252.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Khine, Su Mon, and Yadana Thein. "Myanmar Web Pages Crawler." In Fifth International conference on Computer Science and Information Technology. Academy & Industry Research Collaboration Center (AIRCC), 2015. http://dx.doi.org/10.5121/csit.2015.50410.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Bal, Sawroop Kaur, and G. Geetha. "Smart distributed web crawler." In 2016 International Conference on Information Communication and Embedded Systems (ICICES). IEEE, 2016. http://dx.doi.org/10.1109/icices.2016.7518893.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Agre, Gunjan H., and Nikita V. Mahajan. "Keyword focused web crawler." In 2015 2nd International Conference on Electronics and Communication Systems (ICECS). IEEE, 2015. http://dx.doi.org/10.1109/ecs.2015.7124749.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Ganesh, S., M. Jayaraj, V. Kalyan, SrinivasaMurthy, and G. Aghila. "Ontology-based Web crawler." In International Conference on Information Technology: Coding and Computing, 2004. Proceedings. ITCC 2004. IEEE, 2004. http://dx.doi.org/10.1109/itcc.2004.1286658.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Gupta, Pooja, and Kalpana Johari. "Implementation of Web Crawler." In 2009 Second International Conference on Emerging Trends in Engineering & Technology. IEEE, 2009. http://dx.doi.org/10.1109/icetet.2009.124.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Cai, Jianfu, and Hua Zhang. "Dis-Dyn Crawler: A Distributed Crawler for Dynamic Web Page." In 2015 4th International Conference on Mechatronics, Materials, Chemistry and Computer Engineering. Paris, France: Atlantis Press, 2015. http://dx.doi.org/10.2991/icmmcce-15.2015.505.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Zambom Santana, Luiz Henrique, Ronaldo dos Santos Mello, and Mauro Roisenberg. "Smart Crawler." In Webmedia '15: 21st Brazilian Symposium on Multimedia and the Web. New York, NY, USA: ACM, 2015. http://dx.doi.org/10.1145/2820426.2820437.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Arun Patil, Tejaswini, and Santosh Chobe. "Web Crawler for Searching Deep Web Sites." In 2017 International Conference on Computing, Communication, Control and Automation (ICCUBEA). IEEE, 2017. http://dx.doi.org/10.1109/iccubea.2017.8463648.

Full text
APA, Harvard, Vancouver, ISO, and other styles

Reports on the topic "Web Crawler"

1

Kang, Byeong Ho, Paul Compton, Hiroshi Motoda, and John Salerno. Dynamic Scheduling for Web Monitoring Crawler. Fort Belvoir, VA: Defense Technical Information Center, February 2009. http://dx.doi.org/10.21236/ada494589.

Full text
APA, Harvard, Vancouver, ISO, and other styles