Dissertations / Theses on the topic 'Web mining'
Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles
Consult the top 50 dissertations / theses for your research on the topic 'Web mining.'
Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.
You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.
Browse dissertations / theses on a wide variety of disciplines and organise your bibliography correctly.
Zheng, George. "Web Service Mining." Diss., Virginia Tech, 2009. http://hdl.handle.net/10919/26324.
Ph. D.
Benkovská, Petra. "Web Usage Mining." Master's thesis, Vysoká škola ekonomická v Praze, 2007. http://www.nusl.cz/ntk/nusl-3950.
Oosthuizen, Craig Peter. "Web usage mining of organisational web sites." Thesis, Nelson Mandela Metropolitan University, 2005. http://hdl.handle.net/10948/399.
Martins, Bruno. "Geographically Aware Web Text Mining." Master's thesis, Department of Informatics, University of Lisbon, 2009. http://hdl.handle.net/10451/14301.
Stavrianou, Anna. "Modeling and mining of Web discussions." PhD thesis, Université Lumière - Lyon II, 2010. http://tel.archives-ouvertes.fr/tel-00564764.
Norguet, Jean-Pierre. "Semantic analysis in web usage mining." Doctoral thesis, Université Libre de Bruxelles, 2006. http://hdl.handle.net/2013/ULB-DIPOT:oai:dipot.ulb.ac.be:2013/210890.
Full textIndeed, according to organizations theory, the higher levels in the organizations need summarized and conceptual information to take fast, high-level, and effective decisions. For Web sites, these levels include the organization managers and the Web site chief editors. At these levels, the results produced by Web analytics tools are mostly useless. Indeed, most of these results target Web designers and Web developers. Summary reports like the number of visitors and the number of page views can be of some interest to the organization manager but these results are poor. Finally, page-group and directory hits give the Web site chief editor conceptual results, but these are limited by several problems like page synonymy (several pages contain the same topic), page polysemy (a page contains several topics), page temporality, and page volatility.
For their part, Web usage mining research projects have mostly left Web analytics and its limitations aside and focused on other research paths, such as usage pattern analysis, personalization, system improvement, site structure modification, marketing business intelligence, and usage characterization. A potential contribution to Web analytics can be found in research on reverse clustering analysis, a technique based on self-organizing feature maps that integrates Web usage mining and Web content mining to rank the Web site pages according to an original popularity score. However, the algorithm is not scalable and does not address the page-polysemy, page-synonymy, page-temporality, and page-volatility problems. As a consequence, these approaches fail to deliver summarized and conceptual results.
An interesting attempt to obtain such results has been the Information Scent algorithm, which produces a list of term vectors representing the visitors' needs. These vectors provide a semantic representation of the visitors' needs and can be easily interpreted. Unfortunately, the results suffer from term polysemy and term synonymy, are visit-centric rather than site-centric, and do not scale. Finally, according to a recent survey, no Web usage mining research project has proposed a satisfying solution for site-wide summarized and conceptual audience metrics.
In this dissertation, we present our solution to the need for summarized and conceptual audience metrics in Web analytics. We first describe several methods for mining the Web pages output by Web servers: content journaling, script parsing, server monitoring, network monitoring, and client-side mining. These techniques can be used alone or in combination to mine the pages output by any Web site. The occurrences of taxonomy terms in these pages can then be aggregated to provide concept-based audience metrics. To evaluate the results, we implement a prototype and run a number of test cases with real Web sites.
According to the first experiments with our prototype and SQL Server OLAP Analysis Services, concept-based metrics prove far more concise and intuitive than page-based metrics. As a consequence, they can be exploited at higher levels of the organization. For example, organization managers can redefine the organization strategy according to the visitors' interests. Concept-based metrics also give an intuitive view of the messages delivered through the Web site and make it possible to adapt the site's communication to the organization's objectives. The Web site chief editor, in turn, can interpret the metrics to redefine the publishing orders and the sub-editors' writing tasks. As decisions at higher levels of the organization should be more effective, concept-based metrics should contribute significantly to Web usage mining and Web analytics.
Doctorat en sciences appliquées
info:eu-repo/semantics/nonPublished
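As a rough illustration of the concept-based aggregation this abstract describes, the sketch below counts taxonomy-term occurrences in page output and weights them by page views. The taxonomy, page texts, and view counts are invented; the thesis's own mining methods (content journaling, script parsing, etc.) are not reproduced here.

```python
from collections import Counter

# Hypothetical taxonomy: concept -> indicator terms (invented for illustration).
TAXONOMY = {
    "admissions": {"tuition", "application", "enrolment"},
    "research": {"laboratory", "grant", "publication"},
}

def concept_metrics(pages):
    """Aggregate taxonomy-term occurrences in page output into
    concept-based audience metrics, weighted by page views.
    `pages` is a list of (page output text, view count) pairs."""
    metrics = Counter()
    for text, views in pages:
        words = text.lower().split()
        for concept, terms in TAXONOMY.items():
            hits = sum(words.count(term) for term in terms)
            metrics[concept] += hits * views  # every view delivers the page's terms
    return dict(metrics)

pages = [
    ("Tuition fees and the application form", 10),  # viewed 10 times
    ("Our laboratory announced a new grant publication", 4),
]
print(concept_metrics(pages))
```

The resulting per-concept totals are the kind of summarized, site-wide figure the abstract argues managers and chief editors can act on, in contrast to raw page hits.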
Chen, Hsinchun. "Special issue: "Web retrieval and mining"." Elsevier, 2003. http://hdl.handle.net/10150/106101.
Search engines and data mining are two research areas that have experienced significant progress over the past few years. Overwhelming acceptance of the Internet as a primary medium for content delivery and business transactions has created unique opportunities and challenges for researchers. The richness of the web's multimedia content, the reach and timeliness of web-based publication, the proliferation of e-commerce activities, and the potential for wireless web delivery have generated many interesting research problems. Technical, system, organizational, and social research approaches are all needed to address them. Many interesting web retrieval and mining research topics have emerged recently, including but not limited to: text and data mining on the web, web visualization, web intelligence and agents, web-based decision support and knowledge management, wireless web retrieval and visualization, web-based usability methodology, and web-based analysis for e-commerce applications. This special issue consists of nine papers that report research in web retrieval and mining.
Khalil, Faten. "Combining web data mining techniques for web page access prediction." University of Southern Queensland, Faculty of Sciences, 2008. http://eprints.usq.edu.au/archive/00004341/.
Full textKhairo-Sindi, Mazin Omar. "Framework for web log pre-processing within web usage mining." Thesis, University of Manchester, 2004. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.488456.
Full textNagi, Mohamad. "Integrating Network Analysis and Data Mining Techniques into Effective Framework for Web Mining and Recommendation. A Framework for Web Mining and Recommendation." Thesis, University of Bradford, 2015. http://hdl.handle.net/10454/14200.
Full textLiu, Qian. "Mining the Web to support Web image retrieval and image annotation." Thesis, University of Macau, 2007. http://umaclib3.umac.mo/record=b1677226.
Full textDonato, Debora. "Web Mining and Exploration: Algorithms and Experiments." Doctoral thesis, La Sapienza, 2006. http://hdl.handle.net/11573/917052.
Full textPoblete, Labra Bárbara. "Query-Based data mining for the web." Doctoral thesis, Universitat Pompeu Fabra, 2009. http://hdl.handle.net/10803/7270.
Full textThe objective of this thesis is to study different applications of Web query mining for the improvement of search engine ranking, Web information retrieval and Web site enhancement. The main motivation of this work is to take advantage of the implicit feedback left in the trail of users while navigating through the Web. Throughout this work we seek to demonstrate the value of queries to extract interesting rules, patterns and information about the documents they reach. The models, created in this doctoral work, show that the "wisdom of the crowds" conveyed in queries has many applications that overall provide a better understanding of users' needs in the Web. This allows to improve the general interaction of visitors with Web sites and search engines in a straightforward way.
Ngok, Man Chan. "Log mining to support web query expansions." Thesis, University of Macau, 2008. http://umaclib3.umac.mo/record=b1783608.
Full textTezuka, Taro. "Web mining for extracting cognitive geographic spaces." 京都大学 (Kyoto University), 2005. http://hdl.handle.net/2433/144807.
Full textLi, Liangchun. "Web-based data visualization for data mining." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 1998. http://www.collectionscanada.ca/obj/s4/f2/dsk2/ftp03/MQ35845.pdf.
Full textBa-Omer, Hafidh Taher. "A framework for educational web usage mining." Thesis, University of Manchester, 2008. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.492063.
Full textMulvenna, Maurice David. "Analyzing computer-mediated behaviour using web mining." Thesis, University of Ulster, 2007. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.442371.
Full textKong, Wei. "EXPLORING HEALTH WEBSITE USERS BY WEB MINING." Thesis, Universal Access in Human-Computer Interaction. Applications and Services Lecture Notes in Computer Science, 2011, Volume 6768/2011, 376-383, DOI: 10.1007/978-3-642-21657-2_40, 2011. http://hdl.handle.net/1805/2810.
Full textWith the continuous growth of health information on the Internet, providing user-orientated health service online has become a great challenge to health providers. Understanding the information needs of the users is the first step to providing tailored health service. The purpose of this study is to examine the navigation behavior of different user groups by extracting their search terms and to make some suggestions to reconstruct a website for more customized Web service. This study analyzed five months’ of daily access weblog files from one local health provider’s website, discovered the most popular general topics and health related topics, and compared the information search strategies for both patient/consumer and doctor groups. Our findings show that users are not searching health information as much as was thought. The top two health topics which patients are concerned about are children’s health and occupational health. Another topic that both user groups are interested in is medical records. Also, patients and doctors have different search strategies when looking for information on this website. Patients get back to the previous page more often, while doctors usually go to the final page directly and then leave the page without coming back. As a result, some suggestions to redesign and improve the website are discussed; a more intuitive portal and more customized links for both user groups are suggested.
Yang, Yi Yang. "Identifying city landmarks by mining web albums." Thesis, University of Macau, 2015. http://umaclib3.umac.mo/record=b3335394.
Full textEscudeiro, Nuno Filipe Fonseca Vasconcelos. "Automatic web resource compilation using data mining." Master's thesis, Faculdade de Economia da Universidade do Porto, 2004. http://hdl.handle.net/10216/65594.
Full textEscudeiro, Nuno Filipe Fonseca Vasconcelos. "Automatic Web Resource Compilation Using Data Mining." Master's thesis, Faculdade de Economia da Universidade do Porto, 2004. http://hdl.handle.net/10216/10767.
Full textMaster in Data Analysis and Decision Support Systems
In this dissertation we propose a methodology that automates the compilation of Web resources and eases their exploration. A resource is a collection of documents on a specific topic defined by the user. The user's intervention is explicitly required in an initial phase, when the user specifies their information needs and supplies a few example documents. After this initial phase of defining and specifying the information needs, the methodology stays aligned with the continuous evolution of the user's preferences, which are permanently monitored and tracked without explicitly requiring further intervention. To this end, the methodology analyzes the user's preferences through their actions (saving, printing, viewing, or changing the category of documents), which are automatically recorded during each session. In this way the user supplies valuable information to the system without any additional effort. The methodology provides a presentation layer, designed to support the exploration and analysis of large document collections, through which the user explores their resources. Resources are compiled through a meta-search process in which queries are scheduled by an agent that analyzes the trade-off between the freshness of the resource and the percentage of duplicate documents in the responses of the retrieval process. Queries are scheduled so as to keep the resource up to date while reducing the number of queries issued. The methodology also proposes the mechanisms needed to automatically evaluate and control the overall quality of the system. This quality is defined in a three-dimensional space whose dimensions quantify performance in terms of Automation, Effectiveness, and Efficiency.
Each of these dimensions aggregates a set of measures relevant to the overall quality of the system: the Automation level is computed from the workload explicitly required of the user; Effectiveness is computed from the precision and accuracy measures; Efficiency is computed from the recall, freshness, and novelty measures. The system permanently measures and records the values of its global quality parameters, which are used to trigger corrective or preventive procedures so as to correct or anticipate a degradation of the system's overall quality. The classification of Web pages is a critical task in our methodology. To assess the adequacy of semi-supervised learning techniques, several experiments were designed and carried out, supported by a prototype that implements part of the proposed methodology and was developed in the course of this work. In particular, this prototype was used to compile two distinct resources and to study the error rate and robustness of the semi-automatic classification task.
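The three quality dimensions this abstract names (Automation, Effectiveness, Efficiency) can be pictured as a small computation. The aggregation formulas below, and every parameter name, are illustrative guesses for a sketch, not the thesis's actual definitions.

```python
def quality_vector(explicit_user_actions, total_actions,
                   tp, fp, fn, fresh_docs, novel_docs, total_docs):
    """Illustrative aggregation of the three quality dimensions:
    Automation from explicit user workload, Effectiveness from
    precision, Efficiency from recall, freshness and novelty.
    The plain averaging is an assumption made for this sketch."""
    automation = 1 - explicit_user_actions / total_actions
    precision = tp / (tp + fp)          # feeds Effectiveness
    recall = tp / (tp + fn)             # feeds Efficiency
    freshness = fresh_docs / total_docs
    novelty = novel_docs / total_docs
    effectiveness = precision
    efficiency = (recall + freshness + novelty) / 3
    return automation, effectiveness, efficiency

print(quality_vector(10, 100, 8, 2, 2, 50, 20, 100))
```

Tracking such a vector over time is what lets the system trigger the corrective or preventive procedures the abstract mentions when any dimension degrades.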
Leibold, Markus. "Web Log Mining als Controllinginstrument der PR." [S.l. : s.n.], 2004. http://www.bsz-bw.de/cgi-bin/xvms.cgi?SWB11675715.
Full textMa, Yao. "Financial market predictions using Web mining approaches /." View abstract or full-text, 2009. http://library.ust.hk/cgi/db/thesis.pl?CSED%202009%20MAY.
Full textSchenker, Adam. "Graph-Theoretic Techniques for Web Content Mining." [Tampa, Fla.] : University of South Florida, 2003. http://purl.fcla.edu/fcla/etd/SFE0000143.
Full textSaad, Elmak. "Optimizing E-management Using Web data mining." Thesis, University of Huddersfield, 2018. http://eprints.hud.ac.uk/id/eprint/34540/.
Full textEscudeiro, Nuno Filipe Fonseca Vasconcelos. "Automatic Web Resource Compilation Using Data Mining." Dissertação, Faculdade de Economia da Universidade do Porto, 2004. http://hdl.handle.net/10216/10767.
Full textMaster in Data Analysis and Decision Support Systems
Nesta dissertação propomos uma metodologia que automatize a recolha de recursos na Web e facilite a sua exploração. Um recurso é uma colecção de documentos referentes a um tópico específico definido pelo utilizador. A intervenção do utilizador é explicitamente requerida numa fase inicial, quando este especifica as suas necessidades de informação e fornece alguns documentos exemplificativos. Após esta fase inicial, de definição e especificação das necessidades de informação, a metodologia mantém-se alinhada corn a contínua evolução das preferências do utilizador que são permanentemente monitorizadas e seguidas sem que seja necessáio requerer explicitamente a sua intervenção. Para tal, a metodologia analisa as preferencias do utilizador a partir das suas acções - guardar, imprimir, visualizar, alterar a categoria de documentos - que são automaticamente registadas durante cada sessão. Desta forma o utilizador fornece informação valiosa ao sistema sem qualquer esforço adicional. A metodologia prevê um nível de apresentação, desenhado com o objectivo de permitir a exploração e análise de colecções volumosas de documentos, através do qual o utilizador explora os seus recursos. 0 s recursos são compilados através de um processo de meta-search, onde as pesquisas são programadas por um agente que analisa o compromisso entre a actualidade do recurso e a percentagem de documentos duplicados nas respostas do processo de recolha. As pesquisas são programadas de forma a manter a actualidade do recurso, reduzindo, simultaneamente, o número de pesquisas efectuadas. A metodologia propõe também os mecanismos necessários para avaliar e controlar de forma automática a qualidade global do sistema. Esta qualidade é definida num espaço tridimensional cujas dimensões quantificam o desempenho no que se refere ao nível de Automação, Eficácia e Eficiência. 
Cada uma destas dimensões agrega um conjunto de medidas relevantes para a qualidade global do sistema: o nivel de Automação é calculado a partir da carga de trabalho que é explicitamente requerida ao utilizador; a Eficiência é calculada a partir das medidas de precison e accuracy; a Eficiência é calculada com base nas medidas de recall, freshness e novelty. 0 sistema mede e regista permanentemente o valor dos seus parâmetros de qualidade globais, que são usados para activar procedimentos correctivos ou preventivos de forma a corrigir ou antecipar uma degradação da qualidade global do sistema. A classificação de páginas Web assume-se como uma tarefa critica na nossa metodologia. Para avaliar da adequação de técnicas de aprendizagem semi-supervisionada foram desenhadas e realizadas algumas experiências. A realização destas experiências foi suportada por um protótipo que implementa parte da metodologia proposta e que foi implementado no decurso deste trabalho. Em particular este protótipo foi utilizado para compilar dois recursos distintos e para estudar a taxa de erro e a robustez da tarefa de classificação semi-automática.
Zhu, Jianhan. "Mining web site link structures for adaptive web site navigation and search." Thesis, University of Ulster, 2003. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.515890.
Full textSalin, Suleyman. "Web Usage Mining And Recommendation With Semantic Information." Master's thesis, METU, 2009. http://etd.lib.metu.edu.tr/upload/12610483/index.pdf.
Full textZettsu, Koji. "Aspect discovery : mining context in world wide Web." 京都大学 (Kyoto University), 2005. http://hdl.handle.net/2433/144804.
Full textSobolewska, Katarzyna-Ewa. "Web links utility assessment using data mining techniques." Thesis, Blekinge Tekniska Högskola, Avdelningen för programvarusystem, 2006. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-2936.
Full textakasha.kate@gmail.com
Mortazavi-Asl, Behzad. "Discovering and mining user Web-page traversal patterns." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 2001. http://www.collectionscanada.ca/obj/s4/f2/dsk3/ftp04/MQ61594.pdf.
Full textSOARES, FABIO DE AZEVEDO. "TEXT MINING AT THE INTELLIGENT WEB CRAWLING PROCESS." PONTIFÍCIA UNIVERSIDADE CATÓLICA DO RIO DE JANEIRO, 2008. http://www.maxwell.vrac.puc-rio.br/Busca_etds.php?strSecao=resultado&nrSeq=13212@1.
Full textCONSELHO NACIONAL DE DESENVOLVIMENTO CIENTÍFICO E TECNOLÓGICO
This dissertation presents a study of the application of Text Mining to the intelligent Web crawling process. The most usual way of gathering data on the Web is to use web crawlers: programs that, once provided with an initial set of URLs (seeds), start the methodical procedure of visiting a site, storing it on disk, and extracting the hyperlinks that will be used for the next visits. But seeking content this way is an expensive and exhausting task. An intelligent web crawling process, rather than collecting and storing every available web document, analyzes its crawling options to find links that will probably provide content highly relevant to a topic defined a priori. In the approach suggested in this work, topics are defined not by keywords but by text documents given as examples. Pre-processing techniques used in Text Mining, including a thesaurus, then semantically analyze the document submitted as an example. Based on this analysis, the web crawler is guided toward its objective: retrieving information relevant to the document. Starting from seeds or querying available search engines, the crawler analyzes, exactly as in the previous step, every document retrieved from the Web and compares it with the example document. Once the similarity level between them is obtained, the retrieved document's hyperlinks are analyzed, queued, and later dequeued according to their probable degree of importance. At the end of the data-gathering process, another Text Mining technique, Document Clustering, is applied to select the most representative documents of the collection. The implementation of a tool incorporating the researched heuristics made it possible to obtain practical results, evaluate the performance of the developed techniques, and compare the results with other means of retrieving data from the Web. The present work shows that Text Mining is a path worth exploring in the process of retrieving relevant information on the Web.
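The queue-and-dequeue idea in this abstract can be sketched as a priority queue ordered by the similarity between the example document and the page a link was found on. The bag-of-words cosine and the `Frontier` class below are simplifications invented for illustration, not the thesis's thesaurus-based analysis.

```python
import heapq
import math
from collections import Counter

def cosine(a, b):
    """Bag-of-words cosine similarity between two texts."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = math.sqrt(sum(c * c for c in va.values()))
    norm *= math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0

class Frontier:
    """Links inherit the similarity score of the page they were found
    on; the link from the most similar page is dequeued first."""
    def __init__(self, example_doc):
        self.example = example_doc
        self._heap = []

    def push_links(self, page_text, urls):
        score = cosine(self.example, page_text)
        for url in urls:
            heapq.heappush(self._heap, (-score, url))  # negate: max-priority

    def pop(self):
        return heapq.heappop(self._heap)[1]

frontier = Frontier("text mining web crawler")
frontier.push_links("a page about web crawler text mining", ["http://a"])
frontier.push_links("cooking recipes for dinner", ["http://b"])
first = frontier.pop()  # link found on the page most similar to the example
```

A real focused crawler would refetch `first`, score its outlinks the same way, and keep cycling until the frontier empties or a budget is reached.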
Мінакова, В. П., and Н. В. Геселева. "Web-mining: інтелектуальний аналіз даних в мережі Internet" [Web mining: intelligent data analysis on the Internet]. Thesis, КНУТД, 2016. https://er.knutd.edu.ua/handle/123456789/4457.
Full textShun, Yeuk Kiu. "Web mining from client side user activity log /." View Abstract or Full-Text, 2002. http://library.ust.hk/cgi/db/thesis.pl?COMP%202002%20SHUN.
Full textIncludes bibliographical references (leaves 85-90). Also available in electronic version. Access restricted to campus users.
Novák, Petr. "Data mining časových řad" [Time series data mining]. Master's thesis, Vysoká škola ekonomická v Praze, 2009. http://www.nusl.cz/ntk/nusl-72068.
Full textChiara, Ramon. ""Aplicação de técnicas de data mining em logs de servidores web"." Universidade de São Paulo, 2003. http://www.teses.usp.br/teses/disponiveis/55/55134/tde-19012004-093205/.
Full textPeng, Puping. "Web mining with jMap technology." Thesis, 2002. http://spectrum.library.concordia.ca/1653/1/MQ68477.pdf.
Full textLiao, Shao-An, and 廖紹安. "Mining Closed Web Traversal Patterns." Thesis, 2011. http://ndltd.ncl.edu.tw/handle/16547330032244431089.
銘傳大學
資訊工程學系碩士班
99
Mining web traversal patterns finds, from web access logs, the paths that most web users traverse. Most research on mining web traversal patterns does not consider users' backward-traversal behavior; besides, many redundant patterns are generated when the minimum support is low. In order to provide important and condensed information to users, we define closed web traversal patterns, from which all web traversal patterns can be derived. In this paper, we propose an efficient algorithm for mining closed web traversal patterns from the paths traversed by all web users. Our algorithm is based on a tree structure and also considers backward traversal. When a node is created in the tree structure, we can immediately determine whether it is closed by using certain mechanisms. Mining only the closed web traversal patterns reduces memory use and search space and improves mining efficiency.
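The definition used here (a frequent sub-path is closed when no longer frequent sub-path containing it has the same support) can be checked brute-force on a toy log. This sketch deliberately ignores the thesis's tree structure and backward traversal; the sessions are invented.

```python
def contiguous_subpaths(path):
    """Every contiguous sub-path of one traversal session."""
    return {tuple(path[i:j]) for i in range(len(path))
            for j in range(i + 1, len(path) + 1)}

def closed_patterns(sessions, min_support):
    """Brute-force closed web traversal patterns: keep a frequent
    sub-path only if no longer frequent sub-path that contains it
    has the same support."""
    support = {}
    for session in sessions:
        for p in contiguous_subpaths(session):  # count once per session
            support[p] = support.get(p, 0) + 1
    frequent = {p: c for p, c in support.items() if c >= min_support}

    def contains(q, p):
        return any(q[i:i + len(p)] == p for i in range(len(q) - len(p) + 1))

    return {p: c for p, c in frequent.items()
            if not any(len(q) > len(p) and cq == c and contains(q, p)
                       for q, cq in frequent.items())}

sessions = [["A", "B", "C"], ["A", "B", "C"], ["A", "B"]]
result = closed_patterns(sessions, min_support=2)
print(result)
```

Here ("A", "B") survives because its support (3) exceeds that of ("A", "B", "C") (2), while ("B", "C") is absorbed by the longer pattern with equal support, which is exactly the condensation the abstract is after.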
Chen, Meng-Hau, and 陳孟豪. "A Web Mining Architecture for XML Web Pages Characteristic." Thesis, 2002. http://ndltd.ncl.edu.tw/handle/ebbrs6.
靜宜大學
資訊管理學系研究所
90
Due to the growing use of web sites, the Web has become a popular channel for data transmission and sharing. In the past, most research related to Web mining did not provide effective methods for finding relevant information about the content users browse, because HTML is only weakly structured and HTML tags cannot provide detailed, specific information about a page. In recent years, the shortcomings of HTML have been overcome by XML. We therefore propose a method that extracts tag information from XML web documents to find users' web usage patterns based on the characteristics of XML. In this thesis, we provide two kinds of tag-extraction mechanisms for XML documents: the first extracts tags based on the web site itself; the other retrieves and extracts documents based on the user's roaming path. Using these mechanisms, we can mine information about users' favorite web contents and analyze their browsing behavior. We also propose a personalization recommendation method built on the previous methods, with which we can recommend different, suitable products to distinct customer groups.
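A minimal sketch of the path-based tag-extraction idea follows, assuming plain XML pages; the sample documents and function name are invented for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter

def tag_profile(xml_pages):
    """Count element tags across the XML pages along one user's
    roaming path, as a crude proxy for the kinds of content the
    user browsed."""
    counts = Counter()
    for page in xml_pages:
        root = ET.fromstring(page)
        counts.update(element.tag for element in root.iter())  # root included
    return counts

roaming_path = [  # invented sample pages visited in order
    "<catalog><book><title>XQuery</title></book></catalog>",
    "<catalog><book><price>10</price></book></catalog>",
]
profile = tag_profile(roaming_path)
```

Because XML tags name the data they wrap (unlike presentational HTML tags), such a profile hints at what the user looked at, which is the leverage the abstract claims for XML over HTML.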
Lin, Ching-Nan, and 林慶南. "Enhancement of Web Sites Security Utilizing Web Logs Mining." Thesis, 2002. http://ndltd.ncl.edu.tw/handle/34333759103494653266.
中原大學
電子工程研究所
90
The problem of information security on the Web has recently become an important research issue. Backdoors or information leaks in Common Gateway Interface (CGI) scripts, hidden inadvertently or deliberately by programmers, allow enterprise information to be obtained illegally and cannot easily be detected by security tools. Moreover, the rapid growth of the Internet encourages important research on Web mining. Therefore, to detect backdoors or information leaks in CGI scripts that security tools cannot detect, and to avoid damage to enterprises, we propose a log data mining approach to enhance the security of Web servers. First, we combine Web application log data with Web log data to overcome the limitations of Web logs alone. Our method then uses a density-based clustering algorithm to mine abnormal Web log and Web application log data. The obtained information helps the system administrator detect backdoors or information leaks in programs more easily, and the mined information helps detect problems in CGI scripts from online Web site log data.
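The density-based idea (abnormal log records sit in sparse regions of feature space) can be sketched with a naive neighbour count. The features, thresholds, and function name below are invented for illustration; this is the core test used by density-based algorithms such as DBSCAN, not the thesis's actual method.

```python
def density_outliers(points, eps=1.5, min_pts=3):
    """Flag points whose eps-neighbourhood contains fewer than
    min_pts points (the point itself included) as abnormal."""
    def dist(a, b):
        return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5
    return [i for i, p in enumerate(points)
            if sum(1 for q in points if dist(p, q) <= eps) < min_pts]

# Invented per-client features: (requests per minute, distinct CGI scripts hit)
log_features = [(2, 1), (2, 2), (3, 1), (3, 2), (40, 25)]
suspicious = density_outliers(log_features)  # flags the (40, 25) client
```

The four normal clients form a dense cluster, while the client hammering many CGI scripts falls outside every dense region and is the one an administrator would inspect first.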
Mongolu, Vivek. "Distributed data mining using web services." 2004. http://etd.louisville.edu/data/UofL0076t2004.pdf.
Full textYo, Shu-Han, and 游舒涵. "Mining Related Terms from Web Pages." Thesis, 2006. http://ndltd.ncl.edu.tw/handle/34301958562258504190.
國立中正大學
資訊工程所
94
With the fast development of the Internet and the day-by-day popularization of broadband networks, network resources are increasing sharply, and people can obtain more and more of them. Given the massive material on the World Wide Web, much information retrieval research mines data from it. In this paper, we mine related terms from web pages on the World Wide Web. First, we analyze a large number of web pages to define different blocks and cut units from them. We then perform word segmentation within each cut unit and pair any two words as related terms of each other. After analyzing all the web page material, we count the results and calculate the relation between two words from their common occurrence number (co-occurrence). Related terms can be used to make search results more precise and to help users find more precise information.
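The co-occurrence counting described here can be sketched as follows; block segmentation and word segmentation are replaced by simple whitespace splitting, and the sample blocks are invented.

```python
from collections import Counter
from itertools import combinations

def related_terms(blocks, min_count=2):
    """Pair every two distinct words that co-occur in a text block
    and rank the pairs by how many blocks they co-occur in."""
    pair_counts = Counter()
    for block in blocks:
        words = sorted(set(block.lower().split()))  # each word once per block
        pair_counts.update(combinations(words, 2))
    return [pair for pair, count in pair_counts.most_common()
            if count >= min_count]

blocks = [
    "data mining web",
    "web data mining tools",
    "cooking web",
]
pairs = related_terms(blocks)
```

Pairs that recur across many blocks ("data"/"mining" here) are kept as related terms, while one-off pairings fall below the count threshold; a real system would normalize the raw counts (e.g. by each word's own frequency) before ranking.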
Bill, Hong, and 洪渝翔. "Web Usage Mining Based on AJAX." Thesis, 2008. http://ndltd.ncl.edu.tw/handle/56200534079798903747.
國立彰化師範大學
資訊管理學系所
96
With powerful dynamic web development tools and community mechanisms, Web 2.0 was created: users browse, process, and share information in rich internet applications, which are highly interactive. Because of this transformation of the browsing environment, the method and meaning of web usage mining differ from before: the main element in the mining process has shifted from pages to the messages used for requests to the server and responses to the client. Past research on web usage mining based on web logs ignored the GET and POST messages, which contain important data; yet with highly interactive dynamic pages, message exchange is frequent, so page-based web usage mining is losing ground. This study probes the application of AJAX to web usage mining and proposes web usage mining based on AJAX, aimed at interface-usage data. Using the XML response messages of AJAX, it collects and analyzes data from interactions with the interface; it also integrates with a database and provides a more diverse and richer data source and analysis.
"Web opinion mining on consumer reviews." 2008. http://library.cuhk.edu.hk/record=b5893776.
Full text
Thesis (M.Phil.)--Chinese University of Hong Kong, 2008.
Includes bibliographical references (leaves 80-83).
Abstracts in English and Chinese.
Chapter 1 --- Introduction --- p.1
Chapter 1.1 --- Overview --- p.1
Chapter 1.2 --- Motivation --- p.3
Chapter 1.3 --- Objective --- p.5
Chapter 1.4 --- Our contribution --- p.5
Chapter 1.5 --- Organization of the Thesis --- p.6
Chapter 2 --- Related Work --- p.7
Chapter 2.1 --- Existing Sentiment Classification Approach --- p.7
Chapter 2.2 --- Existing Sentiment Analysis Approach --- p.9
Chapter 2.3 --- Our Approach --- p.11
Chapter 3 --- Extracting Product Feature Sentences using Supervised Learning Algorithms --- p.12
Chapter 3.1 --- Overview --- p.12
Chapter 3.2 --- Association Rules Mining --- p.13
Chapter 3.2.1 --- Apriori Algorithm --- p.13
Chapter 3.2.2 --- Class Association Rules Mining --- p.14
Chapter 3.3 --- Naive Bayesian Classifier --- p.14
Chapter 3.3.1 --- Basic Idea --- p.14
Chapter 3.3.2 --- Feature Selection Techniques --- p.15
Chapter 3.4 --- Experiment --- p.17
Chapter 3.4.1 --- Data Sets --- p.18
Chapter 3.4.2 --- Experimental Setup and Evaluation Measures --- p.19
Chapter 3.4.3 --- Class Association Rules Mining --- p.20
Chapter 3.4.4 --- Naive Bayesian Classifier --- p.22
Chapter 3.4.5 --- Effect on Data Size --- p.25
Chapter 3.5 --- Discussion --- p.27
Chapter 4 --- Extracting Product Feature Sentences Using Unsupervised Learning Algorithms --- p.28
Chapter 4.1 --- Overview --- p.28
Chapter 4.2 --- Unsupervised Learning Algorithms --- p.29
Chapter 4.2.1 --- K-means Algorithm --- p.29
Chapter 4.2.2 --- Density-Based Scan --- p.29
Chapter 4.2.3 --- Hierarchical Clustering --- p.30
Chapter 4.3 --- Distance Function --- p.32
Chapter 4.3.1 --- Euclidean Distance --- p.32
Chapter 4.3.2 --- Jaccard Distance --- p.32
Chapter 4.4 --- Experiment --- p.33
Chapter 4.4.1 --- Cluster Labeling --- p.33
Chapter 4.4.2 --- K-means Algorithm --- p.34
Chapter 4.4.3 --- Density-Based Scan --- p.35
Chapter 4.4.4 --- Hierarchical Clustering --- p.36
Chapter 4.5 --- Discussion --- p.37
Chapter 5 --- Extracting Product Feature Sentences Using Concept Clustering --- p.39
Chapter 5.1 --- Overview --- p.39
Chapter 5.2 --- Distance Function --- p.40
Chapter 5.2.1 --- Association Weight --- p.40
Chapter 5.2.2 --- Chi Square --- p.41
Chapter 5.2.3 --- Mutual Information --- p.41
Chapter 5.3 --- Experiment --- p.41
Chapter 5.3.1 --- Effect on Distance Functions --- p.42
Chapter 5.3.2 --- Extraction of Product Features Clusters --- p.43
Chapter 5.3.3 --- Labeling of Sentences --- p.45
Chapter 5.4 --- Discussion --- p.48
Chapter 6 --- Extracting Product Feature Sentences Using Concept Clustering and Proposed Unsupervised Learning Algorithm --- p.49
Chapter 6.1 --- Overview --- p.49
Chapter 6.2 --- Problem Statement --- p.50
Chapter 6.3 --- Proposed Algorithm - Scalable Thresholds Clustering --- p.50
Chapter 6.4 --- Properties of the Proposed Unsupervised Learning Algorithm --- p.54
Chapter 6.4.1 --- Relationship between threshold functions & shape of clusters --- p.54
Chapter 6.4.2 --- Expansion process --- p.56
Chapter 6.4.3 --- Impact of Different Threshold Functions --- p.58
Chapter 6.5 --- Experiment --- p.61
Chapter 6.5.1 --- Comparative Studies for Clusters Formation and Sentences Labeling with Digital Camera Dataset --- p.62
Chapter 6.5.2 --- Experiments with New Datasets --- p.67
Chapter 6.6 --- Discussion --- p.74
Chapter 7 --- Conclusion and Future Work --- p.76
Chapter 7.1 --- Compare with Existing Work --- p.76
Chapter 7.2 --- Contribution & Implication of this Work --- p.78
Chapter 7.3 --- Future Work & Improvement --- p.79
REFERENCES --- p.80
Chapter A --- Concept Clustering for DC data with DB Scan (Terms in Concept Clusters) --- p.84
Chapter B --- Concept Clustering for DC data with Single-linkage Hierarchical Clustering (Terms in Concept Clusters) --- p.87
Chapter C --- Concept Clusters for Digital Camera data (Comparative Studies) --- p.91
Chapter D --- Concept Clusters for Personal Computer data (Comparative Studies) --- p.98
Chapter E --- Concept Clusters for Mobile data (Comparative Studies) --- p.103
Chapter F --- Concept Clusters for MP3 data (Comparative Studies) --- p.109
Chen, Yu-Ru, and 陳郁儒. "Mining Bilingual Collocations on the Web." Thesis, 2009. http://ndltd.ncl.edu.tw/handle/10393173072986484554.
Full text
National Tsing Hua University
Department of Computer Science
97
In this paper, we introduce a new method for finding translation equivalents of a given collocation on the Web, based on a query expansion strategy. Our approach finds translations in a parallel corpus and learns query expansion terms for the given collocation, biasing search engines toward returning top-ranked snippets that contain the sought-after translations. We use the translations from the parallel corpus and attempt to learn additional query expansion terms for retrieving more translations on the Web; the query expansion method is trained on the parallel corpus and validated on the Web. At run time, a given collocation is automatically transformed into a set of queries and sent to a search engine. Candidate translations are then retrieved from the returned snippets and ranked by their similarity to the corpus translations. Our method provides significantly more translation equivalents from the Web than are found in the parallel corpus alone, which could assist language learners, translators, and the development of machine translation systems.
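The final ranking step, scoring snippet-extracted candidates by similarity to the known corpus translations, might look like the sketch below; the Dice coefficient over character sets is an assumed similarity measure (the abstract does not specify one), and the sample strings are invented:

```python
def dice(a, b):
    """Dice coefficient between two strings, computed over character sets."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 0.0
    return 2 * len(sa & sb) / (len(sa) + len(sb))

def rank_candidates(candidates, corpus_translations):
    """Rank candidates by their best similarity to any known corpus translation."""
    def score(c):
        return max(dice(c, t) for t in corpus_translations)
    return sorted(candidates, key=score, reverse=True)

ranked = rank_candidates(
    ["information retrieval", "data retrieval", "cooking"],
    ["information retrieval system"],
)
# candidates closest to a corpus translation rank first
```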
Wang, Siou-Hao, and 王修毫. "Mining Tourism Information from Web Fourm." Thesis, 2013. http://ndltd.ncl.edu.tw/handle/13425843195175123562.
Full text
National Chiao Tung University
Information Management Program, College of Management
101
Tourism is a pollution-free industry and has brought stable economic growth over the last ten years. Knowing tourism trends is quite important for people who make tourism plans or work to improve the quality of tourism. According to a Tourism Bureau report, most Taiwanese travel within the region of their place of residence. This paper aims to identify which topics receive special attention. Although tourists in the same region behave similarly, the complexity of trip planning implies some differences among them. Most information on web forums is unstructured or semi-structured, so tourism information is extracted through a series of data pre-processing steps. Finally, the pre-processed forum data are analyzed with the Apriori algorithm. According to the analysis, tourism behavior is highly similar in Taoyuan County and Hsinchu City, where people tend to pay attention to topics about "Food"; the behavior of Hsinchu County's tourists falls between the two. Moreover, negative topics receive high attention, and the most-viewed tourism topics of Miaoli County are all about "sightseeing", but attention to those topics is not sustained.
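The Apriori analysis applied to the pre-processed forum posts can be illustrated with a minimal frequent-itemset sketch; the topic tags and support threshold below are invented examples, not the thesis's data:

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return every itemset (as a frozenset) whose support meets min_support."""
    transactions = [frozenset(t) for t in transactions]
    items = {i for t in transactions for i in t}

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / len(transactions)

    frequent = {}
    candidates = [frozenset([i]) for i in items]
    while candidates:
        level = {c: support(c) for c in candidates if support(c) >= min_support}
        frequent.update(level)
        # join step: merge frequent k-itemsets sharing k-1 items into (k+1)-candidates
        prev = list(level)
        candidates = {a | b for a, b in combinations(prev, 2)
                      if len(a | b) == len(a) + 1}
    return frequent

# invented forum posts, each tagged with the topics it mentions
posts = [{"food", "hotel"}, {"food", "scenery"}, {"food", "hotel"}, {"hotel"}]
freq = apriori(posts, 0.5)
# {food}, {hotel}, and {food, hotel} reach 50% support; {scenery} does not
```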
Su, Dong-po, and 蘇東坡. "Applying Neural Network to Web Mining." Thesis, 2003. http://ndltd.ncl.edu.tw/handle/60882080051232983173.
Full text
Nanhua University
Graduate Institute of Information Management
91
In this thesis, we apply a widely used data mining technique, the neural network, to classifying users' characteristics on the WWW. In the proposed method, we first use feature weight detector networks to discover reliable features from the massive and complex training data on the WWW. Second, we use a proportional learning vector quantization network to learn the appropriate centroid of each cluster. Finally, we apply a radial basis function network, together with the cluster centroids, to classify the test data. For the experiments, we partition the data set into several sections according to session length. Experimental results show better classification than using the overall data set.
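A minimal sketch of the final radial-basis-function classification stage, assuming the cluster centroids have already been learned by the earlier stages (the feature-weight detector and LVQ steps are not reproduced, and the centroid values and labels are invented):

```python
import math

def rbf_classify(x, centroids, sigma=1.0):
    """Assign x to the label whose centroid has the highest Gaussian activation."""
    def activation(c):
        d2 = sum((xi - ci) ** 2 for xi, ci in zip(x, c))
        return math.exp(-d2 / (2 * sigma ** 2))
    # winner-take-all over the RBF units, one unit per cluster centroid
    return max(centroids, key=lambda label: activation(centroids[label]))

centroids = {"casual": (0.2, 0.1), "heavy": (0.9, 0.8)}  # invented user clusters
label = rbf_classify((0.85, 0.75), centroids)
# the test point activates the "heavy" unit most strongly
```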
Chen, Shih-Sheng, and 陳仕昇. "Mining Web Traversal rules with Sequences." Thesis, 1999. http://ndltd.ncl.edu.tw/handle/75169284009872580156.
Full text
National Central University
Graduate Institute of Information Management
87
Web traversal patterns and rules are valuable to both electronic commerce and system designers. If business owners know users' traversal behaviors, they can place advertisement banners on the proper web pages; the same information can help systems pre-fetch web pages and reduce response time. In this article, we propose a new data mining method to find traversal patterns and their associated rules. Traversal patterns are recorded as sequences, which impose a total order on their elements. Sequences may contain duplicated elements and hence require a new threshold computing method, under which thresholds decrease as sequences expand. To address this, we design the Next Pass Large Threshold and Next Pass Large Sequences to forecast the needed sequences and thresholds. To expand sequences properly, a sequence join is employed instead of the traditional set join. Since sequences are ordered, the established rules include forward reasoning and backward reasoning: forward reasoning asserts rules in the order in which events happen, while backward reasoning asserts them in the reversed order. Both kinds of rules are valuable to electronic commerce and system designers.
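Counting the support of an ordered traversal pattern, the basic operation behind mining such sequences, can be sketched as follows; this simple order-preserving subsequence test does not reproduce the thesis's Next Pass Large Threshold machinery, and the sessions are invented:

```python
def is_subsequence(pattern, session):
    """True if the pattern's pages appear in the session in the same order."""
    it = iter(session)
    # `page in it` advances the iterator, so order is enforced
    return all(page in it for page in pattern)

def sequence_support(pattern, sessions):
    """Fraction of sessions containing the traversal pattern in order."""
    return sum(is_subsequence(pattern, s) for s in sessions) / len(sessions)

sessions = [["A", "B", "C"], ["A", "C", "B"], ["B", "A", "C"]]
sup = sequence_support(["A", "C"], sessions)
# A precedes C in all three sessions, so support is 1.0
```

Note that the ordered pattern ["C", "A"] would get support 0.0 on the same sessions, which is exactly the asymmetry that forward and backward reasoning exploit.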
Hsiao, Kuang-Yu, and 蕭廣佑. "Fuzzy Data Mining on Web Logs." Thesis, 2004. http://ndltd.ncl.edu.tw/handle/74688406448388466191.
Full text
Southern Taiwan University of Technology
Department of Information Management
92
With the advance of technology, the Internet has become an important part of everyday life, and government institutions and enterprises seek to advertise and market through the Web. From browsers' traversal records, one can analyze their preferences, better understand consumer demands, and improve advertising and marketing. In this study, we use the Maximum Forward Reference algorithm to find browsers' traversal patterns in web logs. Meanwhile, experts are asked to assign fuzzy importance weights to the different web pages. Finally, we employ a fuzzy data mining technique that combines the Apriori algorithm with the fuzzy weights to determine association rules. From the resulting association rules, one can accurately learn what information consumers need and which pages they prefer, which matters to both government institutions and enterprises. Enterprises can find commercial opportunities and improve the design of their pages, while government institutions can understand people's needs, promote policy more efficiently, and provide better service quality.
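The Maximum Forward Reference step can be sketched as follows: whenever a session revisits a page already on the current path, the forward path accumulated so far is emitted and the path is truncated back to that page. The session data are invented:

```python
def maximal_forward_references(session):
    """Split a traversal session into maximal forward paths (MFR algorithm).

    A revisit to a page already on the current path is treated as a
    backward move: the path so far is emitted, then truncated."""
    refs, path, moving_forward = [], [], True
    for page in session:
        if page in path:                      # backward reference
            if moving_forward:
                refs.append(list(path))
            path = path[:path.index(page) + 1]
            moving_forward = False
        else:                                 # forward reference
            path.append(page)
            moving_forward = True
    if moving_forward and path:
        refs.append(path)
    return refs

refs = maximal_forward_references(["A", "B", "C", "B", "D"])
# visiting A-B-C, backing up to B, then D yields two maximal
# forward paths: A-B-C and A-B-D
```

Each emitted path can then be fed to the fuzzy-weighted Apriori step as one transaction.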