Dissertations / Theses on the topic 'Web mining'
Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles
Consult the top 50 dissertations / theses for your research on the topic 'Web mining.'
Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.
You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.
Browse dissertations / theses on a wide variety of disciplines and organise your bibliography correctly.
Zheng, George. "Web Service Mining." Diss., Virginia Tech, 2009. http://hdl.handle.net/10919/26324.
Ph. D.
Benkovská, Petra. "Web Usage Mining." Master's thesis, Vysoká škola ekonomická v Praze, 2007. http://www.nusl.cz/ntk/nusl-3950.
Oosthuizen, Craig Peter. "Web usage mining of organisational web sites." Thesis, Nelson Mandela Metropolitan University, 2005. http://hdl.handle.net/10948/399.
Martins, Bruno. "Geographically Aware Web Text Mining." Master's thesis, Department of Informatics, University of Lisbon, 2009. http://hdl.handle.net/10451/14301.
Stavrianou, Anna. "Modeling and mining of Web discussions." PhD thesis, Université Lumière - Lyon II, 2010. http://tel.archives-ouvertes.fr/tel-00564764.
Norguet, Jean-Pierre. "Semantic analysis in web usage mining." Doctoral thesis, Université Libre de Bruxelles, 2006. http://hdl.handle.net/2013/ULB-DIPOT:oai:dipot.ulb.ac.be:2013/210890.
Full textIndeed, according to organizations theory, the higher levels in the organizations need summarized and conceptual information to take fast, high-level, and effective decisions. For Web sites, these levels include the organization managers and the Web site chief editors. At these levels, the results produced by Web analytics tools are mostly useless. Indeed, most of these results target Web designers and Web developers. Summary reports like the number of visitors and the number of page views can be of some interest to the organization manager but these results are poor. Finally, page-group and directory hits give the Web site chief editor conceptual results, but these are limited by several problems like page synonymy (several pages contain the same topic), page polysemy (a page contains several topics), page temporality, and page volatility.
For their part, Web usage mining research projects have mostly left Web analytics and its limitations aside and focused on other research paths, such as usage pattern analysis, personalization, system improvement, site structure modification, marketing business intelligence, and usage characterization. A potential contribution to Web analytics can be found in research on reverse clustering analysis, a technique based on self-organizing feature maps that integrates Web usage mining and Web content mining to rank the Web site pages according to an original popularity score. However, the algorithm is not scalable and does not address the page-polysemy, page-synonymy, page-temporality, and page-volatility problems. As a consequence, these approaches fail to deliver summarized and conceptual results.
An interesting attempt to obtain such results has been the Information Scent algorithm, which produces a list of term vectors representing the visitors' needs. These vectors provide a semantic representation of the visitors' needs and can be easily interpreted. Unfortunately, the results suffer from term polysemy and term synonymy, are visit-centric rather than site-centric, and do not scale. Finally, according to a recent survey, no Web usage mining research project has proposed a satisfying solution for site-wide summarized and conceptual audience metrics.
In this dissertation, we present our solution to the need for summarized and conceptual audience metrics in Web analytics. We first describe several methods for mining the Web pages output by Web servers: content journaling, script parsing, server monitoring, network monitoring, and client-side mining. These techniques can be used alone or in combination to mine the pages output by any Web site. The occurrences of taxonomy terms in these pages can then be aggregated to provide concept-based audience metrics. To evaluate the results, we implement a prototype and run a number of test cases with real Web sites.
According to the first experiments with our prototype and SQL Server OLAP Analysis Services, concept-based metrics prove far more concise and intuitive than page-based metrics. As a consequence, they can be exploited at higher levels of the organization. For example, organization managers can redefine the organization strategy according to the visitors' interests. Concept-based metrics also give an intuitive view of the messages delivered through the Web site and make it possible to adapt the site's communication to the organization's objectives. The Web site chief editor, in turn, can interpret the metrics to redefine the publishing orders and the sub-editors' writing tasks. As decisions at higher levels of the organization should be more effective, concept-based metrics should contribute significantly to Web usage mining and Web analytics.
Doctorat en sciences appliquées
info:eu-repo/semantics/nonPublished
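As a rough illustration of the concept-based aggregation this abstract describes, the sketch below counts taxonomy-term occurrences in page output and weights them by page views. The taxonomy, page texts, and view counts are invented; the thesis's own mining methods (content journaling, script parsing, etc.) are not reproduced here.

```python
from collections import Counter

# Hypothetical taxonomy: concept -> indicator terms (invented for illustration).
TAXONOMY = {
    "admissions": {"tuition", "application", "enrolment"},
    "research": {"laboratory", "grant", "publication"},
}

def concept_metrics(pages):
    """Aggregate taxonomy-term occurrences in page output into
    concept-based audience metrics, weighted by page views.
    `pages` is a list of (page output text, view count) pairs."""
    metrics = Counter()
    for text, views in pages:
        words = text.lower().split()
        for concept, terms in TAXONOMY.items():
            hits = sum(words.count(term) for term in terms)
            metrics[concept] += hits * views  # every view delivers the page's terms
    return dict(metrics)

pages = [
    ("Tuition fees and the application form", 10),  # viewed 10 times
    ("Our laboratory announced a new grant publication", 4),
]
print(concept_metrics(pages))
```

The resulting per-concept totals are the kind of summarized, site-wide figure the abstract argues managers and chief editors can act on, in contrast to raw page hits.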
Chen, Hsinchun. "Special issue: "Web retrieval and mining"." Elsevier, 2003. http://hdl.handle.net/10150/106101.
Search engines and data mining are two research areas that have experienced significant progress over the past few years. Overwhelming acceptance of the Internet as a primary medium for content delivery and business transactions has created unique opportunities and challenges for researchers. The richness of the web's multimedia content, the reach and timeliness of web-based publication, the proliferation of e-commerce activities, and the potential for wireless web delivery have generated many interesting research problems. Technical, system, organizational, and social research approaches are all needed to address them. Many interesting web retrieval and mining research topics have emerged recently, including but not limited to: text and data mining on the web, web visualization, web intelligence and agents, web-based decision support and knowledge management, wireless web retrieval and visualization, web-based usability methodology, and web-based analysis for e-commerce applications. This special issue consists of nine papers that report research in web retrieval and mining.
Khalil, Faten. "Combining web data mining techniques for web page access prediction." University of Southern Queensland, Faculty of Sciences, 2008. http://eprints.usq.edu.au/archive/00004341/.
Full textKhairo-Sindi, Mazin Omar. "Framework for web log pre-processing within web usage mining." Thesis, University of Manchester, 2004. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.488456.
Full textNagi, Mohamad. "Integrating Network Analysis and Data Mining Techniques into Effective Framework for Web Mining and Recommendation. A Framework for Web Mining and Recommendation." Thesis, University of Bradford, 2015. http://hdl.handle.net/10454/14200.
Full textLiu, Qian. "Mining the Web to support Web image retrieval and image annotation." Thesis, University of Macau, 2007. http://umaclib3.umac.mo/record=b1677226.
Full textDonato, Debora. "Web Mining and Exploration: Algorithms and Experiments." Doctoral thesis, La Sapienza, 2006. http://hdl.handle.net/11573/917052.
Full textPoblete, Labra Bárbara. "Query-Based data mining for the web." Doctoral thesis, Universitat Pompeu Fabra, 2009. http://hdl.handle.net/10803/7270.
Full textThe objective of this thesis is to study different applications of Web query mining for the improvement of search engine ranking, Web information retrieval and Web site enhancement. The main motivation of this work is to take advantage of the implicit feedback left in the trail of users while navigating through the Web. Throughout this work we seek to demonstrate the value of queries to extract interesting rules, patterns and information about the documents they reach. The models, created in this doctoral work, show that the "wisdom of the crowds" conveyed in queries has many applications that overall provide a better understanding of users' needs in the Web. This allows to improve the general interaction of visitors with Web sites and search engines in a straightforward way.
Ngok, Man Chan. "Log mining to support web query expansions." Thesis, University of Macau, 2008. http://umaclib3.umac.mo/record=b1783608.
Full textTezuka, Taro. "Web mining for extracting cognitive geographic spaces." 京都大学 (Kyoto University), 2005. http://hdl.handle.net/2433/144807.
Full textLi, Liangchun. "Web-based data visualization for data mining." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 1998. http://www.collectionscanada.ca/obj/s4/f2/dsk2/ftp03/MQ35845.pdf.
Full textBa-Omer, Hafidh Taher. "A framework for educational web usage mining." Thesis, University of Manchester, 2008. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.492063.
Full textMulvenna, Maurice David. "Analyzing computer-mediated behaviour using web mining." Thesis, University of Ulster, 2007. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.442371.
Full textKong, Wei. "EXPLORING HEALTH WEBSITE USERS BY WEB MINING." Thesis, Universal Access in Human-Computer Interaction. Applications and Services Lecture Notes in Computer Science, 2011, Volume 6768/2011, 376-383, DOI: 10.1007/978-3-642-21657-2_40, 2011. http://hdl.handle.net/1805/2810.
Full textWith the continuous growth of health information on the Internet, providing user-orientated health service online has become a great challenge to health providers. Understanding the information needs of the users is the first step to providing tailored health service. The purpose of this study is to examine the navigation behavior of different user groups by extracting their search terms and to make some suggestions to reconstruct a website for more customized Web service. This study analyzed five months’ of daily access weblog files from one local health provider’s website, discovered the most popular general topics and health related topics, and compared the information search strategies for both patient/consumer and doctor groups. Our findings show that users are not searching health information as much as was thought. The top two health topics which patients are concerned about are children’s health and occupational health. Another topic that both user groups are interested in is medical records. Also, patients and doctors have different search strategies when looking for information on this website. Patients get back to the previous page more often, while doctors usually go to the final page directly and then leave the page without coming back. As a result, some suggestions to redesign and improve the website are discussed; a more intuitive portal and more customized links for both user groups are suggested.
Yang, Yi Yang. "Identifying city landmarks by mining web albums." Thesis, University of Macau, 2015. http://umaclib3.umac.mo/record=b3335394.
Full textEscudeiro, Nuno Filipe Fonseca Vasconcelos. "Automatic web resource compilation using data mining." Master's thesis, Faculdade de Economia da Universidade do Porto, 2004. http://hdl.handle.net/10216/65594.
Full textEscudeiro, Nuno Filipe Fonseca Vasconcelos. "Automatic Web Resource Compilation Using Data Mining." Master's thesis, Faculdade de Economia da Universidade do Porto, 2004. http://hdl.handle.net/10216/10767.
Full textMaster in Data Analysis and Decision Support Systems
In this dissertation we propose a methodology that automates the compilation of Web resources and eases their exploration. A resource is a collection of documents on a specific topic defined by the user. The user's intervention is explicitly required in an initial phase, when the user specifies their information needs and supplies a few example documents. After this initial phase of defining and specifying the information needs, the methodology stays aligned with the continuous evolution of the user's preferences, which are permanently monitored and tracked without explicitly requiring further intervention. To this end, the methodology analyzes the user's preferences through their actions (saving, printing, viewing, or changing the category of documents), which are automatically recorded during each session. In this way the user supplies valuable information to the system without any additional effort. The methodology provides a presentation layer, designed to support the exploration and analysis of large document collections, through which the user explores their resources. Resources are compiled through a meta-search process in which queries are scheduled by an agent that analyzes the trade-off between the freshness of the resource and the percentage of duplicate documents in the responses of the retrieval process. Queries are scheduled so as to keep the resource up to date while reducing the number of queries issued. The methodology also proposes the mechanisms needed to automatically evaluate and control the overall quality of the system. This quality is defined in a three-dimensional space whose dimensions quantify performance in terms of Automation, Effectiveness, and Efficiency.
Each of these dimensions aggregates a set of measures relevant to the overall quality of the system: the Automation level is computed from the workload explicitly required of the user; Effectiveness is computed from the precision and accuracy measures; Efficiency is computed from the recall, freshness, and novelty measures. The system permanently measures and records the values of its global quality parameters, which are used to trigger corrective or preventive procedures so as to correct or anticipate a degradation of the system's overall quality. The classification of Web pages is a critical task in our methodology. To assess the adequacy of semi-supervised learning techniques, several experiments were designed and carried out, supported by a prototype that implements part of the proposed methodology and was developed in the course of this work. In particular, this prototype was used to compile two distinct resources and to study the error rate and robustness of the semi-automatic classification task.
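The three quality dimensions this abstract names (Automation, Effectiveness, Efficiency) can be pictured as a small computation. The aggregation formulas below, and every parameter name, are illustrative guesses for a sketch, not the thesis's actual definitions.

```python
def quality_vector(explicit_user_actions, total_actions,
                   tp, fp, fn, fresh_docs, novel_docs, total_docs):
    """Illustrative aggregation of the three quality dimensions:
    Automation from explicit user workload, Effectiveness from
    precision, Efficiency from recall, freshness and novelty.
    The plain averaging is an assumption made for this sketch."""
    automation = 1 - explicit_user_actions / total_actions
    precision = tp / (tp + fp)          # feeds Effectiveness
    recall = tp / (tp + fn)             # feeds Efficiency
    freshness = fresh_docs / total_docs
    novelty = novel_docs / total_docs
    effectiveness = precision
    efficiency = (recall + freshness + novelty) / 3
    return automation, effectiveness, efficiency

print(quality_vector(10, 100, 8, 2, 2, 50, 20, 100))
```

Tracking such a vector over time is what lets the system trigger the corrective or preventive procedures the abstract mentions when any dimension degrades.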
Leibold, Markus. "Web Log Mining als Controllinginstrument der PR." [S.l. : s.n.], 2004. http://www.bsz-bw.de/cgi-bin/xvms.cgi?SWB11675715.
Full textMa, Yao. "Financial market predictions using Web mining approaches /." View abstract or full-text, 2009. http://library.ust.hk/cgi/db/thesis.pl?CSED%202009%20MAY.
Full textSchenker, Adam. "Graph-Theoretic Techniques for Web Content Mining." [Tampa, Fla.] : University of South Florida, 2003. http://purl.fcla.edu/fcla/etd/SFE0000143.
Full textSaad, Elmak. "Optimizing E-management Using Web data mining." Thesis, University of Huddersfield, 2018. http://eprints.hud.ac.uk/id/eprint/34540/.
Full textEscudeiro, Nuno Filipe Fonseca Vasconcelos. "Automatic Web Resource Compilation Using Data Mining." Dissertação, Faculdade de Economia da Universidade do Porto, 2004. http://hdl.handle.net/10216/10767.
Full textMaster in Data Analysis and Decision Support Systems
Nesta dissertação propomos uma metodologia que automatize a recolha de recursos na Web e facilite a sua exploração. Um recurso é uma colecção de documentos referentes a um tópico específico definido pelo utilizador. A intervenção do utilizador é explicitamente requerida numa fase inicial, quando este especifica as suas necessidades de informação e fornece alguns documentos exemplificativos. Após esta fase inicial, de definição e especificação das necessidades de informação, a metodologia mantém-se alinhada corn a contínua evolução das preferências do utilizador que são permanentemente monitorizadas e seguidas sem que seja necessáio requerer explicitamente a sua intervenção. Para tal, a metodologia analisa as preferencias do utilizador a partir das suas acções - guardar, imprimir, visualizar, alterar a categoria de documentos - que são automaticamente registadas durante cada sessão. Desta forma o utilizador fornece informação valiosa ao sistema sem qualquer esforço adicional. A metodologia prevê um nível de apresentação, desenhado com o objectivo de permitir a exploração e análise de colecções volumosas de documentos, através do qual o utilizador explora os seus recursos. 0 s recursos são compilados através de um processo de meta-search, onde as pesquisas são programadas por um agente que analisa o compromisso entre a actualidade do recurso e a percentagem de documentos duplicados nas respostas do processo de recolha. As pesquisas são programadas de forma a manter a actualidade do recurso, reduzindo, simultaneamente, o número de pesquisas efectuadas. A metodologia propõe também os mecanismos necessários para avaliar e controlar de forma automática a qualidade global do sistema. Esta qualidade é definida num espaço tridimensional cujas dimensões quantificam o desempenho no que se refere ao nível de Automação, Eficácia e Eficiência. 
Cada uma destas dimensões agrega um conjunto de medidas relevantes para a qualidade global do sistema: o nivel de Automação é calculado a partir da carga de trabalho que é explicitamente requerida ao utilizador; a Eficiência é calculada a partir das medidas de precison e accuracy; a Eficiência é calculada com base nas medidas de recall, freshness e novelty. 0 sistema mede e regista permanentemente o valor dos seus parâmetros de qualidade globais, que são usados para activar procedimentos correctivos ou preventivos de forma a corrigir ou antecipar uma degradação da qualidade global do sistema. A classificação de páginas Web assume-se como uma tarefa critica na nossa metodologia. Para avaliar da adequação de técnicas de aprendizagem semi-supervisionada foram desenhadas e realizadas algumas experiências. A realização destas experiências foi suportada por um protótipo que implementa parte da metodologia proposta e que foi implementado no decurso deste trabalho. Em particular este protótipo foi utilizado para compilar dois recursos distintos e para estudar a taxa de erro e a robustez da tarefa de classificação semi-automática.
Zhu, Jianhan. "Mining web site link structures for adaptive web site navigation and search." Thesis, University of Ulster, 2003. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.515890.
Full textSalin, Suleyman. "Web Usage Mining And Recommendation With Semantic Information." Master's thesis, METU, 2009. http://etd.lib.metu.edu.tr/upload/12610483/index.pdf.
Full textZettsu, Koji. "Aspect discovery : mining context in world wide Web." 京都大学 (Kyoto University), 2005. http://hdl.handle.net/2433/144804.
Full textSobolewska, Katarzyna-Ewa. "Web links utility assessment using data mining techniques." Thesis, Blekinge Tekniska Högskola, Avdelningen för programvarusystem, 2006. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-2936.
Full textakasha.kate@gmail.com
Mortazavi-Asl, Behzad. "Discovering and mining user Web-page traversal patterns." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 2001. http://www.collectionscanada.ca/obj/s4/f2/dsk3/ftp04/MQ61594.pdf.
Full textSOARES, FABIO DE AZEVEDO. "TEXT MINING AT THE INTELLIGENT WEB CRAWLING PROCESS." PONTIFÍCIA UNIVERSIDADE CATÓLICA DO RIO DE JANEIRO, 2008. http://www.maxwell.vrac.puc-rio.br/Busca_etds.php?strSecao=resultado&nrSeq=13212@1.
Full textCONSELHO NACIONAL DE DESENVOLVIMENTO CIENTÍFICO E TECNOLÓGICO
This dissertation presents a study of the application of Text Mining to the intelligent Web crawling process. The most usual way of gathering data on the Web is to use web crawlers: programs that, once provided with an initial set of URLs (seeds), start the methodical procedure of visiting a site, storing it on disk, and extracting the hyperlinks that will be used for the next visits. But seeking content this way is an expensive and exhausting task. An intelligent web crawling process, rather than collecting and storing every available web document, analyzes its crawling options to find links that will probably provide content highly relevant to a topic defined a priori. In the approach suggested in this work, topics are defined not by keywords but by text documents given as examples. Pre-processing techniques used in Text Mining, including a thesaurus, then semantically analyze the document submitted as an example. Based on this analysis, the web crawler is guided toward its objective: retrieving information relevant to the document. Starting from seeds or querying available search engines, the crawler analyzes, exactly as in the previous step, every document retrieved from the Web and compares it with the example document. Once the similarity level between them is obtained, the retrieved document's hyperlinks are analyzed, queued, and later dequeued according to their probable degree of importance. At the end of the data-gathering process, another Text Mining technique, Document Clustering, is applied to select the most representative documents of the collection. The implementation of a tool incorporating the researched heuristics made it possible to obtain practical results, evaluate the performance of the developed techniques, and compare the results with other means of retrieving data from the Web. The present work shows that Text Mining is a path worth exploring in the process of retrieving relevant information on the Web.
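The queue-and-dequeue idea in this abstract can be sketched as a priority queue ordered by the similarity between the example document and the page a link was found on. The bag-of-words cosine and the `Frontier` class below are simplifications invented for illustration, not the thesis's thesaurus-based analysis.

```python
import heapq
import math
from collections import Counter

def cosine(a, b):
    """Bag-of-words cosine similarity between two texts."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = math.sqrt(sum(c * c for c in va.values()))
    norm *= math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0

class Frontier:
    """Links inherit the similarity score of the page they were found
    on; the link from the most similar page is dequeued first."""
    def __init__(self, example_doc):
        self.example = example_doc
        self._heap = []

    def push_links(self, page_text, urls):
        score = cosine(self.example, page_text)
        for url in urls:
            heapq.heappush(self._heap, (-score, url))  # negate: max-priority

    def pop(self):
        return heapq.heappop(self._heap)[1]

frontier = Frontier("text mining web crawler")
frontier.push_links("a page about web crawler text mining", ["http://a"])
frontier.push_links("cooking recipes for dinner", ["http://b"])
first = frontier.pop()  # link found on the page most similar to the example
```

A real focused crawler would refetch `first`, score its outlinks the same way, and keep cycling until the frontier empties or a budget is reached.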
Мінакова, В. П., and Н. В. Геселева. "Web-mining: інтелектуальний аналіз даних в мережі Internet" [Web mining: intelligent data analysis on the Internet]. Thesis, КНУТД, 2016. https://er.knutd.edu.ua/handle/123456789/4457.
Full textShun, Yeuk Kiu. "Web mining from client side user activity log /." View Abstract or Full-Text, 2002. http://library.ust.hk/cgi/db/thesis.pl?COMP%202002%20SHUN.
Full textIncludes bibliographical references (leaves 85-90). Also available in electronic version. Access restricted to campus users.
Novák, Petr. "Data mining časových řad" [Time series data mining]. Master's thesis, Vysoká škola ekonomická v Praze, 2009. http://www.nusl.cz/ntk/nusl-72068.
Full textChiara, Ramon. ""Aplicação de técnicas de data mining em logs de servidores web"." Universidade de São Paulo, 2003. http://www.teses.usp.br/teses/disponiveis/55/55134/tde-19012004-093205/.
Full textPeng, Puping. "Web mining with jMap technology." Thesis, 2002. http://spectrum.library.concordia.ca/1653/1/MQ68477.pdf.
Full textLiao, Shao-An, and 廖紹安. "Mining Closed Web Traversal Patterns." Thesis, 2011. http://ndltd.ncl.edu.tw/handle/16547330032244431089.
銘傳大學
資訊工程學系碩士班
99
Mining web traversal patterns finds, from web access logs, the paths that most web users traverse. Most research on mining web traversal patterns does not consider users' backward-traversal behavior; besides, many redundant patterns are generated when the minimum support is low. In order to provide important and condensed information to users, we define closed web traversal patterns, from which all web traversal patterns can be derived. In this paper, we propose an efficient algorithm for mining closed web traversal patterns from the paths traversed by all web users. Our algorithm is based on a tree structure and also considers backward traversal. When a node is created in the tree structure, we can immediately determine whether it is closed by using certain mechanisms. Mining only the closed web traversal patterns reduces memory use and search space and improves mining efficiency.
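The definition used here (a frequent sub-path is closed when no longer frequent sub-path containing it has the same support) can be checked brute-force on a toy log. This sketch deliberately ignores the thesis's tree structure and backward traversal; the sessions are invented.

```python
def contiguous_subpaths(path):
    """Every contiguous sub-path of one traversal session."""
    return {tuple(path[i:j]) for i in range(len(path))
            for j in range(i + 1, len(path) + 1)}

def closed_patterns(sessions, min_support):
    """Brute-force closed web traversal patterns: keep a frequent
    sub-path only if no longer frequent sub-path that contains it
    has the same support."""
    support = {}
    for session in sessions:
        for p in contiguous_subpaths(session):  # count once per session
            support[p] = support.get(p, 0) + 1
    frequent = {p: c for p, c in support.items() if c >= min_support}

    def contains(q, p):
        return any(q[i:i + len(p)] == p for i in range(len(q) - len(p) + 1))

    return {p: c for p, c in frequent.items()
            if not any(len(q) > len(p) and cq == c and contains(q, p)
                       for q, cq in frequent.items())}

sessions = [["A", "B", "C"], ["A", "B", "C"], ["A", "B"]]
result = closed_patterns(sessions, min_support=2)
print(result)
```

Here ("A", "B") survives because its support (3) exceeds that of ("A", "B", "C") (2), while ("B", "C") is absorbed by the longer pattern with equal support, which is exactly the condensation the abstract is after.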
Chen, Meng-Hau, and 陳孟豪. "A Web Mining Architecture for XML Web Pages Characteristic." Thesis, 2002. http://ndltd.ncl.edu.tw/handle/ebbrs6.
靜宜大學
資訊管理學系研究所
90
Due to the growing use of web sites, the Web has become a popular channel for data transmission and sharing. In the past, most research related to Web mining did not provide effective methods for finding relevant information about the content users browse, because HTML is only weakly structured and HTML tags cannot provide detailed, specific information about a page. In recent years, the shortcomings of HTML have been overcome by XML. We therefore propose a method that extracts tag information from XML web documents to find users' web usage patterns based on the characteristics of XML. In this thesis, we provide two kinds of tag-extraction mechanisms for XML documents: the first extracts tags based on the web site itself; the other retrieves and extracts documents based on the user's roaming path. Using these mechanisms, we can mine information about users' favorite web contents and analyze their browsing behavior. We also propose a personalization recommendation method built on the previous methods, with which we can recommend different, suitable products to distinct customer groups.
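A minimal sketch of the path-based tag-extraction idea follows, assuming plain XML pages; the sample documents and function name are invented for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter

def tag_profile(xml_pages):
    """Count element tags across the XML pages along one user's
    roaming path, as a crude proxy for the kinds of content the
    user browsed."""
    counts = Counter()
    for page in xml_pages:
        root = ET.fromstring(page)
        counts.update(element.tag for element in root.iter())  # root included
    return counts

roaming_path = [  # invented sample pages visited in order
    "<catalog><book><title>XQuery</title></book></catalog>",
    "<catalog><book><price>10</price></book></catalog>",
]
profile = tag_profile(roaming_path)
```

Because XML tags name the data they wrap (unlike presentational HTML tags), such a profile hints at what the user looked at, which is the leverage the abstract claims for XML over HTML.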
Lin, Ching-Nan, and 林慶南. "Enhancement of Web Sites Security Utilizing Web Logs Mining." Thesis, 2002. http://ndltd.ncl.edu.tw/handle/34333759103494653266.
中原大學
電子工程研究所
90
The problem of information security on the Web has recently become an important research issue. Backdoors or information leaks in Common Gateway Interface (CGI) scripts, hidden inadvertently or deliberately by programmers, allow enterprise information to be obtained illegally and cannot easily be detected by security tools. Moreover, the rapid growth of the Internet encourages important research on Web mining. Therefore, to detect backdoors or information leaks in CGI scripts that security tools cannot detect, and to avoid damage to enterprises, we propose a log data mining approach to enhance the security of Web servers. First, we combine Web application log data with Web log data to overcome the limitations of Web logs alone. Our method then uses a density-based clustering algorithm to mine abnormal Web log and Web application log data. The obtained information helps the system administrator detect backdoors or information leaks in programs more easily, and the mined information helps detect problems in CGI scripts from online Web site log data.
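The density-based idea (abnormal log records sit in sparse regions of feature space) can be sketched with a naive neighbour count. The features, thresholds, and function name below are invented for illustration; this is the core test used by density-based algorithms such as DBSCAN, not the thesis's actual method.

```python
def density_outliers(points, eps=1.5, min_pts=3):
    """Flag points whose eps-neighbourhood contains fewer than
    min_pts points (the point itself included) as abnormal."""
    def dist(a, b):
        return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5
    return [i for i, p in enumerate(points)
            if sum(1 for q in points if dist(p, q) <= eps) < min_pts]

# Invented per-client features: (requests per minute, distinct CGI scripts hit)
log_features = [(2, 1), (2, 2), (3, 1), (3, 2), (40, 25)]
suspicious = density_outliers(log_features)  # flags the (40, 25) client
```

The four normal clients form a dense cluster, while the client hammering many CGI scripts falls outside every dense region and is the one an administrator would inspect first.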
Mongolu, Vivek. "Distributed data mining using web services." 2004. http://etd.louisville.edu/data/UofL0076t2004.pdf.
Full textYo, Shu-Han, and 游舒涵. "Mining Related Terms from Web Pages." Thesis, 2006. http://ndltd.ncl.edu.tw/handle/34301958562258504190.
國立中正大學
資訊工程所
94
With the fast development of the Internet and the day-by-day popularization of broadband networks, network resources are increasing sharply, and people can obtain more and more of them. Given the massive material on the World Wide Web, much information retrieval research mines data from it. In this paper, we mine related terms from web pages on the World Wide Web. First, we analyze a large number of web pages to define different blocks and cut units from them. We then perform word segmentation within each cut unit and pair any two words as related terms of each other. After analyzing all the web page material, we count the results and calculate the relation between two words from their common occurrence number (co-occurrence). Related terms can be used to make search results more precise and to help users find more precise information.
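The co-occurrence counting described here can be sketched as follows; block segmentation and word segmentation are replaced by simple whitespace splitting, and the sample blocks are invented.

```python
from collections import Counter
from itertools import combinations

def related_terms(blocks, min_count=2):
    """Pair every two distinct words that co-occur in a text block
    and rank the pairs by how many blocks they co-occur in."""
    pair_counts = Counter()
    for block in blocks:
        words = sorted(set(block.lower().split()))  # each word once per block
        pair_counts.update(combinations(words, 2))
    return [pair for pair, count in pair_counts.most_common()
            if count >= min_count]

blocks = [
    "data mining web",
    "web data mining tools",
    "cooking web",
]
pairs = related_terms(blocks)
```

Pairs that recur across many blocks ("data"/"mining" here) are kept as related terms, while one-off pairings fall below the count threshold; a real system would normalize the raw counts (e.g. by each word's own frequency) before ranking.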
Bill, Hong, and 洪渝翔. "Web Usage Mining Based on AJAX." Thesis, 2008. http://ndltd.ncl.edu.tw/handle/56200534079798903747.
國立彰化師範大學
資訊管理學系所
96
With powerful dynamic web development tools and community mechanisms, Web 2.0 was created: users browse, process, and share information in rich internet applications, which are highly interactive. Because of this transformation of the browsing environment, the method and meaning of web usage mining differ from before: the main element in the mining process has shifted from pages to the messages used for requests to the server and responses to the client. Past research on web usage mining based on web logs ignored the GET and POST messages, which contain important data; yet with highly interactive dynamic pages, message exchange is frequent, so page-based web usage mining is losing ground. This study probes the application of AJAX to web usage mining and proposes web usage mining based on AJAX, aimed at interface-usage data. Using the XML response messages of AJAX, it collects and analyzes data from interactions with the interface; it also integrates with a database and provides a more diverse and richer data source and analysis.
"Web opinion mining on consumer reviews." 2008. http://library.cuhk.edu.hk/record=b5893776.
Full text
Thesis (M.Phil.)--Chinese University of Hong Kong, 2008.
Includes bibliographical references (leaves 80-83).
Abstracts in English and Chinese.
Chapter 1 --- Introduction --- p.1
Chapter 1.1 --- Overview --- p.1
Chapter 1.2 --- Motivation --- p.3
Chapter 1.3 --- Objective --- p.5
Chapter 1.4 --- Our contribution --- p.5
Chapter 1.5 --- Organization of the Thesis --- p.6
Chapter 2 --- Related Work --- p.7
Chapter 2.1 --- Existing Sentiment Classification Approach --- p.7
Chapter 2.2 --- Existing Sentiment Analysis Approach --- p.9
Chapter 2.3 --- Our Approach --- p.11
Chapter 3 --- Extracting Product Feature Sentences using Supervised Learning Algorithms --- p.12
Chapter 3.1 --- Overview --- p.12
Chapter 3.2 --- Association Rules Mining --- p.13
Chapter 3.2.1 --- Apriori Algorithm --- p.13
Chapter 3.2.2 --- Class Association Rules Mining --- p.14
Chapter 3.3 --- Naive Bayesian Classifier --- p.14
Chapter 3.3.1 --- Basic Idea --- p.14
Chapter 3.3.2 --- Feature Selection Techniques --- p.15
Chapter 3.4 --- Experiment --- p.17
Chapter 3.4.1 --- Data Sets --- p.18
Chapter 3.4.2 --- Experimental Setup and Evaluation Measures --- p.19
Chapter 3.4.3 --- Class Association Rules Mining --- p.20
Chapter 3.4.4 --- Naive Bayesian Classifier --- p.22
Chapter 3.4.5 --- Effect on Data Size --- p.25
Chapter 3.5 --- Discussion --- p.27
Chapter 4 --- Extracting Product Feature Sentences Using Unsupervised Learning Algorithms --- p.28
Chapter 4.1 --- Overview --- p.28
Chapter 4.2 --- Unsupervised Learning Algorithms --- p.29
Chapter 4.2.1 --- K-means Algorithm --- p.29
Chapter 4.2.2 --- Density-Based Scan --- p.29
Chapter 4.2.3 --- Hierarchical Clustering --- p.30
Chapter 4.3 --- Distance Function --- p.32
Chapter 4.3.1 --- Euclidean Distance --- p.32
Chapter 4.3.2 --- Jaccard Distance --- p.32
Chapter 4.4 --- Experiment --- p.33
Chapter 4.4.1 --- Cluster Labeling --- p.33
Chapter 4.4.2 --- K-means Algorithm --- p.34
Chapter 4.4.3 --- Density-Based Scan --- p.35
Chapter 4.4.4 --- Hierarchical Clustering --- p.36
Chapter 4.5 --- Discussion --- p.37
Chapter 5 --- Extracting Product Feature Sentences Using Concept Clustering --- p.39
Chapter 5.1 --- Overview --- p.39
Chapter 5.2 --- Distance Function --- p.40
Chapter 5.2.1 --- Association Weight --- p.40
Chapter 5.2.2 --- Chi Square --- p.41
Chapter 5.2.3 --- Mutual Information --- p.41
Chapter 5.3 --- Experiment --- p.41
Chapter 5.3.1 --- Effect on Distance Functions --- p.42
Chapter 5.3.2 --- Extraction of Product Features Clusters --- p.43
Chapter 5.3.3 --- Labeling of Sentences --- p.45
Chapter 5.4 --- Discussion --- p.48
Chapter 6 --- Extracting Product Feature Sentences Using Concept Clustering and Proposed Unsupervised Learning Algorithm --- p.49
Chapter 6.1 --- Overview --- p.49
Chapter 6.2 --- Problem Statement --- p.50
Chapter 6.3 --- Proposed Algorithm - Scalable Thresholds Clustering --- p.50
Chapter 6.4 --- Properties of the Proposed Unsupervised Learning Algorithm --- p.54
Chapter 6.4.1 --- Relationship between threshold functions & shape of clusters --- p.54
Chapter 6.4.2 --- Expansion process --- p.56
Chapter 6.4.3 --- Impact of Different Threshold Functions --- p.58
Chapter 6.5 --- Experiment --- p.61
Chapter 6.5.1 --- Comparative Studies for Clusters Formation and Sentences Labeling with Digital Camera Dataset --- p.62
Chapter 6.5.2 --- Experiments with New Datasets --- p.67
Chapter 6.6 --- Discussion --- p.74
Chapter 7 --- Conclusion and Future Work --- p.76
Chapter 7.1 --- Compare with Existing Work --- p.76
Chapter 7.2 --- Contribution & Implication of this Work --- p.78
Chapter 7.3 --- Future Work & Improvement --- p.79
REFERENCES --- p.80
Chapter A --- Concept Clustering for DC data with DB Scan (Terms in Concept Clusters) --- p.84
Chapter B --- Concept Clustering for DC data with Single-linkage Hierarchical Clustering (Terms in Concept Clusters) --- p.87
Chapter C --- Concept Clusters for Digital Camera data (Comparative Studies) --- p.91
Chapter D --- Concept Clusters for Personal Computer data (Comparative Studies) --- p.98
Chapter E --- Concept Clusters for Mobile data (Comparative Studies) --- p.103
Chapter F --- Concept Clusters for MP3 data (Comparative Studies) --- p.109
Chen, Yu-Ru, and 陳郁儒. "Mining Bilingual Collocations on the Web." Thesis, 2009. http://ndltd.ncl.edu.tw/handle/10393173072986484554.
Full text
National Tsing Hua University
Department of Computer Science
97
In this paper, we introduce a new method for finding translation equivalents of a given collocation on the Web, based on a query expansion strategy. Our approach finds translations in a parallel corpus and learns query expansion terms for the given collocation, biasing search engines toward returning top-ranked snippets that contain the sought-after translations. We use the translations from the parallel corpus and attempt to learn additional query expansion terms for retrieving more translations on the Web; the query expansion method is trained on the parallel corpus and validated on the Web. At run time, a given collocation is automatically transformed into a set of queries and sent to a search engine. Candidate translations are then retrieved from the returned snippets and ranked by their similarity to the corpus translations. Our method provides significantly more translation equivalents from the Web than are found in the parallel corpus alone, which could assist language learners, translators, and the development of machine translation systems.
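The final ranking step, scoring snippet-extracted candidates by similarity to the known corpus translations, might look like the sketch below; the Dice coefficient over character sets is an assumed similarity measure (the abstract does not specify one), and the sample strings are invented:

```python
def dice(a, b):
    """Dice coefficient between two strings, computed over character sets."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 0.0
    return 2 * len(sa & sb) / (len(sa) + len(sb))

def rank_candidates(candidates, corpus_translations):
    """Rank candidates by their best similarity to any known corpus translation."""
    def score(c):
        return max(dice(c, t) for t in corpus_translations)
    return sorted(candidates, key=score, reverse=True)

ranked = rank_candidates(
    ["information retrieval", "data retrieval", "cooking"],
    ["information retrieval system"],
)
# candidates closest to a corpus translation rank first
```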
Wang, Siou-Hao, and 王修毫. "Mining Tourism Information from Web Fourm." Thesis, 2013. http://ndltd.ncl.edu.tw/handle/13425843195175123562.
Full text
National Chiao Tung University
Information Management Program, College of Management
101
Tourism is a pollution-free industry and has brought stable economic growth over the last ten years. Knowing tourism trends is quite important for people who make tourism plans or work to improve the quality of tourism. According to a Tourism Bureau report, most Taiwanese travel within the region of their place of residence. This paper aims to identify which topics receive special attention. Although tourists in the same region behave similarly, the complexity of trip planning implies some differences among them. Most information on web forums is unstructured or semi-structured, so tourism information is extracted through a series of data pre-processing steps. Finally, the pre-processed forum data are analyzed with the Apriori algorithm. According to the analysis, tourism behavior is highly similar in Taoyuan County and Hsinchu City, where people tend to pay attention to topics about "Food"; the behavior of Hsinchu County's tourists falls between the two. Moreover, negative topics receive high attention, and the most-viewed tourism topics of Miaoli County are all about "sightseeing", but attention to those topics is not sustained.
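The Apriori analysis applied to the pre-processed forum posts can be illustrated with a minimal frequent-itemset sketch; the topic tags and support threshold below are invented examples, not the thesis's data:

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return every itemset (as a frozenset) whose support meets min_support."""
    transactions = [frozenset(t) for t in transactions]
    items = {i for t in transactions for i in t}

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / len(transactions)

    frequent = {}
    candidates = [frozenset([i]) for i in items]
    while candidates:
        level = {c: support(c) for c in candidates if support(c) >= min_support}
        frequent.update(level)
        # join step: merge frequent k-itemsets sharing k-1 items into (k+1)-candidates
        prev = list(level)
        candidates = {a | b for a, b in combinations(prev, 2)
                      if len(a | b) == len(a) + 1}
    return frequent

# invented forum posts, each tagged with the topics it mentions
posts = [{"food", "hotel"}, {"food", "scenery"}, {"food", "hotel"}, {"hotel"}]
freq = apriori(posts, 0.5)
# {food}, {hotel}, and {food, hotel} reach 50% support; {scenery} does not
```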
Su, Dong-po, and 蘇東坡. "Applying Neural Network to Web Mining." Thesis, 2003. http://ndltd.ncl.edu.tw/handle/60882080051232983173.
Full text
Nanhua University
Graduate Institute of Information Management
91
In this thesis, we apply a widely used data mining technique, the neural network, to classifying users' characteristics on the WWW. In the proposed method, we first use feature weight detector networks to discover reliable features from the massive and complex training data on the WWW. Second, we use a proportional learning vector quantization network to learn the appropriate centroid of each cluster. Finally, we apply a radial basis function network, together with the cluster centroids, to classify the test data. For the experiments, we partition the data set into several sections according to session length. Experimental results show better classification than using the overall data set.
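A minimal sketch of the final radial-basis-function classification stage, assuming the cluster centroids have already been learned by the earlier stages (the feature-weight detector and LVQ steps are not reproduced, and the centroid values and labels are invented):

```python
import math

def rbf_classify(x, centroids, sigma=1.0):
    """Assign x to the label whose centroid has the highest Gaussian activation."""
    def activation(c):
        d2 = sum((xi - ci) ** 2 for xi, ci in zip(x, c))
        return math.exp(-d2 / (2 * sigma ** 2))
    # winner-take-all over the RBF units, one unit per cluster centroid
    return max(centroids, key=lambda label: activation(centroids[label]))

centroids = {"casual": (0.2, 0.1), "heavy": (0.9, 0.8)}  # invented user clusters
label = rbf_classify((0.85, 0.75), centroids)
# the test point activates the "heavy" unit most strongly
```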
Chen, Shih-Sheng, and 陳仕昇. "Mining Web Traversal rules with Sequences." Thesis, 1999. http://ndltd.ncl.edu.tw/handle/75169284009872580156.
Full text
National Central University
Graduate Institute of Information Management
87
Web traversal patterns and rules are valuable to both electronic commerce and system designers. If business owners know users' traversal behaviors, they can place advertisement banners on the proper web pages; the same information can help systems pre-fetch web pages and reduce response time. In this article, we propose a new data mining method to find traversal patterns and their associated rules. Traversal patterns are recorded as sequences, which impose a total order on their elements. Sequences may contain duplicated elements and hence require a new threshold computing method, under which thresholds decrease as sequences expand. To address this, we design the Next Pass Large Threshold and Next Pass Large Sequences to forecast the needed sequences and thresholds. To expand sequences properly, a sequence join is employed instead of the traditional set join. Since sequences are ordered, the established rules include forward reasoning and backward reasoning: forward reasoning asserts rules in the order in which events happen, while backward reasoning asserts them in the reversed order. Both kinds of rules are valuable to electronic commerce and system designers.
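Counting the support of an ordered traversal pattern, the basic operation behind mining such sequences, can be sketched as follows; this simple order-preserving subsequence test does not reproduce the thesis's Next Pass Large Threshold machinery, and the sessions are invented:

```python
def is_subsequence(pattern, session):
    """True if the pattern's pages appear in the session in the same order."""
    it = iter(session)
    # `page in it` advances the iterator, so order is enforced
    return all(page in it for page in pattern)

def sequence_support(pattern, sessions):
    """Fraction of sessions containing the traversal pattern in order."""
    return sum(is_subsequence(pattern, s) for s in sessions) / len(sessions)

sessions = [["A", "B", "C"], ["A", "C", "B"], ["B", "A", "C"]]
sup = sequence_support(["A", "C"], sessions)
# A precedes C in all three sessions, so support is 1.0
```

Note that the ordered pattern ["C", "A"] would get support 0.0 on the same sessions, which is exactly the asymmetry that forward and backward reasoning exploit.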
Hsiao, Kuang-Yu, and 蕭廣佑. "Fuzzy Data Mining on Web Logs." Thesis, 2004. http://ndltd.ncl.edu.tw/handle/74688406448388466191.
Full text
Southern Taiwan University of Technology
Department of Information Management
92
With the advance of technology, the Internet has become an important part of everyday life, and government institutions and enterprises seek to advertise and market through the Web. From browsers' traversal records, one can analyze their preferences, better understand consumer demands, and improve advertising and marketing. In this study, we use the Maximum Forward Reference algorithm to find browsers' traversal patterns in web logs. Meanwhile, experts are asked to assign fuzzy importance weights to the different web pages. Finally, we employ a fuzzy data mining technique that combines the Apriori algorithm with the fuzzy weights to determine association rules. From the resulting association rules, one can accurately learn what information consumers need and which pages they prefer, which matters to both government institutions and enterprises. Enterprises can find commercial opportunities and improve the design of their pages, while government institutions can understand people's needs, promote policy more efficiently, and provide better service quality.
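The Maximum Forward Reference step can be sketched as follows: whenever a session revisits a page already on the current path, the forward path accumulated so far is emitted and the path is truncated back to that page. The session data are invented:

```python
def maximal_forward_references(session):
    """Split a traversal session into maximal forward paths (MFR algorithm).

    A revisit to a page already on the current path is treated as a
    backward move: the path so far is emitted, then truncated."""
    refs, path, moving_forward = [], [], True
    for page in session:
        if page in path:                      # backward reference
            if moving_forward:
                refs.append(list(path))
            path = path[:path.index(page) + 1]
            moving_forward = False
        else:                                 # forward reference
            path.append(page)
            moving_forward = True
    if moving_forward and path:
        refs.append(path)
    return refs

refs = maximal_forward_references(["A", "B", "C", "B", "D"])
# visiting A-B-C, backing up to B, then D yields two maximal
# forward paths: A-B-C and A-B-D
```

Each emitted path can then be fed to the fuzzy-weighted Apriori step as one transaction.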