Dissertations / Theses on the topic 'WEB USAGE DATA'
Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles
Consult the top 50 dissertations / theses for your research on the topic 'WEB USAGE DATA.'
Next to every source in the list of references, there is an 'Add to bibliography' button. Press it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.
You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.
Browse dissertations / theses on a wide variety of disciplines and organise your bibliography correctly.
Winblad, Emanuel. "Visualization of web site visit and usage data." Thesis, Linköpings universitet, Medie- och Informationsteknik, 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-110576.
Khalil, Faten. "Combining web data mining techniques for web page access prediction." University of Southern Queensland, Faculty of Sciences, 2008. http://eprints.usq.edu.au/archive/00004341/.
Bayir, Murat Ali. "A New Reactive Method For Processing Web Usage Data." Master's thesis, METU, 2007. http://etd.lib.metu.edu.tr/upload/12607323/index.pdf.
A new session reconstruction method called 'Smart-SRA' is introduced. Web usage mining is a type of web mining which exploits data mining techniques to discover valuable information from the navigations of Web users. As in classical data mining, data processing and pattern discovery are the main issues in web usage mining. The first phase of web usage mining is the data processing phase, which includes session reconstruction. Session reconstruction is the most important task of web usage mining, since it directly affects the quality of the frequent patterns extracted in the final step. Session reconstruction methods can be classified into two categories, namely 'reactive' and 'proactive', with respect to the data source and the data processing time. If the user requests are processed after the server handles them, the technique is called 'reactive', while in 'proactive' strategies this processing occurs during the user's interactive browsing of the web site. Smart-SRA is a reactive session reconstruction technique which uses web log data and the site topology. In order to compare Smart-SRA with previous reactive methods, a web agent simulator has been developed. The agent simulator models the behavior of web users and generates web user navigations as well as the log data kept by the web server. In this way, the actual user sessions are known and the success of different techniques can be compared. In this thesis, it is shown that the sessions generated by Smart-SRA are more accurate than the sessions constructed by previous heuristics.
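The session reconstruction task described in this abstract can be illustrated with a much simpler baseline than Smart-SRA: the classic time-gap heuristic, which closes a session when the pause between two requests from the same user exceeds a threshold. The sketch below shows only that baseline (the log records and the 30-minute threshold are illustrative assumptions), not the Smart-SRA algorithm itself:

```python
from datetime import datetime, timedelta

# Hypothetical log records: (user_ip, page, timestamp). A simplified
# time-oriented heuristic, NOT Smart-SRA: it opens a new session whenever
# the gap between two requests from the same user exceeds 30 minutes.
def reconstruct_sessions(log, gap=timedelta(minutes=30)):
    sessions = {}   # user_ip -> list of sessions (each a list of pages)
    last_seen = {}  # user_ip -> timestamp of the previous request
    for ip, page, ts in sorted(log, key=lambda r: r[2]):
        if ip not in sessions or ts - last_seen[ip] > gap:
            sessions.setdefault(ip, []).append([])  # open a new session
        sessions[ip][-1].append(page)
        last_seen[ip] = ts
    return sessions

log = [
    ("1.2.3.4", "/home", datetime(2024, 1, 1, 10, 0)),
    ("1.2.3.4", "/docs", datetime(2024, 1, 1, 10, 5)),
    ("1.2.3.4", "/home", datetime(2024, 1, 1, 11, 0)),  # > 30 min gap
]
print(reconstruct_sessions(log)["1.2.3.4"])  # [['/home', '/docs'], ['/home']]
```

Smart-SRA additionally validates candidate sessions against the site topology, which is precisely what the simulator-based comparison in the abstract evaluates against heuristics like this one.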
Wu, Hao-cun, and 吳浩存. "A multidimensional data model for monitoring web usage and optimizing website topology." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2004. http://hub.hku.hk/bib/B29528215.
Wang, Long. "X-tracking the usage interest on web sites." Phd thesis, Universität Potsdam, 2011. http://opus.kobv.de/ubp/volltexte/2011/5107/.
Owing to the exponential growth in the number of Internet users and websites, the WWW (World Wide Web) has become the most important global information resource. The Web offers various services (e.g. information publishing, electronic commerce, entertainment and social networking) for inexpensive and efficient access, provided by individuals and institutions. To offer such services, individual websites around the world serve as the basic units. However, the extreme fragility of Web services and content, the intense competition between similar services on different sites, and the wide geographical distribution of Web users create an urgent need for Web managers to track and understand the usage interests of their Web customers. This work aims to fulfil the requirement of "X-tracking the Usage Interest on Web Sites". "X" has two meanings: first, usage interest differs from website to website; second, usage interest is described by different aspects (internal and external, structural and conceptual). "Tracking" means that changes between usage patterns are detected and measured. The thesis presents a methodology for discovering usage interest on three types of websites: public information portals, e-learning websites and social websites. We concentrate on different topics for the different kinds of sites, all closely related to usage-interest mining. An education information portal is the first implementation scenario for web usage mining. In this scenario, usage patterns can be discovered and the organisation of Web services optimised. In such cases, usage patterns are modelled as frequent page sets, navigation paths, navigation structures or graphs.
A necessary prerequisite, however, is the reconstruction of individual behaviour patterns from the usage history. We therefore present a systematic study of reconstructing individual browsing behaviour. The thesis also introduces a new strategy for building content clusters from the usage side based on page pairs. The difference between such clusters and the original web structure reflects the gap between the aims of the usage side and the expectations of the design side. Furthermore, we investigate the problem of tracking changes of usage patterns over their life cycle. The changes are described from several aspects: the internal aspect integrates conceptual structures and functions, while the external aspect describes physical properties; the local aspect measures the difference between two time spans, and the global aspect shows trends of change along the life cycle. A platform, "Web-Cares", is developed that discovers usage interests, measures differences between usage interest and website, and tracks changes of usage patterns. E-learning websites provide learning materials such as slides, recorded video lectures and exercise sheets. We concentrate on exploring the learning interest in streaming lectures, e.g. RealMedia, MP4 and Flash clips. Compared with an information portal, the usage of streaming lectures encapsulates variables such as viewing time and viewing activities during the learning process. The learning interest is captured by answering six questions, covering topics such as the relations between parts of a course or the preference among different forms of lectures. We focus in particular on uncovering changes in learning interest across the same courses offered in different semesters.
The difference in content and structure between two courses influences the changes in learning interest. An algorithm measures the difference of the learning interest based on a similarity comparison between the courses. A search engine, "Task-Moniminer", is developed so that lecturers can retrieve the learning interest for their streaming lectures on the video portal tele-TASK. Social websites serve as online communities in which participating Web users discuss common topics and share interesting information with one another. Compared with public information portals and e-learning websites, this kind of website offers rich interactions between users and content, which results in a wider range of content quality. On the other hand, a social website offers more possibilities for modelling usage interest. We propose a framework that recommends high-reputation articles on a social website. Our observations are that reputation can be classified into global and local categories, and that the quality of high-reputation articles correlates with content features. Based on these observations, the framework is implemented in the following steps: first, articles with global or local reputation are identified; then, articles are clustered in each category according to their content relations; finally, articles selected from each cluster based on their reputation ranking are recommended.
Norguet, Jean-Pierre. "Semantic analysis in web usage mining." Doctoral thesis, Universite Libre de Bruxelles, 2006. http://hdl.handle.net/2013/ULB-DIPOT:oai:dipot.ulb.ac.be:2013/210890.
Indeed, according to organization theory, the higher levels of an organization need summarized and conceptual information to make fast, high-level, and effective decisions. For Web sites, these levels include the organization managers and the Web site chief editors. At these levels, the results produced by Web analytics tools are mostly useless, since most of these results target Web designers and Web developers. Summary reports like the number of visitors and the number of page views can be of some interest to the organization manager, but these results are poor. Finally, page-group and directory hits give the Web site chief editor conceptual results, but these are limited by several problems such as page synonymy (several pages contain the same topic), page polysemy (a page contains several topics), page temporality, and page volatility.
For their part, Web usage mining research projects have mostly left Web analytics and its limitations aside and have focused on other research paths. Examples of these paths are usage pattern analysis, personalization, system improvement, site structure modification, marketing business intelligence, and usage characterization. A potential contribution to Web analytics can be found in research on reverse clustering analysis, a technique based on self-organizing feature maps. This technique integrates Web usage mining and Web content mining in order to rank the Web site pages according to an original popularity score. However, the algorithm is not scalable and does not solve the page-polysemy, page-synonymy, page-temporality, and page-volatility problems. As a consequence, these approaches fail to deliver summarized and conceptual results.
An interesting attempt to obtain such results has been the Information Scent algorithm, which produces a list of term vectors representing the visitors' needs. These vectors provide a semantic representation of the visitors' needs and can be easily interpreted. Unfortunately, the results suffer from term polysemy and term synonymy, are visit-centric rather than site-centric, and are not scalable to produce. Finally, according to a recent survey, no Web usage mining research project has proposed a satisfactory solution for providing site-wide summarized and conceptual audience metrics.
In this dissertation, we present our solution to answer the need for summarized and conceptual audience metrics in Web analytics. We first describe several methods for mining the Web pages output by Web servers. These methods include content journaling, script parsing, server monitoring, network monitoring, and client-side mining. These techniques can be used alone or in combination to mine the Web pages output by any Web site. Then, the occurrences of taxonomy terms in these pages can be aggregated to provide concept-based audience metrics. To evaluate the results, we implement a prototype and run a number of test cases with real Web sites.
According to the first experiments with our prototype and SQL Server OLAP Analysis Service, concept-based metrics prove extremely summarized and much more intuitive than page-based metrics. As a consequence, concept-based metrics can be exploited at higher levels in the organization. For example, organization managers can redefine the organization strategy according to the visitors' interests. Concept-based metrics also give an intuitive view of the messages delivered through the Web site and allow the Web site communication to be adapted to the organization objectives. The Web site chief editor, in turn, can interpret the metrics to redefine the publishing orders and the sub-editors' writing tasks. As decisions at higher levels in the organization should be more effective, concept-based metrics should significantly contribute to Web usage mining and Web analytics.
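The aggregation step described above (counting taxonomy-term occurrences in served pages to obtain concept-based audience metrics) can be sketched as follows; the taxonomy, the pages, and the view counts are invented purely for illustration:

```python
# A minimal sketch of concept-based audience metrics, assuming a hand-made
# taxonomy and page-view counts; concepts, terms and pages are illustrative.
taxonomy = {
    "pricing": ["price", "discount", "offer"],
    "support": ["help", "faq", "contact"],
}

page_views = {"/offers": 120, "/faq": 80}
page_text = {
    "/offers": "special offer with a big discount on every price",
    "/faq":    "help and faq pages contact us for more help",
}

def concept_metrics(page_views, page_text, taxonomy):
    # Score each concept by summing, over pages, the occurrences of its
    # terms in the page weighted by the page's view count.
    scores = {c: 0 for c in taxonomy}
    for page, views in page_views.items():
        words = page_text[page].split()
        for concept, terms in taxonomy.items():
            scores[concept] += views * sum(words.count(t) for t in terms)
    return scores

print(concept_metrics(page_views, page_text, taxonomy))
# → {'pricing': 360, 'support': 320}
```

A manager reads such a score per concept rather than a hit count per page, which is the shift from page-based to concept-based metrics the abstract argues for.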
Doctorat en sciences appliquées
Luczak-Rösch, Markus [Verfasser]. "Usage-dependent maintenance of structured Web data sets / Markus Luczak-Rösch." Berlin : Freie Universität Berlin, 2014. http://d-nb.info/1068253827/34.
Vollino, Bruno Winiemko. "Descoberta de perfis de uso de web services." reponame:Biblioteca Digital de Teses e Dissertações da UFRGS, 2013. http://hdl.handle.net/10183/83669.
During the life cycle of a web service, several changes are made to its interface, some of which are incompatible with current usage and may break client applications. Providers must make decisions about changes to their services, most often without insight into the effect these changes will have on their customers. Existing research and tools fail to give providers proper knowledge about the actual usage of the service interface's features by distinct types of customers, making it impossible to assess the actual impact of changes. This work presents a framework for the discovery of web service usage profiles, which constitute a descriptive model of the usage patterns found in distinct groups of clients concerning the use of service interface features. The framework supports a user in the process of knowledge discovery over service usage data through semi-automatic and configurable tasks, which assist the preparation and analysis of usage data with as little user intervention as possible. The framework monitors web service interactions, loads pre-processed usage data into a unified database, and supports the generation of usage profiles. Data mining techniques are used to group clients according to their usage patterns of features, and these groups are used to build service usage profiles. The entire process is configured via parameters, which allow the user to determine the level of detail of the usage information included in the profiles and the criteria for evaluating the similarity between client applications. The proposal is validated through experiments with synthetic data, simulated according to features expected in the use of a real service. The experimental results demonstrate that the proposed framework allows the discovery of useful service usage profiles and provides evidence about the proper parameterization of the framework.
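The grouping step of such a framework can be sketched as follows: each client application is represented by a vector of calls per interface operation, and clients with similar vectors are grouped together. A simple cosine-similarity threshold stands in here for the data mining techniques mentioned in the abstract; all client names, operations and counts are invented:

```python
from math import sqrt

# Sketch: group client applications by similarity of their usage vectors
# (calls per service operation). A threshold-based grouping stands in for
# the clustering step of the framework; names and numbers are illustrative.
usage = {                 # client -> calls per operation (getA, putB, delC)
    "app1": [90, 5, 5],
    "app2": [80, 10, 10],
    "app3": [2, 49, 49],
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu, nv = sqrt(sum(a * a for a in u)), sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def profile_groups(usage, threshold=0.9):
    groups = []                    # each group: list of client names
    for client, vec in usage.items():
        for g in groups:
            rep = usage[g[0]]      # compare against group representative
            if cosine(vec, rep) >= threshold:
                g.append(client)
                break
        else:
            groups.append([client])
    return groups

print(profile_groups(usage))  # [['app1', 'app2'], ['app3']]
```

Each resulting group would then be summarized into a usage profile, so a provider can ask "which groups actually call the operation I want to change?" before breaking an interface.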
Özakar, Belgin Püskülcü Halis. "Finding And Evaluating Patterns In Wes Repository Using Database Technology And Data Mining Algorithms/." [s.l.]: [s.n.], 2002. http://library.iyte.edu.tr/tezler/master/bilgisayaryazilimi/T000130.pdf.
Karlsson, Sophie. "Datainsamling med Web Usage Mining : Lagringsstrategier för loggning av serverdata." Thesis, Högskolan i Skövde, Institutionen för informationsteknologi, 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-9467.
The complexity of web applications and the number of advanced services are increasing. Logging can increase the understanding of users' behavior and needs, but is often overused without yielding relevant information. More advanced systems bring increased performance requirements, and logging makes the systems even more demanding. There is a need for smarter systems, development of techniques for performance improvement, and better techniques for data collection. This work investigates how response times are affected when logging server data, in line with the data collection phase of web usage mining, depending on the storage strategy. The hypothesis is that logging may degrade response times even further. An experiment was conducted in which four different storage strategies, with different table and database structures, were used to store server data, to see which strategy affects response times the least. ANOVA shows a statistically significant difference between the storage strategies. Storage strategy 4 has the best effect on average response time, while storage strategy 2 has the most negative effect. Future work could strengthen these results.
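The analysis step reported above can be reproduced in miniature: a one-way ANOVA F statistic computed over per-strategy response times. The data values below are invented; only the method is taken from the abstract:

```python
# Sketch of the experiment's analysis step: a one-way ANOVA F statistic
# over response times (ms) of competing storage strategies. The data
# values are made up for illustration.
def f_statistic(groups):
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n
    # between-group and within-group sums of squares
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

strategy2 = [120, 130, 125, 135]   # slower strategy (illustrative)
strategy4 = [80, 85, 90, 82]       # faster strategy (illustrative)
f = f_statistic([strategy2, strategy4])
print(round(f, 1))  # → 123.5
```

With F ≈ 123.5 on 1 and 6 degrees of freedom, the difference between these two illustrative groups would be highly significant, mirroring the significant difference the experiment reports between its four strategies.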
Shun, Yeuk Kiu. "Web mining from client side user activity log /." View Abstract or Full-Text, 2002. http://library.ust.hk/cgi/db/thesis.pl?COMP%202002%20SHUN.
Includes bibliographical references (leaves 85-90). Also available in electronic version. Access restricted to campus users.
Wang, Hui. "Mining novel Web user behavior models for access prediction /." View Abstract or Full-Text, 2003. http://library.ust.hk/cgi/db/thesis.pl?COMP%202003%20WANG.
Includes bibliographical references (leaves 83-91). Also available in electronic version. Access restricted to campus users.
Zhao, Hongkun. "Automatic wrapper generation for the extraction of search result records from search engines." Diss., Online access via UMI:, 2007.
Agarwal, Khushbu. "A partition based approach to approximate tree mining a memory hierarchy perspective /." Columbus, Ohio : Ohio State University, 2008. http://rave.ohiolink.edu/etdc/view?acc%5Fnum=osu1196284256.
Nassopoulos, Georges. "Deducing Basic Graph Patterns from Logs of Linked Data Providers." Thesis, Nantes, 2017. http://www.theses.fr/2017NANT4110/document.
Following the principles of Linked Data, data providers have published billions of facts as RDF data. Executing SPARQL queries over SPARQL endpoints or Triple Pattern Fragments (TPF) servers makes it easy to consume Linked Data. However, federated SPARQL query processing and TPF query processing decompose the initial query into subqueries. Consequently, the data providers only see subqueries, and the initial query is known only to end users. Knowing the executed SPARQL queries is fundamental for data providers: to ensure usage control, to optimize the costs of query answering, to justify return on investment, to improve the user experience, or to build business models from usage trends. In this thesis, we focus on analyzing execution logs of TPF servers and SPARQL endpoints to extract the Basic Graph Patterns (BGP) of executed SPARQL queries. The main challenge in extracting BGPs is the concurrent execution of SPARQL queries. We propose two algorithms: LIFT and FETA. LIFT extracts BGPs of executed queries from a single TPF server log. FETA extracts BGPs of federated queries from the logs of a set of SPARQL endpoints. For experiments, we run LIFT and FETA on both synthetic and real logs. LIFT and FETA are able to extract BGPs with good precision and recall under certain conditions.
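The core difficulty that such log analysis faces (deciding which logged triple patterns belong to the same query) can be illustrated with a toy heuristic that joins patterns arriving close in time and sharing a variable or IRI. This is only an illustration of the principle, not the published LIFT or FETA algorithms, and the log format is invented:

```python
# Toy version of the log-analysis idea: cluster the triple patterns seen
# by a server into candidate BGPs by joining patterns that arrive close
# in time and share a variable ('?x') or an IRI ('<x>').
def deduce_bgps(log, window=5):
    # log: list of (timestamp, (s, p, o)) with '?'-variables and '<>'-IRIs
    bgps = []   # each BGP: {"until": t, "terms": set, "patterns": [...]}
    for t, triple in sorted(log):
        terms = {x for x in triple if x.startswith("?") or x.startswith("<")}
        for bgp in bgps:
            if t - bgp["until"] <= window and terms & bgp["terms"]:
                bgp["patterns"].append(triple)
                bgp["terms"] |= terms
                bgp["until"] = t
                break
        else:
            bgps.append({"until": t, "terms": terms, "patterns": [triple]})
    return [b["patterns"] for b in bgps]

log = [
    (0,  ("?film", "<director>", "?d")),
    (1,  ("?d",    "<name>",     "?n")),        # joins on ?d
    (20, ("?city", "<country>",  "<France>")),  # too late, no shared term
]
print(deduce_bgps(log))
```

Concurrent queries are exactly what breaks such a naive heuristic (unrelated patterns interleave in time), which is why the thesis treats concurrency as the main challenge.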
Khasawneh, Natheer Yousef. "Toward Better Website Usage: Leveraging Data Mining Techniques and Rough Set Learning to Construct Better-to-use Websites." Akron, OH : University of Akron, 2005. http://rave.ohiolink.edu/etdc/view?acc%5Fnum=akron1120534472.
"August, 2005." Title from electronic dissertation title page (viewed 01/14/2006) Advisor, John Durkin; Committee members, John Welch, James Grover, Yueh-Jaw Lin, Yingcai Xiao, Chien-Chung Chan; Department Chair, Alex Jose De Abreu-Garcia; Dean of the College, George Haritos; Dean of the Graduate School, George R. Newkome. Includes bibliographical references.
Persson, Pontus. "Identifying Early Usage Patterns That Increase User Retention Rates In A Mobile Web Browser." Thesis, Linköpings universitet, Databas och informationsteknik, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-137793.
Nenadić, Oleg. "An implementation of correspondence analysis in R and its application in the analysis of web usage /." Göttingen : Cuvillier, 2007. http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&doc_number=016229974&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA.
Cercós, Brownell Robert. "Diseño y Construcción de un Web Warehouse para Almacenar Información Extraída a Partir de Datos Originados en la Web." Tesis, Universidad de Chile, 2008. http://repositorio.uchile.cl/handle/2250/103076.
Nagi, Mohamad. "Integrating Network Analysis and Data Mining Techniques into Effective Framework for Web Mining and Recommendation. A Framework for Web Mining and Recommendation." Thesis, University of Bradford, 2015. http://hdl.handle.net/10454/14200.
Kliegr, Tomáš. "Clickstream Analysis." Master's thesis, Vysoká škola ekonomická v Praze, 2007. http://www.nusl.cz/ntk/nusl-2065.
Villar, Escobar Osvaldo Pablo. "Minería y Personalización de un Sitio Web para Celulares." Tesis, Universidad de Chile, 2007. http://www.repositorio.uchile.cl/handle/2250/104823.
Henriksson, William. "LOGGNING AV INTERAKTION MED DATAINSAMLINGSMETODER FÖR WEBBEVENTLOGGNINGSVERKTYG : Experiment om påverkan i svarstider vid loggning av interaktionsdata." Thesis, Högskolan i Skövde, Institutionen för informationsteknologi, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-15324.
Shao, Da. "Usage of HTML5 as the basis for a multi-platform client solution." Thesis, Linköpings universitet, Institutionen för datavetenskap, 2012. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-77814.
Gomes, João Fernando dos Anjos. "Recomendação de navegação em portais da internet como um serviço suportado em ferramentas Web Analytics." Master's thesis, Instituto Politécnico de Setúbal. Escola Superior de Ciências Empresariais, 2016. http://hdl.handle.net/10400.26/17292.
As Internet usage keeps increasing, the number of websites, and hence the number of web pages, also keeps increasing, so there is a need to align the user experience with the overall purposes of a website. To meet this requirement, the proposed recommendation system suggests pages that might be of interest to the user, based on past navigation profiles of overall site usage. Most existing recommendation systems are based on association rules or on keywords (when content is considered). However, when usage data is scarce or sparse and sequential order must be considered, such traditional approaches may become unsuitable. Conversely, the Web Analytics arena, under a different paradigm, has experienced considerable growth through mature tools that allow the collection and analysis of internet data in order to understand and optimize website efficiency and efficacy. This work proposes the development of a recommendation system based on the Google Analytics tool. The prototype consists of two main components: 1) a service responsible for the construction and associated logic that underlies recommendation generation; 2) a library embeddable in any website that provides a configurable recommendation widget. Preliminary evaluations showed that the implementation follows the logic of the proposed model.
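The navigation-profile idea behind such a recommender can be sketched as a next-page model built from navigation paths; the page names are invented, and the actual prototype builds on Google Analytics data rather than raw paths like these:

```python
from collections import Counter, defaultdict

# Minimal sketch: build next-page statistics from navigation paths and
# recommend the most frequent successors of the page a visitor is on.
def build_model(paths):
    nxt = defaultdict(Counter)
    for path in paths:
        for a, b in zip(path, path[1:]):   # consecutive page transitions
            nxt[a][b] += 1
    return nxt

def recommend(model, page, k=2):
    return [p for p, _ in model[page].most_common(k)]

paths = [
    ["/home", "/products", "/cart"],
    ["/home", "/products", "/specs"],
    ["/home", "/about"],
    ["/blog", "/products", "/cart"],
]
model = build_model(paths)
print(recommend(model, "/products"))  # ['/cart', '/specs']
```

The embeddable widget described in the abstract would call such a service with the current page and render the returned suggestions.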
Melhem, Hiba. "Usages et applications du web sémantique en bibliothèques numériques." Thesis, Université Grenoble Alpes (ComUE), 2017. http://www.theses.fr/2017GREAL025/document.
This research work falls within the interdisciplinary field of the information and communication sciences (CIS) and aims to explore the use of the semantic web in digital libraries. The web requires libraries to rethink their organization, activities, practices, and services in order to reposition themselves as reference institutions for the dissemination of knowledge. In this thesis, we seek to understand the contexts of use of the semantic web in French digital libraries. We question the contributions of the semantic web within these libraries, as well as the challenges and obstacles that accompany its implementation. We are also interested in documentary practices and their evolution following the introduction of the semantic web in digital libraries. The underlying problem concerns the role that information professionals can play in the implementation of the semantic web in digital libraries. After selecting 98 digital libraries through an analysis of three censuses, a questionnaire survey collects data on the use of the semantic web in these libraries. A second, interview-based survey then highlights the representations that information professionals have of the semantic web and its use in the library, as well as the evolution of their professional practices. The results show that the representation of knowledge within the semantic web requires human intervention to provide the conceptual framework that determines the links between the data. Finally, information professionals can become actors of the semantic web, in the sense that their role is not limited to the use of the semantic web but extends to the development of its standards to ensure a better organization of knowledge.
Tillemans, Stephen. "Development of an instrument for data collection in a multidimensional scaling study of personal Web usage in the South African workplace." Thesis, Stellenbosch : Stellenbosch University, 2011. http://hdl.handle.net/10019.1/21646.
In a relatively short period, the Internet has grown from being virtually unknown to becoming an essential business tool. Together with its many benefits, the Internet has unfortunately brought with it several new organisational challenges. One of these challenges is how to manage personal Web usage (PWU) in the workplace effectively. Although many managers see PWU as a form of workplace deviance, many researchers have pointed out its potential benefits, such as learning, time-saving, employee well-being, and being a source of ideas. To help organisations manage PWU in the workplace more effectively, this research identified the need for a typology of PWU behaviours in the South African workplace. Multidimensional scaling (MDS) was identified as an objective method of creating such a typology. The objective of this research was therefore to develop an instrument to gather data for a multidimensional scaling study of PWU behaviours in the South African workplace. A questionnaire was designed that consists of three distinct sections. The first section contains seven pre-coded demographics questions that correspond to specific demographic variables proven to have a relationship with PWU. The second section of the questionnaire is designed to gather dissimilarity data for input into an MDS algorithm. To begin with, 25 Web usage behaviours of South Africans were identified using Google Ad Planner. After weighing up various options for comparing the Web usage behaviours, the pairwise comparison method was selected. Ross sequencing was used to reduce positioning and timing effects. To reduce the number of judgements per participant, the 300 required judgements are split six ways, resulting in 50 judgements per participant. The last section of the questionnaire is designed to gather data to assist with interpreting the dimensions of the MDS configuration. Eight benefits and risks of PWU were identified.
These are combined into a matrix together with the 25 Web usage behaviours. The data from this section will allow future research to use linear regression to discover the relationship between the Web usage behaviours (the objects), and the benefits and risks of PWU (the variables). It is believed that this design offers a fair compromise between the time and effort required of participants and the quality and integrity of the acquired data.
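The pairing scheme described above is easy to verify in code: 25 behaviours produce exactly 300 unordered pairs, which split evenly into six questionnaires of 50 judgements each. The behaviour names below are placeholders, and the Ross sequencing step that orders the pairs is omitted:

```python
from itertools import combinations

# Sketch of the questionnaire's pairing scheme: 25 behaviours yield
# 25 * 24 / 2 = 300 pairwise comparisons, split six ways so that each
# participant judges 50 pairs. (Ross sequencing is omitted here.)
behaviours = [f"behaviour_{i:02d}" for i in range(1, 26)]  # placeholders
pairs = list(combinations(behaviours, 2))
assert len(pairs) == 300

# round-robin split into 6 questionnaires of 50 judgements each
questionnaires = [pairs[i::6] for i in range(6)]
print([len(q) for q in questionnaires])  # [50, 50, 50, 50, 50, 50]
```

The dissimilarity judgements collected this way fill the symmetric matrix that an MDS algorithm takes as input.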
Johnsson, Daniel. "Creating and Evaluating a Useful Web Application for Introduction to Programming." Thesis, Linköpings universitet, Institutionen för datavetenskap, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-172528.
Mair, Patrick, and Marcus Hudec. "Session Clustering Using Mixtures of Proportional Hazards Models." Department of Statistics and Mathematics, WU Vienna University of Economics and Business, 2008. http://epub.wu.ac.at/598/1/document.pdf.
Series: Research Report Series / Department of Statistics and Mathematics
Kilic, Sefa. "Clustering Frequent Navigation Patterns From Website Logs Using Ontology And Temporal Information." Master's thesis, METU, 2012. http://etd.lib.metu.edu.tr/upload/12613979/index.pdf.
Vlk, Vladimír. "Získávání znalostí z webových logů." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2013. http://www.nusl.cz/ntk/nusl-236196.
Windmiller, Sarah M. "Alternatives to smartphone applications for real-time information and technology usage among transit riders." Thesis, Georgia Institute of Technology, 2013. http://hdl.handle.net/1853/50369.
Chen, Xiaowei. "Measurement, analysis and improvement of BitTorrent Darknets." HKBU Institutional Repository, 2013. http://repository.hkbu.edu.hk/etd_ra/1545.
Ammari, Ahmad N. "Transforming user data into user value by novel mining techniques for extraction of web content, structure and usage patterns : the development and evaluation of new Web mining methods that enhance information retrieval and improve the understanding of users' Web behavior in websites and social blogs." Thesis, University of Bradford, 2010. http://hdl.handle.net/10454/5269.
Calderón-Benavides, Liliana. "Unsupervised Identification of the User's Query Intent in Web Search." Doctoral thesis, Universitat Pompeu Fabra, 2011. http://hdl.handle.net/10803/51299.
This doctoral work focuses on identifying and understanding the intentions that motivate users to search the Web, by applying machine learning methods that require no data beyond the users' information needs themselves, as represented by their queries. Knowing and interpreting this invaluable information can help Web search systems find particularly relevant resources and thus improve user satisfaction. Using unsupervised learning techniques, selected according to the context of each problem and whose results have proven effective for each of the problems addressed, this work shows not only that users' intentions can be identified, but that this process can be carried out automatically. The research developed in this thesis has been an evolutionary process, beginning with the analysis of the manual classification of several sets of queries that real users submitted to a search engine. The work proceeds through the proposal of a new classification of users' query intents and the use of different unsupervised learning techniques to identify those intents, ultimately establishing that this is not a one-dimensional problem but should be treated as a multi-dimensional one, where each dimension, or facet, helps clarify and establish the user's intent. Building on this last work, we have created a model to identify the user's intent in an online scenario.
Song, Ge. "Méthodes parallèles pour le traitement des flux de données continus." Thesis, Université Paris-Saclay (ComUE), 2016. http://www.theses.fr/2016SACLC059/document.
Full textWe live in a world where a vast amount of data is continuously generated, arriving in many forms: every time we run a search on Google, purchase something on Amazon, click a 'like' on Facebook, upload an image on Instagram, or a sensor is activated, new data is produced. Data is no longer simple numerical information. Isolated data is of little value, but when this huge amount of data is connected, it becomes a rich source of new insights. At the same time, data is time-sensitive: the most accurate and effective way to describe it is as a data stream, and if the latest data is not promptly processed, the opportunity of obtaining the most useful results is missed. A parallel and distributed system for processing large amounts of streaming data in real time therefore has significant research value and promising applications. This thesis focuses on the study of parallel and continuous data stream joins. We divide this problem into two categories: Data Driven Parallel and Continuous Join, and Query Driven Parallel and Continuous Join.
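The data-driven variant can be illustrated with a toy symmetric hash join over a single merged stream (a sketch of the general technique only; the 'R'/'S' side labels, the `key` function, and the in-memory tables are illustrative assumptions, not the system built in the thesis):

```python
from collections import defaultdict

def symmetric_hash_join(events, key):
    """Data-driven continuous join: each arriving record is stored in
    its side's hash table and immediately probed against the other
    side's table, so join results are emitted as soon as possible.

    events: iterable of (side, record) pairs, side being 'R' or 'S'.
    key: function extracting the join key from a record.
    Yields (r_record, s_record) pairs.
    """
    tables = {'R': defaultdict(list), 'S': defaultdict(list)}
    other = {'R': 'S', 'S': 'R'}
    for side, rec in events:
        k = key(rec)
        tables[side][k].append(rec)            # build: remember this record
        for match in tables[other[side]][k]:   # probe: the opposite table
            yield (rec, match) if side == 'R' else (match, rec)
```

In a parallel setting, records would first be partitioned across workers by join key, so that each worker runs this build-and-probe loop independently on its own partition.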
Malherbe, Emmanuel. "Standardization of textual data for comprehensive job market analysis." Thesis, Université Paris-Saclay (ComUE), 2016. http://www.theses.fr/2016SACLC058/document.
Full textWith so many job adverts and candidate profiles available online, e-recruitment constitutes a rich object of study. All this information is, however, textual data, which from a computational point of view is unstructured. The large number and heterogeneity of recruitment websites also means that there are many vocabularies and nomenclatures. One of the difficulties when dealing with this type of raw textual data is grasping the concepts it contains, which is the standardization problem tackled in this thesis. The aim of standardization is to create a unified process providing values in a nomenclature. A nomenclature is by definition a finite set of meaningful concepts, which means that the attributes resulting from standardization are a structured representation of the information. Several questions arise, however: Is the websites' structured data usable for unified standardization? What structure of nomenclature is best suited for standardization, and how can it be leveraged? Is it possible to automatically build such a nomenclature from scratch, or to manage the standardization process without one? To illustrate the various obstacles of standardization, the examples we study include inferring the skills or the category of a job advert, or the level of training of a candidate profile. One of the challenges of e-recruitment is that the concepts are continuously evolving, which means that the standardization must keep up with job market trends. In light of this, we propose a set of machine learning models that require minimal supervision and can easily adapt to the evolution of the nomenclatures. The questions raised found partial answers using Case-Based Reasoning, semi-supervised Learning-to-Rank, latent variable models, and the evolving sources of the semantic web and social media.
The different models proposed have been tested on real-world data before being implemented in an industrial environment. The resulting standardization is at the core of SmartSearch, a project which provides a comprehensive analysis of the job market.
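As an illustration of what standardization toward a finite nomenclature means in practice, the following sketch maps free-text skill mentions onto canonical concepts (the `SKILL_NOMENCLATURE` dictionary and the regex tokenizer are invented for the example; the thesis's models are learned from data, not hand-coded):

```python
import re

# Hypothetical mini-nomenclature: canonical concept -> known surface forms.
SKILL_NOMENCLATURE = {
    "python": {"python", "python3", "py"},
    "machine learning": {"machine learning", "ml", "statistical learning"},
    "javascript": {"javascript", "js", "ecmascript"},
}

def standardize_skills(advert_text):
    """Map free-text skill mentions in a job advert onto the finite
    nomenclature, returning a structured (sorted) list of concepts."""
    # Crude tokenizer; the optional " learning" keeps two-word mentions whole.
    tokens = set(re.findall(r"[a-z0-9+#]+(?: learning)?", advert_text.lower()))
    found = {canonical
             for canonical, variants in SKILL_NOMENCLATURE.items()
             if tokens & variants}
    return sorted(found)
```

The output is a structured representation: whatever vocabulary the source website uses, the result is always a subset of the same finite concept set.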
Castellanos-Paez, Sandra. "Apprentissage de routines pour la prise de décision séquentielle." Thesis, Université Grenoble Alpes (ComUE), 2019. http://www.theses.fr/2019GREAM043.
Full textIntuitively, a system capable of exploiting its past experiences should achieve better performance. One way to build on past experiences is to learn macros (i.e. routines), which can then be used to improve the performance of solving new problems. In automated planning, the challenge remains to develop powerful planning techniques capable of effectively exploring a search space that grows exponentially. Learning macros from previously acquired knowledge has proven beneficial for improving a planner's performance. This thesis contributes mainly to the field of automated planning, and more specifically to learning macros for classical planning. We focused on developing a domain-independent learning framework that identifies sequences of actions (even non-adjacent ones) in past solution plans and selects the most useful routines (i.e. macros), based on an a priori evaluation, to enhance the planning domain. First, we studied the possibility of using sequential pattern mining to extract frequent sequences of actions from past solution plans, and the link between the frequency of a macro and its utility. We found that frequency alone may not provide a consistent selection of useful macro-actions (i.e. sequences of actions with constant objects). Second, we discussed the problem of learning macro-operators (i.e. sequences of actions with variable objects) using classic pattern mining algorithms in planning. Despite our efforts, the selection process reached a dead end because the filtering structures of pattern mining are not adapted to planning. Finally, we provided a novel approach called METEOR, which finds the frequent sequences of operators in a set of plans without losing information about their characteristics. This framework was conceived for mining macro-operators from past solution plans and for selecting the optimal set of macro-operators that maximises the node gain.
It has proven able to mine macro-operators of different lengths on four benchmark domains and, thanks to the selection phase, to deliver a positive impact on search time without drastically decreasing the quality of the plans.
Klinczak, Marjori Naiele Mocelin. "Identificação e propagação de temas em redes sociais." Universidade Tecnológica Federal do Paraná, 2016. http://repositorio.utfpr.edu.br/jspui/handle/1/2304.
Full textRecent years have been marked by the emergence of various social media, from Orkut to Facebook, Twitter, YouTube, Google+ and many others, each offering new features as a way to attract more users. These social media generate a large amount of data which, when processed properly, can be used to identify trends, patterns and changes. The objective of this work is the discovery of the key topics in a social network, characterized as groupings of relevant terms restricted to a particular context, and the study of their evolution over time. To this end, procedures based on data mining and text processing are used. First, text preprocessing techniques are applied to identify the most relevant terms appearing in the text messages of the social network. Next, classical clustering algorithms (k-means, k-medoids, DBSCAN) and the more recent NMF (Non-negative Matrix Factorization) are used to identify the main topics of these messages, characterized as groupings of relevant terms. The proposal was evaluated on the Twitter network, using tweet datasets from different contexts. The results show the feasibility of the proposal and its applicability to the identification of relevant topics in this social network.
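The NMF step mentioned above can be sketched with plain Lee-Seung multiplicative updates (a minimal illustration on a toy nonnegative document-term matrix; a real run would apply a library implementation to TF-IDF weighted tweets):

```python
import numpy as np

def nmf_topics(X, k, iters=200, seed=0):
    """Factorize a nonnegative docs-x-terms matrix X into W (docs x topics)
    and H (topics x terms) with multiplicative updates, so that each row
    of H groups the terms that characterize one topic."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.random((n, k)) + 1e-3
    H = rng.random((k, m)) + 1e-3
    for _ in range(iters):
        # Lee-Seung updates for the Frobenius objective ||X - WH||^2.
        H *= (W.T @ X) / (W.T @ W @ H + 1e-9)
        W *= (X @ H.T) / (W @ H @ H.T + 1e-9)
    return W, H
```

On a matrix whose documents fall into clear term groups, the top-weighted entries of each row of H recover those groups, which is exactly the "topic as a grouping of relevant terms" reading used in the abstract.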
Nguyen, Hoang Viet Tuan. "Prise en compte de la qualité des données lors de l’extraction et de la sélection d’évolutions dans les séries temporelles de champs de déplacements en imagerie satellitaire." Thesis, Université Grenoble Alpes (ComUE), 2018. http://www.theses.fr/2018GREAA011.
Full textThis PhD thesis deals with knowledge discovery from Displacement Field Time Series (DFTS) obtained by satellite imagery. Such series now occupy a central place in the study and monitoring of natural phenomena such as earthquakes, volcanic eruptions and glacier displacements. They are rich in both spatial and temporal information and can now be produced regularly at low cost thanks to space programs such as the European Copernicus program and its famous Sentinel satellites. Our proposals are based on the extraction of grouped frequent sequential patterns. These patterns, originally defined for extracting knowledge from Satellite Image Time Series (SITS), showed their potential for analyzing a DFTS in early work. Nevertheless, they cannot use the confidence indices that come with DFTS, and the swap method used to select the most promising patterns does not take their spatiotemporal complementarities into account, each pattern being evaluated individually. Our contribution is thus twofold. A first proposal associates a measure of reliability with each pattern by using the confidence indices. This measure makes it possible to select patterns whose occurrences in the data are on average sufficiently reliable. We propose a corresponding constraint-based extraction algorithm, relying on an efficient search for the most reliable occurrences by dynamic programming and on a pruning of the search space provided by a partial push strategy. This new method has been implemented on top of the existing prototype SITS-P2miner, developed by the LISTIC and LIRIS laboratories to extract and rank grouped frequent sequential patterns. A second contribution addresses the selection of the most promising patterns: based on an informational criterion, it takes into account both the confidence indices and the way the patterns complement each other spatially and temporally.
To this end, the confidence indices are interpreted as probabilities, and the DFTS are seen as probabilistic databases whose distributions are only partial. The informational gain associated with a pattern is then defined according to the ability of its occurrences to complete/refine the distributions characterizing the data. On this basis, a heuristic is proposed to select informative and complementary patterns. This method provides a set of weakly redundant patterns that is therefore easier to interpret than those provided by swap randomization. It has been implemented in a dedicated prototype. Both proposals are evaluated quantitatively and qualitatively using a reference DFTS covering Greenland glaciers, constructed from Landsat optical data, and another DFTS that we built from TerraSAR-X radar data covering the Mont-Blanc massif. Besides being constructed from different data and remote sensing techniques, these series differ drastically in their confidence indices, the series covering the Mont-Blanc massif having very low confidence levels. In both cases, the proposed methods operate under standard conditions of resource consumption (time, space), and experts' knowledge of the studied areas is confirmed and completed.
Braik, William. "Détection d'évènements complexes dans les flux d'évènements massifs." Thesis, Bordeaux, 2017. http://www.theses.fr/2017BORD0596/document.
Full textPattern detection over streams of events is gaining more and more attention, especially in the field of eCommerce. Our industrial partner Cdiscount, one of the largest eCommerce companies in France, aims to use pattern detection for real-time customer behavior analysis. The main challenges are efficiency and scalability, as the detection of customer behaviors must be achieved within a few seconds while millions of unique customers visit the website every day, producing a large event stream. In this thesis, we present Auros, a system for large-scale and efficient pattern detection for eCommerce. It relies on a domain-specific language to define behavior patterns. Patterns are then compiled into deterministic finite automata, which are run on a Big Data streaming platform. Our evaluation shows that our approach is efficient and scalable, and fits the requirements of Cdiscount.
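The compilation of behavior patterns into automata can be sketched as follows (a toy matcher that treats a pattern as a plain sequence of event types matched as a non-contiguous subsequence of the stream; Auros's pattern language and automata are richer than this):

```python
def compile_pattern(pattern):
    """Compile a sequence of event types into a simple automaton.

    State i means "the first i events of the pattern have been seen
    in order"; reaching state len(pattern) signals detection."""
    def run(stream):
        state = 0
        for event in stream:
            if event == pattern[state]:
                state += 1                    # advance on a matching event
                if state == len(pattern):
                    return True               # accepting state reached
        return False
    return run
```

A customer-behavior pattern such as "viewed a product, later added it to the cart, later checked out" then becomes one compiled matcher fed by the clickstream.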
Aleksandrova, Marharyta. "Factorisation de matrices et analyse de contraste pour la recommandation." Thesis, Université de Lorraine, 2017. http://www.theses.fr/2017LORR0080/document.
Full textIn many application areas, data elements can be high-dimensional, which raises the problem of dimensionality reduction. Dimensionality reduction techniques can be classified by their aim, dimensionality reduction for optimal data representation versus dimensionality reduction for classification, as well as by the adopted strategy, feature selection versus feature extraction. The set of features resulting from feature extraction methods is usually uninterpretable. The first scientific question of this thesis is therefore: how can interpretable latent features be extracted? Dimensionality reduction for classification aims to enhance the classification power of the selected subset of features. We view the task of classification as one of trigger factor identification, that is, identifying the factors that can influence the transfer of data elements from one class to another. The second scientific question of this thesis is: how can these trigger factors be identified automatically? We address both questions within the application domain of recommender systems. We propose to interpret the latent features of matrix factorization-based recommender systems as real users, and we design an algorithm for automatic identification of trigger factors based on the concepts of contrast analysis. Through experimental results, we show that the defined patterns can indeed be considered trigger factors.
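The matrix-factorization side can be sketched with a plain SGD factorizer (a minimal illustration with hypothetical ratings; the thesis's point is that the k latent columns of P and Q can then be given an interpretation, e.g. as real users):

```python
import numpy as np

def factorize(ratings, k=2, steps=500, lr=0.02, reg=0.02, seed=0):
    """Learn user factors P (users x k) and item factors Q (items x k)
    by SGD over the observed (nonzero) entries of a rating matrix;
    a rating is predicted as the dot product P[u] @ Q[i]."""
    rng = np.random.default_rng(seed)
    n_users, n_items = ratings.shape
    P = rng.normal(0, 0.1, (n_users, k))
    Q = rng.normal(0, 0.1, (n_items, k))
    observed = [(u, i) for u in range(n_users) for i in range(n_items)
                if ratings[u, i] > 0]
    for _ in range(steps):
        for u, i in observed:
            err = ratings[u, i] - P[u] @ Q[i]     # prediction error
            P[u] += lr * (err * Q[i] - reg * P[u])  # regularized updates
            Q[i] += lr * (err * P[u] - reg * Q[i])
    return P, Q
```

Each of the k latent dimensions is a learned feature; the interpretability question above asks what, if anything, such a dimension means.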
Valentin, Jérémie. "Usages géographiques du cyberespace : nouvelle appropriation de l'espace et l'essor d'une "néogéographie"." Thesis, Montpellier 3, 2010. http://www.theses.fr/2010MON30049.
Full textThis research proposes to analyze the impacts and challenges of an omnipresent geographical cyberspace. Spurred on by Web 2.0 and virtual globes (Google Earth, Virtual Earth, World Wind), the production and diffusion of geographical knowledge are undergoing further transformations. Virtual spaces and other location-based services (LBS) are gradually replacing the paper map and the tourist guide. These uses contribute to the emergence of a complex space where uses of real space and uses of virtual space mingle. Meanwhile, the production of geographical content now takes place outside the circles which, until recently, were its initiators and traditional users: universities, research organizations, professional geographers, states, NGOs, the military... This thesis enlightens the reader on the geographical reality of the (new) uses of cyberspace, whether related to the production of "amateur" geographical content (neogeography) or to the "augmented" consumption of geographical space.
Dash, A., and L. R. George. "Web Usage Mining: An Implementation." Thesis, 2010. http://ethesis.nitrkl.ac.in/1689/1/Thesis.pdf.
Full textBhalla, Karan, and Deepak Prasad. "Data preperation and pattern discovery for web usage mining." Thesis, 2007. http://ethesis.nitrkl.ac.in/4208/1/DATA_PREPERATION_AND_PATTERN.pdf.
Full textGOEL, VIVEK. "EFFICIENT ALGORITHM FOR FREQUENT PATTERN MINING AND IT’S APPLICATION IN PREDICTING PATTERN IN WEB USAGE DATA." Thesis, 2012. http://dspace.dtu.ac.in:8080/jspui/handle/repository/13930.
Full textFrequent pattern mining, the task of finding sets of items that frequently occur together in a dataset, has been at the core of the field of data mining for many years. With the tremendous growth of data, users expect more relevant and sophisticated information, which may lie hidden in the data. Data mining is often described as a discipline for finding hidden information in a database; it involves different techniques and algorithms to discover useful knowledge lying hidden in the data. In this thesis, we propose an efficient algorithm for finding frequent patterns, which is an extension of the IP-tree algorithm, and we demonstrate its effectiveness over previous algorithms such as Apriori, FP-Growth, CATS tree and CanTree. Apriori was the first popular algorithm for frequent patterns, but it requires multiple database scans, which makes it inefficient for large databases. To overcome this drawback, prefix-tree based algorithms have become popular. However, most prefix-tree based algorithms still suffer from either long execution times or high memory consumption: for example, the FP-Growth algorithm still requires two database scans, and CanTree consumes a large amount of memory. Our proposed algorithm constructs an FP-tree-like compact tree structure containing only the frequent items of the database with a single database scan. It first stores transactions in a lexicographic-order tree, then restructures the tree by sorting the frequent items in frequency-descending order and pruning the infrequent items from each path. We evaluate the performance of the algorithm using both synthetic and real datasets, and the results show that the proposed algorithm is much more time-efficient and consumes less memory than the previous algorithms.
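For contrast with the proposed one-scan tree, a minimal Apriori-style miner makes the repeated database scans visible, one full scan per itemset size (a sketch of the baseline technique, not the thesis's algorithm):

```python
def apriori(transactions, min_support):
    """Minimal Apriori-style frequent itemset miner.

    Each iteration of the while-loop performs one full scan of the
    database to count candidates of the next size, which is exactly
    the cost a one-scan prefix-tree construction avoids."""
    transactions = [frozenset(t) for t in transactions]
    current = {frozenset([i]) for t in transactions for i in t}
    frequent, size = {}, 1
    while current:
        # One database scan: count support of every candidate.
        counts = {c: sum(c <= t for t in transactions) for c in current}
        current = {c for c, n in counts.items() if n >= min_support}
        frequent.update({c: counts[c] for c in current})
        size += 1
        # Generate candidates one item larger from the survivors.
        current = {a | b for a in current for b in current
                   if len(a | b) == size}
    return frequent
```

A dataset with frequent itemsets up to size k thus costs k+1 scans here, versus a single scan plus an in-memory restructuring pass for the tree-based approach described above.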
"Improving opinion mining with feature-opinion association and human computation." 2009. http://library.cuhk.edu.hk/record=b5894009.
Full textThesis (M.Phil.)--Chinese University of Hong Kong, 2009.
Includes bibliographical references (leaves [101]-113).
Abstracts in English and Chinese.
Abstract --- p.i
Acknowledgement --- p.iv
Chapter 1 --- Introduction --- p.1
Chapter 1.1 --- Major Topic --- p.1
Chapter 1.1.1 --- Opinion Mining --- p.1
Chapter 1.1.2 --- Human Computation --- p.2
Chapter 1.2 --- Major Work and Contributions --- p.3
Chapter 1.3 --- Thesis Outline --- p.4
Chapter 2 --- Literature Review --- p.6
Chapter 2.1 --- Opinion Mining --- p.6
Chapter 2.1.1 --- Feature Extraction --- p.6
Chapter 2.1.2 --- Sentiment Analysis --- p.9
Chapter 2.2 --- Social Computing --- p.15
Chapter 2.2.1 --- Social Bookmarking --- p.15
Chapter 2.2.2 --- Social Games --- p.18
Chapter 3 --- Feature-Opinion Association for Sentiment Analysis --- p.25
Chapter 3.1 --- Motivation --- p.25
Chapter 3.2 --- Problem Definition --- p.27
Chapter 3.2.1 --- Definitions --- p.27
Chapter 3.3 --- Closer look at the problem --- p.28
Chapter 3.3.1 --- Discussion --- p.29
Chapter 3.4 --- Proposed Approach --- p.29
Chapter 3.4.1 --- Nearest Opinion Word (DIST) --- p.31
Chapter 3.4.2 --- Co-Occurrence Frequency (COF) --- p.31
Chapter 3.4.3 --- Co-Occurrence Ratio (COR) --- p.32
Chapter 3.4.4 --- Likelihood-Ratio Test (LHR) --- p.32
Chapter 3.4.5 --- Combined Method --- p.34
Chapter 3.4.6 --- Feature-Opinion Association Algorithm --- p.35
Chapter 3.4.7 --- Sentiment Lexicon Expansion --- p.36
Chapter 3.5 --- Evaluation --- p.37
Chapter 3.5.1 --- Corpus Data Set --- p.37
Chapter 3.5.2 --- Test Data set --- p.37
Chapter 3.5.3 --- Feature-Opinion Association Accuracy --- p.38
Chapter 3.6 --- Summary --- p.45
Chapter 4 --- Social Game for Opinion Mining --- p.46
Chapter 4.1 --- Motivation --- p.46
Chapter 4.2 --- Social Game Model --- p.47
Chapter 4.2.1 --- Definitions --- p.48
Chapter 4.2.2 --- Social Game Problem --- p.51
Chapter 4.2.3 --- Social Game Flow --- p.51
Chapter 4.2.4 --- Answer Extraction Procedure --- p.52
Chapter 4.3 --- Social Game Properties --- p.53
Chapter 4.3.1 --- Type of Information --- p.53
Chapter 4.3.2 --- Game Structure --- p.55
Chapter 4.3.3 --- Verification Method --- p.59
Chapter 4.3.4 --- Game Mechanism --- p.60
Chapter 4.3.5 --- Player Requirement --- p.62
Chapter 4.4 --- Design Guideline --- p.63
Chapter 4.5 --- Opinion Mining Game Design --- p.65
Chapter 4.5.1 --- OpinionMatch --- p.65
Chapter 4.5.2 --- FeatureGuess --- p.68
Chapter 4.6 --- Summary --- p.71
Chapter 5 --- Tag Sentiment Analysis for Social Bookmark Recommendation System --- p.72
Chapter 5.1 --- Motivation --- p.72
Chapter 5.2 --- Problem Statement --- p.74
Chapter 5.2.1 --- Social Bookmarking Model --- p.74
Chapter 5.2.2 --- Social Bookmark Recommendation (SBR) Problem --- p.75
Chapter 5.3 --- Proposed Approach --- p.75
Chapter 5.3.1 --- Social Bookmark Recommendation Framework --- p.75
Chapter 5.3.2 --- Subjective Tag Detection (STD) --- p.77
Chapter 5.3.3 --- Similarity Matrices --- p.80
Chapter 5.3.4 --- User-Website matrix: --- p.81
Chapter 5.3.5 --- User-Tag matrix --- p.81
Chapter 5.3.6 --- Website-Tag matrix --- p.82
Chapter 5.4 --- Pearson Correlation Coefficient --- p.82
Chapter 5.5 --- Social Network-based User Similarity --- p.83
Chapter 5.6 --- User-oriented Website Ranking --- p.85
Chapter 5.7 --- Evaluation --- p.87
Chapter 5.7.1 --- Bookmark Data --- p.87
Chapter 5.7.2 --- Social Network --- p.87
Chapter 5.7.3 --- Subjective Tag List --- p.87
Chapter 5.7.4 --- Subjective Tag Detection --- p.88
Chapter 5.7.5 --- Bookmark Recommendation Quality --- p.90
Chapter 5.7.6 --- System Evaluation --- p.91
Chapter 5.8 --- Summary --- p.93
Chapter 6 --- Conclusion and Future Work --- p.94
Chapter A --- List of Symbols and Notations --- p.97
Chapter B --- List of Publications --- p.100
Bibliography --- p.101
Aguiar, Daniel José Gomes. "IP network usage accounting: parte I." Master's thesis, 2015. http://hdl.handle.net/10400.13/1499.
Full textAn Internet Service Provider (ISP) is responsible for managing networks made up of thousands of customers, where bandwidth must be strictly controlled and congestion avoided. For that, ISPs need systems capable of analyzing traffic data and assigning the appropriate Quality of Service (QoS) based on it. NOS Madeira is the leading ISP in Madeira, with thousands of customers throughout the region. The company's existing bandwidth control system was obsolete, which led to the need for a new one. This new system, called IP Network Usage Accounting, consists of three subsystems: the IP Mapping System, the Accounting System and the Policy Server System. This report describes the design, implementation and testing of the first subsystem, the IP Mapping System. The IP Mapping System collects the traffic data generated by the customers of NOS Madeira and provides it to the second subsystem (the Accounting System). This, in turn, analyzes the data and sends the results to the third subsystem (the Policy Server System), which applies the QoS corresponding to each client's IP.
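The Mapping/Accounting hand-off can be sketched as a per-IP byte aggregation followed by a QoS tier lookup (the tier names and thresholds below are invented for illustration; the abstract does not describe the real Policy Server rules):

```python
from collections import defaultdict

# Hypothetical tiers, checked highest threshold first (bytes per period).
QOS_TIERS = [(10_000_000, "throttled"), (1_000_000, "standard"), (0, "priority")]

def account_usage(flow_records):
    """Aggregate per-IP byte counts from (ip, bytes) flow records,
    mirroring the collection role of the IP Mapping System."""
    usage = defaultdict(int)
    for ip, nbytes in flow_records:
        usage[ip] += nbytes
    return dict(usage)

def qos_class(total_bytes):
    """Return the first tier whose threshold the usage meets."""
    for threshold, tier in QOS_TIERS:
        if total_bytes >= threshold:
            return tier
```

In the real system this split runs across three subsystems; the sketch only shows why the collected totals are the natural interface between them.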
Oosthuizen, Ockmer Louren. "A multi-agent collaborative personalized web mining system model." Thesis, 2008. http://hdl.handle.net/10210/508.
Full textProf. E.M. Ehlers