Dissertations on the topic "WEB USAGE DATA"

To view other types of publications on this topic, follow the link: WEB USAGE DATA.

Format your citation in APA, MLA, Chicago, Harvard, and other styles

Consult the top 50 dissertations for your research on the topic "WEB USAGE DATA."

Next to every work in the list of references there is an "Add to bibliography" button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the scholarly publication as a .pdf and read its abstract online, whenever these are available in the record's metadata.

Browse dissertations across a wide range of disciplines and compile your bibliography correctly.

1

Winblad, Emanuel. "Visualization of web site visit and usage data." Thesis, Linköpings universitet, Medie- och Informationsteknik, 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-110576.

Full text of the source
Abstract:
This report documents the work and results of a master's thesis in Media Technology carried out at the Department of Science and Technology at Linköping University with the support of Sports Editing Sweden AB (SES). Its aim is to create a solution which aids the users of SES' web CMS products in gaining insight into web site visit and usage statistics. The result is the concept and initial version of a web-based service. The service has been developed through an agile process with user-centred design in mind and provides a graphical user interface that makes extensive use of visualizations to achieve the project goal.
APA, Harvard, Vancouver, ISO, and other styles
2

Khalil, Faten. "Combining web data mining techniques for web page access prediction." University of Southern Queensland, Faculty of Sciences, 2008. http://eprints.usq.edu.au/archive/00004341/.

Full text of the source
Abstract:
Web page access prediction gained its importance from the ever increasing number of e-commerce Web information systems and e-businesses. Web page prediction, which involves personalising Web users' browsing experiences, assists Web masters in improving the Web site structure and helps Web users navigate the site and access the information they need. The most widely used approach for this purpose is the pattern discovery process of Web usage mining, which entails many techniques like Markov models, association rules and clustering. Implementing such pattern discovery techniques helps predict the next page to be accessed by the Web user based on the user's previous browsing patterns. However, each of the aforementioned techniques has its own limitations, especially when it comes to accuracy and space complexity. This dissertation achieves better accuracy as well as less state space complexity and fewer generated rules by performing the following combinations. First, we combine a low-order Markov model and association rules. Markov model analysis is performed on the data sets. If the Markov model prediction results in a tie or no state, association rules are used for prediction. The outcome of this integration is better accuracy, less Markov model state space complexity and fewer generated rules than using each of the methods individually. Second, we integrate a low-order Markov model and clustering. The data sets are clustered and Markov model analysis is performed on each cluster instead of the whole data set. The outcome of the integration is better accuracy than the first combination, with less state space complexity than a higher-order Markov model. The last integration model combines all three techniques together: clustering, association rules and a low-order Markov model. The data sets are clustered and Markov model analysis is performed on each cluster. If the Markov model prediction results in close accuracies for the same item, association rules are used for prediction. This integration model achieves better Web page access prediction accuracy, less Markov model state space complexity and fewer generated rules than the previous two models.
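The tie-then-fall-back scheme this abstract describes can be illustrated with a minimal sketch (the sessions, page names, and tie rule below are invented for illustration, not taken from the dissertation):

```python
from collections import defaultdict

# Toy sessions: ordered page visits of individual users.
sessions = [["home", "products", "cart"],
            ["home", "products", "checkout"],
            ["home", "about", "products", "cart"]]

# First-order Markov model: counts of page -> next page.
transitions = defaultdict(lambda: defaultdict(int))
for s in sessions:
    for cur, nxt in zip(s, s[1:]):
        transitions[cur][nxt] += 1

# Crude association "rules": set of pages visited so far -> next-page counts.
rules = defaultdict(lambda: defaultdict(int))
for s in sessions:
    for i in range(1, len(s)):
        rules[frozenset(s[:i])][s[i]] += 1

def predict(history):
    """Markov prediction; fall back to rules on a tie or an unseen state."""
    counts = transitions.get(history[-1], {})
    if counts:
        best = max(counts.values())
        top = [page for page, c in counts.items() if c == best]
        if len(top) == 1:                      # unambiguous Markov prediction
            return top[0]
    fallback = rules.get(frozenset(history), {})  # tie or no state: use rules
    return max(fallback, key=fallback.get) if fallback else None

print(predict(["home", "products"]))  # -> 'cart'
```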
APA, Harvard, Vancouver, ISO, and other styles
3

Bayir, Murat Ali. "A New Reactive Method For Processing Web Usage Data." Master's thesis, METU, 2007. http://etd.lib.metu.edu.tr/upload/12607323/index.pdf.

Full text of the source
Abstract:
In this thesis, a new reactive session reconstruction method, 'Smart-SRA', is introduced. Web usage mining is a type of web mining which exploits data mining techniques to discover valuable information from the navigations of Web users. As in classical data mining, data processing and pattern discovery are the main issues in web usage mining. The first phase of web usage mining is the data processing phase, which includes session reconstruction. Session reconstruction is the most important task of web usage mining since it directly and significantly affects the quality of the frequent patterns extracted in the final step. Session reconstruction methods can be classified into two categories, namely 'reactive' and 'proactive', with respect to the data source and the data processing time. If the user requests are processed after the server handles them, the technique is called 'reactive', while in 'proactive' strategies this processing occurs during the interactive browsing of the web site. Smart-SRA is a reactive session reconstruction technique which uses web log data and the site topology. In order to compare Smart-SRA with previous reactive methods, a web agent simulator has been developed. Our agent simulator models the behavior of web users and generates web user navigations as well as the log data kept by the web server. In this way, the actual user sessions are known and the success of different techniques can be compared. In this thesis, it is shown that the sessions generated by Smart-SRA are more accurate than the sessions constructed by previous heuristics.
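A much-simplified sketch of a reactive, topology-aware reconstruction in the spirit described here (the toy topology, timeout, and reachability rule are assumptions; the actual Smart-SRA algorithm is more elaborate):

```python
from datetime import datetime, timedelta

# Assumed site topology: page -> set of pages it links to.
topology = {"/": {"/a", "/b"}, "/a": {"/c"}, "/b": {"/c"}, "/c": set()}

def reconstruct(requests, timeout=timedelta(minutes=10)):
    """Split one user's time-ordered (timestamp, page) requests into sessions.

    A request extends the current session only if it arrives within the
    timeout and is reachable by a hyperlink from a page already in the
    session; otherwise a new session starts.
    """
    sessions, current = [], []
    for ts, page in requests:
        linked = any(page in topology.get(p, set()) for _, p in current)
        if current and ts - current[-1][0] <= timeout and linked:
            current.append((ts, page))
        else:
            if current:
                sessions.append(current)
            current = [(ts, page)]
    if current:
        sessions.append(current)
    return sessions

t0 = datetime(2007, 1, 1, 12, 0)
log = [(t0, "/"), (t0 + timedelta(minutes=1), "/a"),
       (t0 + timedelta(minutes=2), "/c"), (t0 + timedelta(hours=2), "/")]
print(len(reconstruct(log)))  # -> 2 sessions
```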
APA, Harvard, Vancouver, ISO, and other styles
4

Wu, Hao-cun, and 吳浩存. "A multidimensional data model for monitoring web usage and optimizing website topology." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2004. http://hub.hku.hk/bib/B29528215.

Full text of the source
APA, Harvard, Vancouver, ISO, and other styles
5

Wang, Long. "X-tracking the usage interest on web sites." Phd thesis, Universität Potsdam, 2011. http://opus.kobv.de/ubp/volltexte/2011/5107/.

Full text of the source
Abstract:
The exponential growth in the number of web sites and Internet users has made the WWW the most important global information resource. From information publishing and electronic commerce to entertainment and social networking, the Web allows inexpensive and efficient access to the services provided by individuals and institutions. The basic units for distributing these services are the web sites scattered throughout the world. However, the extreme fragility of web services and content, the strong competition between similar services supplied by different sites, and the wide geographic distribution of web users create an urgent requirement for web managers to track and understand the usage interest of their web customers. This thesis, "X-tracking the Usage Interest on Web Sites", aims to fulfill this requirement. "X" carries two meanings: one is that usage interest differs across web sites, and the other is that usage interest is depicted from multiple aspects: internal and external, structural and conceptual, objective and subjective. "Tracking" indicates that our focus is on locating and measuring the differences and changes among usage patterns. This thesis presents methodologies for discovering usage interest on three kinds of web sites: public information portal sites, e-learning sites that provide various kinds of streaming lectures, and social sites that host public discussions on IT issues. On different sites, we concentrate on different issues related to mining usage interest. Educational information portal sites were the first implementation scenario for discovering usage patterns and optimizing the organization of web services. In such cases, usage patterns are modeled as frequent page sets, navigation paths, navigation structures or graphs. A necessary prerequisite, however, is to rebuild individual behaviors from the usage history, and we give a systematic study of how to do so. In addition, this thesis presents a new strategy for building content clusters based on pair browsing retrieved from usage logs. The difference between such clusters and the original web structure reveals the distance between the destinations on the usage side and the expectations on the design side. Moreover, we study the problem of tracking the changes of usage patterns over their life cycles. The changes are described from the internal side, integrating conceptual and structural features, and from the external side, covering physical features; and from the local side, measuring the difference between two time spans, and the global side, showing the change tendency along the life cycle. A platform, Web-Cares, is developed to discover usage interest, to measure the difference between usage interest and site expectation, and to track the changes of usage patterns. E-learning sites provide teaching materials such as slides, recorded lecture videos and exercise sheets. We focus on discovering the learning interest in streaming lectures, such as RealMedia, mp4 and Flash clips. Compared to information portal sites, the usage of streaming lectures encapsulates variables such as viewing time and actions during the learning process. The learning interest is discovered in the form of answers to six questions, which cover finding the relations between pieces of lectures and the preferences among different forms of lectures. We focus on detecting the changes of learning interest in the same course across different semesters.
The differences in content and structure between two courses drive the changes in learning interest, and we give an algorithm for measuring the difference in learning interest integrated with a similarity comparison between courses. A search engine, TASK-Moniminer, is created to help teachers query the learning interest in their streaming lectures on the tele-TASK site. A social site acts as an online community attracting web users to discuss common topics and share interesting information. Compared to public information portal sites and e-learning web sites, the rich interactions among users and web content bring a wider range of content quality but, on the other hand, provide more possibilities to express and model usage interest. We propose a framework for finding and recommending high-reputation articles in a social site. We observed that reputation falls into global and local categories, and that the quality of articles with high reputation is related to their content features. Based on these observations, our framework is implemented by first finding the articles having global or local reputation, then clustering articles based on their content relations, and finally selecting and recommending articles from each cluster based on their reputation ranks.
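The pair-browsing clustering strategy can be sketched roughly as follows (the sessions, support threshold, and connected-components grouping are illustrative assumptions, not the thesis's exact method); comparing the resulting clusters with the designed link structure then exposes the usage-versus-design distance discussed above:

```python
import itertools
from collections import Counter

sessions = [["p1", "p2", "p3"], ["p1", "p2"], ["p4", "p5"], ["p4", "p5", "p1"]]

# Pair browsing: how often two pages occur in the same session.
pair_counts = Counter()
for s in sessions:
    for a, b in itertools.combinations(sorted(set(s)), 2):
        pair_counts[(a, b)] += 1

# Link pages whose pair count meets a threshold, then read off
# connected components as usage-side content clusters.
adj = {}
for (a, b), c in pair_counts.items():
    if c >= 2:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)

def components(adj):
    seen, comps = set(), []
    for node in adj:
        if node in seen:
            continue
        stack, comp = [node], set()
        while stack:
            n = stack.pop()
            if n not in comp:
                comp.add(n)
                stack.extend(adj[n] - comp)
        seen |= comp
        comps.append(comp)
    return comps

print(components(adj))  # e.g. [{'p1', 'p2'}, {'p4', 'p5'}]
```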
APA, Harvard, Vancouver, ISO, and other styles
6

Norguet, Jean-Pierre. "Semantic analysis in web usage mining." Doctoral thesis, Universite Libre de Bruxelles, 2006. http://hdl.handle.net/2013/ULB-DIPOT:oai:dipot.ulb.ac.be:2013/210890.

Full text of the source
Abstract:
With the emergence of the Internet and of the World Wide Web, the Web site has become a key communication channel in organizations. To satisfy the objectives of the Web site and of its target audience, adapting the Web site content to the users' expectations has become a major concern. In this context, Web usage mining, a relatively new research area, and Web analytics, the part of Web usage mining that has emerged most strongly in the corporate world, offer many Web communication analysis techniques. These techniques include prediction of the user's behaviour within the site, comparison between expected and actual Web site usage, adjustment of the Web site with respect to the users' interests, and mining and analyzing Web usage data to discover interesting metrics and usage patterns. However, Web usage mining and Web analytics suffer from significant drawbacks when it comes to supporting the decision-making process at the higher levels of the organization.

Indeed, according to organization theory, the higher levels of an organization need summarized and conceptual information to make fast, high-level, and effective decisions. For Web sites, these levels include the organization managers and the Web site chief editors. At these levels, the results produced by Web analytics tools are mostly useless; most of them target Web designers and Web developers. Summary reports like the number of visitors and the number of page views can be of some interest to the organization manager, but these results are poor. Finally, page-group and directory hits give the Web site chief editor conceptual results, but these are limited by several problems like page synonymy (several pages contain the same topic), page polysemy (a page contains several topics), page temporality, and page volatility.

Web usage mining research projects, for their part, have mostly left aside Web analytics and its limitations and have focused on other research paths, such as usage pattern analysis, personalization, system improvement, site structure modification, marketing business intelligence, and usage characterization. A potential contribution to Web analytics can be found in research on reverse clustering analysis, a technique based on self-organizing feature maps. This technique integrates Web usage mining and Web content mining in order to rank the Web site pages according to an original popularity score. However, the algorithm is not scalable and does not answer the page-polysemy, page-synonymy, page-temporality, and page-volatility problems. As a consequence, these approaches fail at delivering summarized and conceptual results.

An interesting attempt to obtain such results has been the Information Scent algorithm, which produces a list of term vectors representing the visitors' needs. These vectors provide a semantic representation of the visitors' needs and can be easily interpreted. Unfortunately, the results suffer from term polysemy and term synonymy, are visit-centric rather than site-centric, and are not scalable to produce. Finally, according to a recent survey, no Web usage mining research project has proposed a satisfying solution to provide site-wide summarized and conceptual audience metrics.

In this dissertation, we present our solution to answer the need for summarized and conceptual audience metrics in Web analytics. We first describe several methods for mining the Web pages output by Web servers: content journaling, script parsing, server monitoring, network monitoring, and client-side mining. These techniques can be used alone or in combination to mine the Web pages output by any Web site. Then, the occurrences of taxonomy terms in these pages can be aggregated to provide concept-based audience metrics. To evaluate the results, we implement a prototype and run a number of test cases with real Web sites.

According to the first experiments with our prototype and SQL Server OLAP Analysis Service, concept-based metrics prove far more summarized and much more intuitive than page-based metrics. As a consequence, concept-based metrics can be exploited at higher levels in the organization. For example, organization managers can redefine the organization strategy according to the visitors' interests. Concept-based metrics also give an intuitive view of the messages delivered through the Web site and make it possible to adapt the Web site communication to the organization objectives. The Web site chief editor, for their part, can interpret the metrics to redefine the publishing orders and the sub-editors' writing tasks. As decisions at higher levels in the organization should be more effective, concept-based metrics should significantly contribute to Web usage mining and Web analytics.
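One plausible reading of the concept-based aggregation step, with invented page-view counts and taxonomy-term occurrences (the view-count weighting is an assumption, not the dissertation's exact formula):

```python
# Hypothetical inputs: page views from the usage logs, taxonomy term
# occurrences mined from the pages output by the Web server.
page_views = {"/jobs.html": 120, "/courses.html": 300}
term_occurrences = {
    "/jobs.html": {"employment": 5, "training": 1},
    "/courses.html": {"training": 7, "employment": 2},
}

# Concept-based audience metric: aggregate each taxonomy term over all
# pages, weighted by how often each page was actually viewed.
concept_metrics = {}
for page, views in page_views.items():
    for term, occ in term_occurrences.get(page, {}).items():
        concept_metrics[term] = concept_metrics.get(term, 0) + occ * views

print(concept_metrics)  # {'employment': 1200, 'training': 2220}
```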


Doctorat en sciences appliquées (doctorate in applied sciences)

APA, Harvard, Vancouver, ISO, and other styles
7

Luczak-Rösch, Markus [Verfasser]. "Usage-dependent maintenance of structured Web data sets / Markus Luczak-Rösch." Berlin : Freie Universität Berlin, 2014. http://d-nb.info/1068253827/34.

Full text of the source
APA, Harvard, Vancouver, ISO, and other styles
8

Vollino, Bruno Winiemko. "Descoberta de perfis de uso de web services." reponame:Biblioteca Digital de Teses e Dissertações da UFRGS, 2013. http://hdl.handle.net/10183/83669.

Full text of the source
Abstract:
During the life cycle of a web service, several changes are made to its interface, which are possibly incompatible with current usage and may break client applications. Providers must make decisions about changes to their services frequently, most often without insight into the effect these changes will have on their customers. Existing research and tools fail to give providers proper knowledge about the actual usage of a service interface's features by the distinct types of customers, making it impossible to assess the actual impact of changes. This work presents a framework for the discovery of web service usage profiles, which constitute a descriptive model of the usage patterns found in distinct groups of clients concerning the usage of service interface features. The framework supports the user in the process of knowledge discovery over service usage data through semi-automatic and configurable tasks, which assist the preparation and analysis of usage data with the minimum user intervention possible. The framework monitors web service interactions, loads pre-processed usage data into a unified database, and supports the generation of usage profiles. Data mining techniques are used to group clients according to their usage patterns of features, and these groups are used to build service usage profiles. The entire process is configured via parameters, which allow the user to determine the level of detail of the usage information included in the profiles and the criteria for evaluating the similarity between client applications. The proposal is validated through experiments with synthetic data, simulated according to features expected in the use of a real service. The experimental results demonstrate that the proposed framework allows the discovery of useful service usage profiles and provide evidence about the proper parameterization of the framework.
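A minimal sketch of the clustering idea, grouping client applications by how often they use each interface operation (toy frequencies and scikit-learn's k-means stand in for the framework's configurable pipeline):

```python
import numpy as np
from sklearn.cluster import KMeans

# Rows: client applications; columns: relative usage frequency of each
# operation of the service interface (invented data).
features = ["getQuote", "placeOrder", "cancelOrder"]
usage = np.array([
    [0.9, 0.1, 0.0],   # read-heavy clients
    [0.8, 0.2, 0.0],
    [0.1, 0.6, 0.3],   # transactional clients
    [0.2, 0.5, 0.3],
])

# Group clients by usage pattern; each centroid acts as a usage profile.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(usage)
for label, centroid in enumerate(km.cluster_centers_):
    print(f"profile {label}:", dict(zip(features, centroid.round(2))))
```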
APA, Harvard, Vancouver, ISO, and other styles
9

Özakar, Belgin, and Halis Püskülcü. "Finding And Evaluating Patterns In Web Repository Using Database Technology And Data Mining Algorithms." [s.l.]: [s.n.], 2002. http://library.iyte.edu.tr/tezler/master/bilgisayaryazilimi/T000130.pdf.

Full text of the source
APA, Harvard, Vancouver, ISO, and other styles
10

Karlsson, Sophie. "Datainsamling med Web Usage Mining : Lagringsstrategier för loggning av serverdata." Thesis, Högskolan i Skövde, Institutionen för informationsteknologi, 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-9467.

Full text of the source
Abstract:
The complexity of web applications and the number of advanced services are increasing. Logging activities can increase the understanding of users' behaviour and needs, but logging is often used extensively without yielding relevant information. More advanced systems bring increased performance requirements, and logging becomes even more demanding for the systems. There is a need for smarter systems, development of techniques for performance improvement, and techniques for data collection. This work investigates how response times are affected when logging server data, according to the data collection phase in web usage mining, depending on the storage strategy. The hypothesis is that logging may degrade response times even further. An experiment was conducted in which four different storage strategies were used to store server data with different table and database structures, to see which strategy affects response times the least. The experiment shows a statistically significant difference between the storage strategies according to ANOVA. Storage strategy 4 shows the best effect on average response time, while storage strategy 2 shows the most negative effect on average response time. Future work would be valuable for strengthening the results.
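The ANOVA comparison can be outlined as follows, with invented response times standing in for the thesis's measurements:

```python
from scipy.stats import f_oneway

# Hypothetical response times (ms) measured under each storage strategy.
strategy1 = [118, 121, 119, 124]
strategy2 = [131, 135, 129, 133]
strategy3 = [122, 126, 121, 125]
strategy4 = [112, 115, 111, 114]

f_stat, p_value = f_oneway(strategy1, strategy2, strategy3, strategy4)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")  # p < 0.05: significant difference
```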
APA, Harvard, Vancouver, ISO, and other styles
11

Shun, Yeuk Kiu. "Web mining from client side user activity log /." View Abstract or Full-Text, 2002. http://library.ust.hk/cgi/db/thesis.pl?COMP%202002%20SHUN.

Full text of the source
Abstract:
Thesis (M. Phil.)--Hong Kong University of Science and Technology, 2002.
Includes bibliographical references (leaves 85-90). Also available in electronic version. Access restricted to campus users.
APA, Harvard, Vancouver, ISO, and other styles
12

Wang, Hui. "Mining novel Web user behavior models for access prediction /." View Abstract or Full-Text, 2003. http://library.ust.hk/cgi/db/thesis.pl?COMP%202003%20WANG.

Full text of the source
Abstract:
Thesis (M. Phil.)--Hong Kong University of Science and Technology, 2003.
Includes bibliographical references (leaves 83-91). Also available in electronic version. Access restricted to campus users.
APA, Harvard, Vancouver, ISO, and other styles
13

Zhao, Hongkun. "Automatic wrapper generation for the extraction of search result records from search engines." Diss., Online access via UMI, 2007.

Find the full text of the source
APA, Harvard, Vancouver, ISO, and other styles
14

Agarwal, Khushbu. "A partition based approach to approximate tree mining: a memory hierarchy perspective." Columbus, Ohio: Ohio State University, 2008. http://rave.ohiolink.edu/etdc/view?acc%5Fnum=osu1196284256.

Full text of the source
APA, Harvard, Vancouver, ISO, and other styles
15

Nassopoulos, Georges. "Deducing Basic Graph Patterns from Logs of Linked Data Providers." Thesis, Nantes, 2017. http://www.theses.fr/2017NANT4110/document.

Full text of the source
Abstract:
Following the principles of Linked Data, data providers have published billions of facts as RDF data. Executing SPARQL queries over SPARQL endpoints or Triple Pattern Fragments (TPF) servers makes it easy to consume Linked Data. However, federated SPARQL query processing and TPF query processing decompose the initial query into subqueries. Consequently, the data providers only see subqueries, and the initial query is known only to end users. Knowing the executed SPARQL queries is fundamental for data providers: to ensure usage control, to optimize the costs of query answering, to justify return on investment, to improve the user experience, or to create business models from usage trends. In this thesis, we focus on analyzing execution logs of TPF servers and SPARQL endpoints to extract the Basic Graph Patterns (BGP) of executed SPARQL queries. The main challenge in extracting BGPs is the concurrent execution of SPARQL queries. We propose two algorithms: LIFT and FETA. LIFT extracts BGPs of executed queries from a single TPF server log. FETA extracts BGPs of federated queries from a log of a set of SPARQL endpoints. For experiments, we run LIFT and FETA on synthetic and real logs. LIFT and FETA are able to extract BGPs with good precision and recall under certain conditions.
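A deliberately crude sketch of the core difficulty, grouping logged triple patterns into candidate BGPs by arrival time and shared variables (the gap threshold and the heuristic itself are assumptions; LIFT and FETA are considerably more sophisticated):

```python
def extract_bgps(log, gap=2.0):
    """Group (timestamp, triple_pattern) log entries into candidate BGPs.

    A pattern joins the current group if it arrives within `gap` seconds
    and shares a '?var'-style variable with the group (a join candidate).
    """
    def shares_var(tp, group):
        tp_vars = {t for t in tp if t.startswith("?")}
        return any(tp_vars & {t for t in g if t.startswith("?")} for g in group)

    bgps, current, last_ts = [], [], None
    for ts, tp in log:
        if current and ts - last_ts <= gap and shares_var(tp, current):
            current.append(tp)
        else:
            if current:
                bgps.append(current)
            current = [tp]
        last_ts = ts
    if current:
        bgps.append(current)
    return bgps

log = [(0.0, ("?film", "director", "?d")),
       (0.4, ("?d", "birthPlace", "?city")),
       (9.0, ("?book", "author", "?a"))]
print(extract_bgps(log))  # -> two candidate BGPs
```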
APA, Harvard, Vancouver, ISO, and other styles
16

Khasawneh, Natheer Yousef. "Toward Better Website Usage: Leveraging Data Mining Techniques and Rough Set Learning to Construct Better-to-use Websites." Akron, OH : University of Akron, 2005. http://rave.ohiolink.edu/etdc/view?acc%5Fnum=akron1120534472.

Full text of the source
Abstract:
Dissertation (Ph. D.)--University of Akron, Dept. of Electrical and Computer Engineering, 2005.
"August, 2005." Title from electronic dissertation title page (viewed 01/14/2006). Advisor, John Durkin; Committee members, John Welch, James Grover, Yueh-Jaw Lin, Yingcai Xiao, Chien-Chung Chan; Department Chair, Alex Jose De Abreu-Garcia; Dean of the College, George Haritos; Dean of the Graduate School, George R. Newkome. Includes bibliographical references.
APA, Harvard, Vancouver, ISO, and other styles
17

Persson, Pontus. "Identifying Early Usage Patterns That Increase User Retention Rates In A Mobile Web Browser." Thesis, Linköpings universitet, Databas och informationsteknik, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-137793.

Full text of the source
Abstract:
One of the major challenges for modern technology companies is user retention management. This work focuses on identifying early usage patterns that signify increased retention rates in a mobile web browser. This is done using a targeted parallel implementation of the association rule mining algorithm FP-Growth. Different item subset selection techniques, including clustering and other statistical methods, have been used in order to reduce the mining time and allow for lower support thresholds. Many interesting rules have been mined: the best retention-wise rule implies a retention rate of 99.5%, and the majority of the rules analyzed in this work imply a retention rate increase between 150% and 200%.
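A sketch of the mining step on invented early-usage events, using the mlxtend implementation of FP-Growth (the thesis used its own targeted parallel implementation, and the event names here are made up):

```python
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import fpgrowth

# Toy early-usage events per new user, with a retention marker item.
transactions = [
    ["opened_tab", "used_search", "added_bookmark", "retained"],
    ["opened_tab", "used_search", "retained"],
    ["opened_tab", "churned"],
    ["used_search", "added_bookmark", "retained"],
]

te = TransactionEncoder()
df = pd.DataFrame(te.fit(transactions).transform(transactions),
                  columns=te.columns_)

# Frequent itemsets containing 'retained' hint at early behaviours
# associated with retention.
itemsets = fpgrowth(df, min_support=0.5, use_colnames=True)
print(itemsets[itemsets["itemsets"].apply(lambda s: "retained" in s)])
```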
APA, Harvard, Vancouver, ISO, and other styles
18

Nenadić, Oleg. "An implementation of correspondence analysis in R and its application in the analysis of web usage /." Göttingen : Cuvillier, 2007. http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&doc_number=016229974&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA.

Full text of the source
APA, Harvard, Vancouver, ISO, and other styles
19

Cercós, Brownell Robert. "Diseño y Construcción de un Web Warehouse para Almacenar Información Extraída a Partir de Datos Originados en la Web." Tesis, Universidad de Chile, 2008. http://repositorio.uchile.cl/handle/2250/103076.

Full text of the source
Abstract:
The overall objective of this work is to design and build a repository of information about the usage, content and structure of a web site. The aim is to take advantage of the data generated by navigation through a site, extracting information that supports decision-making aimed at improving the site's structure and content. The Web has become a channel with great potential for organizations, showing important advantages, especially in its sales and marketing applications. This has created a growing need to understand the preferences of a site's visitors, in order to attract their attention and, ideally, turn them into customers. However, because of the large amount of data generated by navigation and by a site's content, drawing conclusions from these data is very complex. This research began with a study of existing algorithms for processing Web data and of the different storage models for the information built from them. On that basis, a generic model was developed to store, process and present the information. The information is extracted from data obtained through a strategy that is non-invasive for visitors, and is stored using a data warehouse architecture, which maintains clean, consolidated and reliable information derived from a large amount of data coming from multiple sources. The model was then applied to a real web site related to the banking industry, in order to test its correct operation and usefulness. As a result, it was concluded that the implemented architecture is effective for the statistical analysis of the data, with the goal conversion rate being the most relevant indicator for measuring the performance of a web site, one that could even become a dimension of the information model. For future work, it is recommended to contrast the results of operating this repository with those of other information-gathering strategies.
APA, Harvard, Vancouver, ISO, and other styles
20

Nagi, Mohamad. "Integrating Network Analysis and Data Mining Techniques into Effective Framework for Web Mining and Recommendation. A Framework for Web Mining and Recommendation." Thesis, University of Bradford, 2015. http://hdl.handle.net/10454/14200.

Full text of the source
Abstract:
The main motivation for the study described in this dissertation is to benefit from developments in technology and the huge amount of available data which can be easily captured, stored and maintained electronically. We concentrate on Web usage (i.e., log) mining and Web structure mining. Analysing Web log data reveals valuable feedback reflecting how effective the current structure of a web site is and helps the owner of a web site understand the behaviour of the site's visitors. We developed a framework that integrates statistical analysis, frequent pattern mining, clustering, classification, and network construction and analysis. We concentrated on the statistical data related to visitors and how they surf and pass through the various pages of a given web site to land at target pages. Further, frequent pattern mining was used to study the relationship between the various pages constituting a given web site. Clustering is used to study the similarity of users and pages. Classification suggests a target class for a new entity by comparing the characteristics of the new entity to those of the known classes. Network construction and analysis is employed to identify and investigate the links between the various pages of a Web site by constructing a network based on the frequency of access to the Web pages, such that pages become linked in the network if the frequent pattern mining process identifies them as frequently accessed together. The knowledge discovered by analysing a web site and its related data should be valuable for online shoppers and commercial web site owners. Building on the outcome of the study, a recommendation system was developed to suggest pages to visitors based on their profiles as compared to similar profiles of other visitors. The conducted experiments using popular datasets demonstrate the applicability and effectiveness of the proposed framework for Web mining and recommendation. As a by-product of the proposed method, we demonstrate how it is effective in another domain, feature reduction, by concentrating on gene expression data analysis as an application, with some interesting results reported in Chapter 5.
APA, Harvard, Vancouver, ISO, and other styles
21

Kliegr, Tomáš. "Clickstream Analysis." Master's thesis, Vysoká škola ekonomická v Praze, 2007. http://www.nusl.cz/ntk/nusl-2065.

Full text of the source
Abstract:
This thesis introduces current research trends in clickstream analysis and proposes a new heuristic that can be used for dimensionality reduction of semantically enriched data in Web Usage Mining (WUM). Click fraud and conversion fraud are identified as key prospective application areas for WUM. The thesis documents a conversion-fraud vulnerability of Google Analytics and proposes a defense: new clickstream acquisition software, which collects data in sufficient granularity and structure to allow data mining approaches to fraud detection. Three variants of the K-means clustering algorithm and three association rule data mining systems are evaluated and compared on real-world web usage data.
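The proposed heuristic itself is specific to the thesis, but the general idea of reducing dimensionality by rolling page-level clickstream features up to semantic topics can be sketched like this (the data and the page-to-topic mapping are invented):

```python
import pandas as pd

# Sessions x pages matrix of hit counts, plus a mapping of pages to
# semantic topics obtained from the enrichment step.
hits = pd.DataFrame(
    {"/news/a": [3, 0, 1], "/news/b": [2, 0, 0], "/shop/x": [0, 4, 2]},
    index=["s1", "s2", "s3"])
topic_of = {"/news/a": "news", "/news/b": "news", "/shop/x": "shop"}

# Collapse page-level columns into topic-level columns before mining,
# shrinking the feature space the clustering has to deal with.
reduced = hits.T.groupby(topic_of).sum().T
print(reduced)  # columns are now 'news' and 'shop'
```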
APA, Harvard, Vancouver, ISO, and other styles
22

Villar, Escobar Osvaldo Pablo. "Minería y Personalización de un Sitio Web para Celulares." Tesis, Universidad de Chile, 2007. http://www.repositorio.uchile.cl/handle/2250/104823.

Full text of the source
APA, Harvard, Vancouver, ISO, and other styles
23

Henriksson, William. "LOGGNING AV INTERAKTION MED DATAINSAMLINGSMETODER FÖR WEBBEVENTLOGGNINGSVERKTYG : Experiment om påverkan i svarstider vid loggning av interaktionsdata." Thesis, Högskolan i Skövde, Institutionen för informationsteknologi, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-15324.

Full text of the source
Abstract:
This study investigates the possible impact of web event logging tools for automated usability testing of user interaction. In an experiment, response times are measured while recorded interaction from the test subjects is replayed on the web application under test by web event logging tools with different data collection methods. The experiment comprises four groups built around three logging tools, implemented according to the sub-goals that were set up. The implementation of the web event logging tools is informed by the study's pre-study, and in their numbered order they log progressively more user interaction, leading to an increasing amount of logged data in bytes. The results supported the hypothesis: the response time of the web application while a user interacts with the page did not increase noticeably, nor was there a statistically significant difference when logging was performed compared with the current website.
APA, Harvard, Vancouver, ISO, and other styles
24

Shao, Da. "Usage of HTML5 as the basis for a multi-platform client solution." Thesis, Linköpings universitet, Institutionen för datavetenskap, 2012. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-77814.

Full text of the source
Abstract:
Technologies for delivering rich user interaction have changed a lot during the last 10 years: from a world where these types of user interfaces were predominantly built as desktop applications, such as Windows applications, over to web technologies using Java Applets, Flash, Silverlight, Android, iOS, etc. The problem, though, has been the vast number of different technologies with no or limited connections between them. This has so far been a problem for application developers, since essentially no code can be reused between the different platforms, and some of them are not native – some kind of runtime platform needs to be installed (Flash, Silverlight, Java…). With the introduction of the emerging HTML5 standard, it seems that there is finally a technology that is becoming widely adopted by most of the platform suppliers. This Master's thesis aims to show that HTML5 can be used to build clients for MediusFlow. The point is to investigate whether it is possible to implement all the pages of the frontend on different platforms by using ideas like Responsive Web Design with HTML5. The showcase is a complete app for MediusFlow on the iPad platform based on HTML5. In addition, the interface must be verified on at least one additional platform, most probably Internet Explorer 10.
APA, Harvard, Vancouver, ISO, and other styles
25

Gomes, João Fernando dos Anjos. "Recomendação de navegação em portais da internet como um serviço suportado em ferramentas Web Analytics." Master's thesis, Instituto Politécnico de Setúbal. Escola Superior de Ciências Empresariais, 2016. http://hdl.handle.net/10400.26/17292.

Full text of the source
Abstract:
Dissertation presented in fulfilment of the requirements for the degree of Master in Organisational Information Systems
As Internet usage keeps increasing, the number of web sites, and hence the number of web pages, also keeps increasing, so there is a need to align the user experience with the overall purposes of a website. Toward this requirement, the proposed recommendation system suggests to the user pages that might be of interest, based on past navigation profiles of overall site usage. Most existing recommendation systems are based on association rules or on keywords (when content is considered). However, when usage data are scarce or sparse and sequential order is to be considered, such traditional approaches may become unsuitable. Conversely, the Web Analytics arena, assuming another paradigm, has experienced considerable growth through mature tools that allow the collection and analysis of internet data in order to understand and optimize website efficiency and efficacy. This work proposes the development of a recommendation system based on the Google Analytics tool. The prototype consists of two main components: 1) a service responsible for the construction and associated logic that underlies recommendation generation; 2) a library embeddable in any website that furnishes the website with a configurable recommendation widget. Preliminary evaluations showed that the implementation follows the logic of the proposed model.
APA, Harvard, Vancouver, ISO, and other styles
26

Melhem, Hiba. "Usages et applications du web sémantique en bibliothèques numériques." Thesis, Université Grenoble Alpes (ComUE), 2017. http://www.theses.fr/2017GREAL025/document.

Full text of the source
Abstract:
This research work falls within the interdisciplinary field of information and communication sciences and aims to explore the use of the semantic web in digital libraries. The web requires libraries to rethink their organizations, activities, practices and services in order to reposition themselves as reference institutions for the dissemination of knowledge. In this thesis, we seek to understand the contexts of use of the semantic web in French digital libraries. We question the contributions of the semantic web within these libraries, as well as the challenges and obstacles that accompany its implementation. We are also interested in documentary practices and their evolution following the introduction of the semantic web in digital libraries. The central question concerns the role that information professionals can play in the implementation of the semantic web in digital libraries. After selecting 98 digital libraries through an analysis of three censuses, a questionnaire survey collects data on the use of the semantic web in these libraries. A second, interview-based survey then highlights the representations that information professionals have of the semantic web and of its use in libraries, as well as the evolution of their professional practices. The results show that the representation of knowledge within the semantic web requires human intervention to provide the conceptual framework that determines the links between the data. Finally, information professionals can become actors of the semantic web, in the sense that their roles are not limited to the use of the semantic web but extend to the development of its standards to ensure better organization of knowledge.
APA, Harvard, Vancouver, ISO, and other styles
27

Tillemans, Stephen. "Development of an instrument for data collection in a multidimensional scaling study of personal Web usage in the South African workplace." Thesis, Stellenbosch : Stellenbosch University, 2011. http://hdl.handle.net/10019.1/21646.

Full text of the source
Abstract:
Thesis (MBA)--Stellenbosch University, 2011.
In a relatively short period, the Internet has grown from being virtually unknown to becoming an essential business tool. Together with its many benefits, the Internet has unfortunately brought with it several new organisational challenges. One of these challenges is how to manage personal Web usage (PWU) in the workplace effectively. Although many managers see PWU as a form of workplace deviance, many researchers have pointed out its potential benefits, such as learning, time-saving, employee well-being and a source of ideas. To help organisations manage PWU in the workplace more effectively, this research identified the need for a typology of PWU behaviours in the South African workplace. Multidimensional scaling (MDS) was identified as an objective method of creating such a typology. The objective of this research was therefore to develop an instrument to gather data for a multidimensional scaling study of PWU behaviours in the South African workplace. A questionnaire was designed that consists of three distinct sections. The first section contains seven pre-coded demographic questions that correspond to specific demographic variables proven to have a relationship with PWU. The second section of the questionnaire is designed to gather dissimilarity data for input into an MDS algorithm. To begin with, 25 Web usage behaviours of South Africans were identified using Google Ad Planner. After weighing up various options for comparing the Web usage behaviours, the pairwise comparison method was selected. Ross sequencing was used to reduce positioning and timing effects. To reduce the number of judgements per participant, the 300 required judgements (the 25 behaviours yield 25 × 24 / 2 = 300 pairs) are split six ways, resulting in 50 judgements per participant. The last section of the questionnaire is designed to gather data to assist with interpreting the dimensions of the MDS configuration. Eight benefits and risks of PWU were identified. These are combined into a matrix together with the 25 Web usage behaviours. The data from this section will allow future research to use linear regression to discover the relationship between the Web usage behaviours (the objects) and the benefits and risks of PWU (the variables). It is believed that this design offers a fair compromise between the time and effort required of participants and the quality and integrity of the acquired data.
APA, Harvard, Vancouver, ISO, and other styles
28

Johnsson, Daniel. "Creating and Evaluating a Useful Web Application for Introduction to Programming." Thesis, Linköpings universitet, Institutionen för datavetenskap, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-172528.

Full text of the source
Abstract:
The aim of this thesis is to build a web application that teaches students programming in Python through code puzzles that do not require them to write any code, answering the research question: How should a quiz application for introduction to Python programming be developed to be useful? The web application's utility and usability are evaluated through the learnability metric relative user efficiency. Data was collected and analyzed using Google Analytics and BigQuery. The study found that users were successfully aided by theoretical sections pertaining to the puzzles, and that even though programming is mainly a desktop activity, there is still interest in mobile access. Although the evaluation of relative user efficiency did not serve as a sufficient learnability measure for this type of application, conclusions from the data analysis still gave insights into the utility of the web application.
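For reference, relative user efficiency is conventionally defined as an ordinary user's efficiency relative to an expert's; the thesis may operationalize it differently, so take this as the textbook form:

```latex
\text{relative user efficiency} =
  \frac{\text{effectiveness}_{\text{user}} / \text{time}_{\text{user}}}
       {\text{effectiveness}_{\text{expert}} / \text{time}_{\text{expert}}}
  \times 100\%
```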
APA, Harvard, Vancouver, ISO, and other styles
29

Mair, Patrick, and Marcus Hudec. "Session Clustering Using Mixtures of Proportional Hazards Models." Department of Statistics and Mathematics, WU Vienna University of Economics and Business, 2008. http://epub.wu.ac.at/598/1/document.pdf.

Full text of the source
Abstract:
Emanating from classical Weibull mixture models, we propose a framework for clustering survival data with various proportionality restrictions imposed. By introducing mixtures of Weibull proportional hazards models on a multivariate data set, a parametric cluster approach based on the EM algorithm is carried out. The problem of non-response in the data is considered. The application example is a real-life data set stemming from the analysis of a world-wide operating eCommerce application. Sessions are clustered according to the dwell times a user spends on certain page areas. The solution allows for the interpretation of navigation behavior in terms of survival and hazard functions. A software implementation by means of an R package is provided. (author's abstract)
Series: Research Report Series / Department of Statistics and Mathematics
Стилі APA, Harvard, Vancouver, ISO та ін.
30

Kilic, Sefa. "Clustering Frequent Navigation Patterns From Website Logs Using Ontology And Temporal Information." Master's thesis, METU, 2012. http://etd.lib.metu.edu.tr/upload/12613979/index.pdf.

Повний текст джерела
Анотація:
Given a set of web pages labeled with ontological items, the level of similarity between two web pages is measured using the level of similarity between the ontological items with which the pages are labeled. Using this similarity measure between two pages, the degree of similarity between two sequences of web page visits can be calculated as well. Using clustering algorithms, similar frequent sequences are grouped and representative sequences are selected from these groups. A new sequence is compared with all clusters and assigned to the most similar one. Representatives of the most similar cluster can be used in several real-world cases. They can be used for predicting and prefetching the next page the user will visit, or for helping the user navigate the website. They can also be used to improve the structure of the website for easier navigation. In this study, the effect of the time spent on each web page during the session is also analyzed.
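A minimal sketch of how a page-level similarity could be lifted to sequences and used for cluster assignment; page_sim is an assumed ontology-based similarity function, not the thesis's exact measure:

```python
import numpy as np

def sequence_similarity(seq_a, seq_b, page_sim):
    # Score each page in one sequence by its best ontological match in the
    # other, then average both directions (a simple symmetric heuristic).
    def directed(xs, ys):
        return np.mean([max(page_sim(x, y) for y in ys) for x in xs])
    return 0.5 * (directed(seq_a, seq_b) + directed(seq_b, seq_a))

def assign(seq, representatives, page_sim):
    # A new visit sequence goes to the cluster with the closest representative.
    return max(representatives,
               key=lambda r: sequence_similarity(seq, r, page_sim))
```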
Стилі APA, Harvard, Vancouver, ISO та ін.
31

Vlk, Vladimír. "Získávání znalostí z webových logů." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2013. http://www.nusl.cz/ntk/nusl-236196.

Повний текст джерела
Анотація:
This master's thesis deals with the creation of an application whose goal is to preprocess web logs and find association rules in them. The first part deals with the concept of Web mining. The second part is devoted to Web usage mining and the notions related to it. The third part deals with the design of the application. The fourth part describes the implementation of the application. The last part deals with experimentation with the application and the interpretation of the results.
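A minimal sketch of the kind of rule discovery involved, assuming sessions have already been reconstructed from the preprocessed logs; it mines only pairwise rules for brevity:

```python
from itertools import combinations
from collections import Counter

def pair_rules(sessions, min_support=0.05, min_confidence=0.6):
    # sessions: each session is the set of pages requested by one visitor
    n = len(sessions)
    item_counts = Counter(p for s in sessions for p in set(s))
    pair_counts = Counter(frozenset(c) for s in sessions
                          for c in combinations(sorted(set(s)), 2))
    rules = []
    for pair, cnt in pair_counts.items():
        if cnt / n < min_support:
            continue
        a, b = tuple(pair)
        for lhs, rhs in ((a, b), (b, a)):
            conf = cnt / item_counts[lhs]
            if conf >= min_confidence:
                rules.append((lhs, rhs, cnt / n, conf))
    return rules  # (antecedent, consequent, support, confidence)

sessions = [["home", "about", "products"], ["home", "products"], ["home", "contact"]]
print(pair_rules(sessions, min_support=0.5, min_confidence=0.6))
```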
Стилі APA, Harvard, Vancouver, ISO та ін.
32

Windmiller, Sarah M. "Alternatives to smartphone applications for real-time information and technology usage among transit riders." Thesis, Georgia Institute of Technology, 2013. http://hdl.handle.net/1853/50369.

Повний текст джерела
Анотація:
Real-time information that informs transit riders about transit schedules, next bus or train arrivals, and service alerts is becoming increasingly available, particularly through internet-enabled smartphone applications. However, the extent of communication technology usage amongst transit riders, specifically their access to mobile applications and alternative technologies that can provide real-time information, is largely unknown. Without this information, transit agencies risk investing in an alternative technology that may not supply real-time information to as many riders as possible. The purpose of this study is to identify differences in individual technology accessibility and to prioritize investment in real-time information application development that mirrors the characteristics of transit riders. This recognition and development will allow wider availability of real-time information amongst transit riders. Paired with an investigation of cellular phone usage among transit riders and the general American population, an analysis of Saint Louis Metro’s Onboard Survey was performed. Cross-tabulations and chi-squared tests were conducted to examine riders’ communication technology usage. Binary logit models were used to understand how, and whether, the ownership of smartphone applications depends on various demographic factors. These analyses identified specific demographic groups that would benefit from supplemental technology methods more conducive to their particular information accessibility. Results showed that communication technology usage has risen substantially in recent years, but a portion of riders are still without access to smartphone applications. Specific demographic groups (e.g., riders over 40 years of age) were less likely to own smartphones, and these results indicate that computer-based websites and interactive voice response (IVR) systems are the best supplementary alternatives for those groups.
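A minimal sketch of a binary logit of smartphone-application ownership on demographics; the variable names and data are illustrative, not the survey's actual fields:

```python
import numpy as np
import statsmodels.api as sm

# Simulated survey: columns are age_over_40, income_low, frequent_rider (0/1).
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(500, 3)).astype(float)
true_logit = 0.8 - 1.2 * X[:, 0] - 0.6 * X[:, 1] + 0.4 * X[:, 2]
y = (rng.random(500) < 1 / (1 + np.exp(-true_logit))).astype(float)

model = sm.Logit(y, sm.add_constant(X)).fit(disp=0)
print(model.params)  # const, x1=age_over_40, x2=income_low, x3=frequent_rider
# A negative x1 coefficient mirrors the reported finding that riders over 40
# were less likely to own smartphones.
```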
Стилі APA, Harvard, Vancouver, ISO та ін.
33

Chen, Xiaowei. "Measurement, analysis and improvement of BitTorrent Darknets." HKBU Institutional Repository, 2013. http://repository.hkbu.edu.hk/etd_ra/1545.

Повний текст джерела
Стилі APA, Harvard, Vancouver, ISO та ін.
34

Ammari, Ahmad N. "Transforming user data into user value by novel mining techniques for extraction of web content, structure and usage patterns : the development and evaluation of new Web mining methods that enhance information retrieval and improve the understanding of users' Web behavior in websites and social blogs." Thesis, University of Bradford, 2010. http://hdl.handle.net/10454/5269.

Повний текст джерела
Анотація:
The rapid growth of the World Wide Web in the last decade has made it the largest publicly accessible data source in the world, and one of the most significant and influential information revolutions of modern times. The influence of the Web has impacted almost every aspect of human life and activity, causing paradigm shifts and transformational changes in business, governance, and education. Moreover, the rapid evolution of Web 2.0 and the Social Web in the past few years, such as social blogs and friendship networking sites, has dramatically transformed the Web from a raw environment for information consumption into a dynamic and rich platform for information production and sharing worldwide. However, this growth and transformation of the Web has resulted in an uncontrollable explosion and abundance of textual content, creating a serious challenge for any user trying to find and retrieve the relevant information that they truly seek on the Web. Finding a relevant Web page within a website easily and efficiently has become very difficult. This has created many challenges for researchers to develop new mining techniques that improve the user experience on the Web, as well as for organizations to understand the true informational interests and needs of their customers in order to improve their targeted services accordingly, by providing the products, services and information that truly match the requirements of every online customer. With these challenges in mind, Web mining aims to extract hidden patterns and discover useful knowledge from Web page contents, Web hyperlinks, and Web usage logs. Based on the primary kinds of Web data used in the mining process, Web mining tasks can be categorized into three main types: Web content mining, which extracts knowledge from Web page contents using text mining techniques; Web structure mining, which extracts patterns from the hyperlinks that represent the structure of the website; and Web usage mining, which mines users' Web navigational patterns from Web server logs that record the page accesses made by every user, representing the interactional activities between the users and the Web pages of a website. The main goal of this thesis is to contribute toward addressing the challenges that have resulted from the information explosion and overload on the Web, by proposing and developing novel Web mining-based approaches. Toward achieving this goal, the thesis presents, analyzes, and evaluates three major contributions. First, the development of an integrated Web structure and usage mining approach that recommends a collection of hyperlinks to be placed on the homepage of a website for its surfers. Second, the development of an integrated Web content and usage mining approach to improve the understanding of users' Web behavior and discover the user group interests in a website. Third, the development of a supervised classification model based on recent Social Web concepts, such as Tag Clouds, in order to improve the retrieval of relevant articles and posts from Web social blogs.
Стилі APA, Harvard, Vancouver, ISO та ін.
35

Ammari, Ahmad N. "Transforming user data into user value by novel mining techniques for extraction of web content, structure and usage patterns. The Development and Evaluation of New Web Mining Methods that enhance Information Retrieval and improve the Understanding of User's Web Behavior in Websites and Social Blogs." Thesis, University of Bradford, 2010. http://hdl.handle.net/10454/5269.

Повний текст джерела
Анотація:
The rapid growth of the World Wide Web in the last decade has made it the largest publicly accessible data source in the world, and one of the most significant and influential information revolutions of modern times. The influence of the Web has impacted almost every aspect of human life and activity, causing paradigm shifts and transformational changes in business, governance, and education. Moreover, the rapid evolution of Web 2.0 and the Social Web in the past few years, such as social blogs and friendship networking sites, has dramatically transformed the Web from a raw environment for information consumption into a dynamic and rich platform for information production and sharing worldwide. However, this growth and transformation of the Web has resulted in an uncontrollable explosion and abundance of textual content, creating a serious challenge for any user trying to find and retrieve the relevant information that they truly seek on the Web. Finding a relevant Web page within a website easily and efficiently has become very difficult. This has created many challenges for researchers to develop new mining techniques that improve the user experience on the Web, as well as for organizations to understand the true informational interests and needs of their customers in order to improve their targeted services accordingly, by providing the products, services and information that truly match the requirements of every online customer. With these challenges in mind, Web mining aims to extract hidden patterns and discover useful knowledge from Web page contents, Web hyperlinks, and Web usage logs. Based on the primary kinds of Web data used in the mining process, Web mining tasks can be categorized into three main types: Web content mining, which extracts knowledge from Web page contents using text mining techniques; Web structure mining, which extracts patterns from the hyperlinks that represent the structure of the website; and Web usage mining, which mines users' Web navigational patterns from Web server logs that record the page accesses made by every user, representing the interactional activities between the users and the Web pages of a website. The main goal of this thesis is to contribute toward addressing the challenges that have resulted from the information explosion and overload on the Web, by proposing and developing novel Web mining-based approaches. Toward achieving this goal, the thesis presents, analyzes, and evaluates three major contributions. First, the development of an integrated Web structure and usage mining approach that recommends a collection of hyperlinks to be placed on the homepage of a website for its surfers. Second, the development of an integrated Web content and usage mining approach to improve the understanding of users' Web behavior and discover the user group interests in a website. Third, the development of a supervised classification model based on recent Social Web concepts, such as Tag Clouds, in order to improve the retrieval of relevant articles and posts from Web social blogs.
Стилі APA, Harvard, Vancouver, ISO та ін.
36

Calderón-Benavides, Liliana. "Unsupervised Identification of the User’s Query Intent in Web Search." Doctoral thesis, Universitat Pompeu Fabra, 2011. http://hdl.handle.net/10803/51299.

Повний текст джерела
Анотація:
This doctoral work focuses on identifying and understanding the intents that motivate a user to perform a search on the Web. To this end, we apply machine learning models that do not require more information than that provided by the very needs of the users, which in this work are represented by their queries. The knowledge and interpretation of this invaluable information can help search engines to obtain resources especially relevant to users, and thus improve user satisfaction. By means of unsupervised learning techniques, selected according to the context of the problem being solved, we show that it is not only possible to identify the user's intents, but that this process can be conducted automatically. The research conducted in this thesis has involved an evolutionary process that starts from the manual analysis of different sets of real user queries from a search engine. The work passes through the proposal of a new classification of users' query intents and the application of different unsupervised learning techniques to identify those intents, up to determining that the user's intents, rather than being considered a uni-dimensional problem, should be conceived as a composition of several aspects, or dimensions (i.e., as a multi-dimensional problem), that contribute to clarifying and establishing what the user's intents are. Furthermore, from this last proposal, we have configured a framework for the on-line identification of the user's query intent. Overall, the results from this research have proven effective for the problem of identifying users' query intent.
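A minimal sketch of unsupervised grouping of queries, assuming a plain TF-IDF plus k-means pipeline; the thesis applies several unsupervised techniques across multiple intent dimensions, which this single-view example does not capture:

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

queries = ["cheap flights to madrid", "python list comprehension",
           "facebook login", "buy running shoes", "how to tie a tie"]
X = TfidfVectorizer(ngram_range=(1, 2)).fit_transform(queries)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
for query, cluster in zip(queries, labels):
    print(cluster, query)   # queries grouped without any labeled training data
```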
Este trabajo doctoral se enfoca en identificar y entender las intenciones que motivan a los usuarios a realizar búsquedas en la Web a través de la aplicación de métodos de aprendizaje automático que no requieren datos adicionales más que las necesidades de información de los mismos usuarios, representadas a través de sus consultas. El conocimiento y la interpretación de esta información, de valor incalculable, puede ayudar a los sistemas de búsqueda Web a encontrar recursos particularmente relevantes y así mejorar la satisfacción de sus usuarios. A través del uso de técnicas de aprendizaje no supervisado, las cuales han sido seleccionadas dependiendo del contexto del problema a solucionar, y cuyos resultados han demostrado ser efectivos para cada uno de los problemas planteados, a lo largo de este trabajo se muestra que no solo es posible identificar las intenciones de los usuarios, sino que este es un proceso que se puede llevar a cabo de manera automática. La investigación desarrollada en esta tesis ha implicado un proceso evolutivo, el cual inicia con el análisis de la clasificación manual de diferentes conjuntos de consultas que usuarios reales han sometido a un motor de búsqueda. El trabajo pasa a través de la proposición de una nueva clasificación de las intenciones de consulta de usuarios, y el uso de diferentes técnicas de aprendizaje no supervisado para identificar dichas intenciones, llegando hasta establecer que éste no es un problema unidimensional, sino que debería ser considerado como un problema de múltiples dimensiones, donde cada una de dichas dimensiones, o facetas, contribuye a clarificar y establecer cuál es la intención del usuario. A partir de este último trabajo, hemos creado un modelo para la identificar la intención del usuario en un escenario on–line.
Стилі APA, Harvard, Vancouver, ISO та ін.
37

Song, Ge. "Méthodes parallèles pour le traitement des flux de données continus." Thesis, Université Paris-Saclay (ComUE), 2016. http://www.theses.fr/2016SACLC059/document.

Повний текст джерела
Анотація:
Nous vivons dans un monde où une grande quantité de données est générée en continu. Par exemple, quand on fait une recherche sur Google, quand on achète quelque chose sur Amazon, quand on clique sur ‘Aimer’ sur Facebook, quand on upload une image sur Instagram, et quand un capteur est activé, etc., de nouvelles données vont être générées. Les données sont différentes d’une simple information numérique, et viennent dans de nombreux formats. Cependant, les données prises isolément n’ont aucun sens. Mais quand ces données sont reliées ensemble on peut en extraire de nouvelles informations. De plus, les données sont sensibles au temps. La façon la plus précise et efficace de représenter les données est de les exprimer en tant que flux de données. Si les données les plus récentes ne sont pas traitées rapidement, les résultats obtenus ne sont pas aussi utiles. Ainsi, un système parallèle et distribué pour traiter de grandes quantités de flux de données en temps réel est un problème de recherche important. Il offre aussi de bonnes perspectives d’application. Dans cette thèse nous étudions l’opération de jointure sur des flux de données, de manière parallèle et continue. Nous séparons ce problème en deux catégories. La première est la jointure en parallèle et continue guidée par les données. La seconde est la jointure en parallèle et continue guidée par les requêtes
We live in a world where a vast amount of data is being continuously generated. Data is coming in a variety of ways. For example, every time we do a search on Google, every time we purchase something on Amazon, every time we click a ‘like’ on Facebook, every time we upload an image on Instagram, every time a sensor is activated, etc., new data is generated. Data is different from simple numerical information; it now comes in a variety of forms. However, isolated data is valueless. But when this huge amount of data is connected, it is very valuable to look for new insights. At the same time, data is time-sensitive. The most accurate and effective way of describing data is to express it as a data stream. If the latest data is not promptly processed, the opportunity of having the most useful results will be missed. So a parallel and distributed system for processing large amounts of data streams in real time has important research value and good application prospects. This thesis focuses on the study of parallel and continuous data stream joins. We divide this problem into two categories: the first is the data-driven parallel and continuous join, and the second is the query-driven parallel and continuous join.
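A minimal sketch of the building block behind such systems, assuming a time-windowed symmetric hash join of two keyed event streams (parallelisation and distribution are omitted):

```python
from collections import defaultdict, deque
import time

class StreamJoin:
    def __init__(self, window_seconds=60.0):
        self.window = window_seconds
        self.left = defaultdict(deque)    # key -> deque of (timestamp, value)
        self.right = defaultdict(deque)

    def _expire(self, table, now):
        # Drop tuples that fell out of the time window.
        for q in table.values():
            while q and now - q[0][0] > self.window:
                q.popleft()

    def insert(self, side, key, value, now=None):
        now = time.time() if now is None else now
        own, other = ((self.left, self.right) if side == "L"
                      else (self.right, self.left))
        self._expire(other, now)
        own[key].append((now, value))
        # Probe the opposite table and emit every in-window match.
        return [(value, v) for (_, v) in other[key]]

j = StreamJoin(window_seconds=10)
j.insert("L", key="user42", value={"page": "home"}, now=0.0)
print(j.insert("R", key="user42", value={"click": "buy"}, now=5.0))
```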
Стилі APA, Harvard, Vancouver, ISO та ін.
38

Malherbe, Emmanuel. "Standardization of textual data for comprehensive job market analysis." Thesis, Université Paris-Saclay (ComUE), 2016. http://www.theses.fr/2016SACLC058/document.

Повний текст джерела
Анотація:
Sachant qu'une grande partie des offres d'emplois et des profils candidats est en ligne, le e-recrutement constitue un riche objet d'étude. Ces documents sont des textes non structurés, et le grand nombre ainsi que l'hétérogénéité des sites de recrutement implique une profusion de vocabulaires et nomenclatures. Avec l'objectif de manipuler plus aisément ces données, Multiposting, une entreprise française spécialisée dans les outils de e-recrutement, a soutenu cette thèse, notamment en termes de données, en fournissant des millions de CV numériques et offres d'emplois agrégées de sources publiques.Une difficulté lors de la manipulation de telles données est d'en déduire les concepts sous-jacents, les concepts derrière les mots n'étant compréhensibles que des humains. Déduire de tels attributs structurés à partir de donnée textuelle brute est le problème abordé dans cette thèse, sous le nom de normalisation. Avec l'objectif d'un traitement unifié, la normalisation doit fournir des valeurs dans une nomenclature, de sorte que les attributs résultants forment une représentation structurée unique de l'information. Ce traitement traduit donc chaque document en un langage commun, ce qui permet d'agréger l'ensemble des données dans un format exploitable et compréhensible. Plusieurs questions sont cependant soulevées: peut-on exploiter les structures locales des sites web dans l'objectif d'une normalisation finale unifiée? Quelle structure de nomenclature est la plus adaptée à la normalisation, et comment l'exploiter? Est-il possible de construire automatiquement une telle nomenclature de zéro, ou de normaliser sans en avoir une?Pour illustrer le problème de la normalisation, nous allons étudier par exemple la déduction des compétences ou de la catégorie professionnelle d'une offre d'emploi, ou encore du niveau d'étude d'un profil de candidat. Un défi du e-recrutement est que les concepts évoluent continuellement, de sorte que la normalisation se doit de suivre les tendances du marché. A la lumière de cela, nous allons proposer un ensemble de modèles d'apprentissage statistique nécessitant le minimum de supervision et facilement adaptables à l'évolution des nomenclatures. Les questions posées ont trouvé des solutions dans le raisonnement à partir de cas, le learning-to-rank semi-supervisé, les modèles à variable latente, ainsi qu'en bénéficiant de l'Open Data et des médias sociaux. Les différents modèles proposés ont été expérimentés sur des données réelles, avant d'être implémentés industriellement. La normalisation résultante est au coeur de SmartSearch, un projet qui fournit une analyse exhaustive du marché de l'emploi
With so many job adverts and candidate profiles available online, e-recruitment constitutes a rich object of study. All this information is, however, textual data, which from a computational point of view is unstructured. The large number and heterogeneity of recruitment websites also mean that there is a profusion of vocabularies and nomenclatures. One of the difficulties when dealing with this type of raw textual data is being able to grasp the concepts contained in it, which is the problem of standardization tackled in this thesis. The aim of standardization is to create a unified process providing values in a nomenclature. A nomenclature is by definition a finite set of meaningful concepts, which means that the attributes resulting from standardization are a structured representation of the information. Several questions are however raised: Are the websites' structured data usable for a unified standardization? What structure of nomenclature is best suited for standardization, and how can it be leveraged? Is it possible to automatically build such a nomenclature from scratch, or to manage the standardization process without one? To illustrate the various obstacles of standardization, the examples we study include the inference of the skills or the category of a job advert, or the level of training of a candidate profile. One of the challenges of e-recruitment is that the concepts are continuously evolving, which means that the standardization must keep up with job market trends. In light of this, we propose a set of machine learning models that require minimal supervision and can easily adapt to the evolution of the nomenclatures. The questions raised found partial answers using case-based reasoning, semi-supervised learning-to-rank, latent variable models, and by leveraging the evolving sources of the semantic web and social media. The different models proposed have been tested on real-world data before being implemented in an industrial environment. The resulting standardization is at the core of SmartSearch, a project which provides a comprehensive analysis of the job market.
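A minimal sketch of one standardization step, assuming nearest-neighbour matching of raw job titles onto a fixed nomenclature in character n-gram TF-IDF space; the thesis's actual models (case-based reasoning, learning-to-rank, latent variable models) are more elaborate:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

nomenclature = ["software engineer", "data scientist", "sales manager"]
raw_titles = ["Sr. Softwre Enginer (Python)", "ML/Data Scientist intern"]

# Character n-grams tolerate typos and abbreviations in raw titles.
vec = TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4))
ref = vec.fit_transform(nomenclature)
sim = cosine_similarity(vec.transform(raw_titles), ref)
for title, row in zip(raw_titles, sim):
    print(title, "->", nomenclature[row.argmax()])
```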
Стилі APA, Harvard, Vancouver, ISO та ін.
39

Castellanos-Paez, Sandra. "Apprentissage de routines pour la prise de décision séquentielle." Thesis, Université Grenoble Alpes (ComUE), 2019. http://www.theses.fr/2019GREAM043.

Повний текст джерела
Анотація:
Intuitivement, un système capable d'exploiter son expérience devrait être capable d'atteindre de meilleures performances. Une façon de tirer parti des expériences passées est d'apprendre des macros (c.-à-d. des routines) ; elles peuvent ensuite être utilisées pour améliorer la performance du processus de résolution de nouveaux problèmes. Le défi de la planification automatique est de développer des techniques de planification capables d'explorer efficacement l'espace de recherche qui croît exponentiellement. L'apprentissage de macros à partir de connaissances précédemment acquises s'avère bénéfique pour l'amélioration de la performance d'un planificateur.Cette thèse contribue principalement au domaine de la planification automatique, et plus spécifiquement à l’apprentissage de macros pour la planification classique. Nous nous sommes concentrés sur le développement d'un modèle d'apprentissage indépendant du domaine qui identifie des séquences d'actions (même non adjacentes) à partir de plans solutions connus. Ce dernier sélectionne les routines les plus utiles (c'est-à-dire les macros), grâce à une évaluation a priori, pour améliorer le domaine de planification.Tout d'abord, nous avons étudié la possibilité d'utiliser la fouille de motifs séquentiels pour extraire des séquences fréquentes d'actions à partir de plans de solutions connus, et le lien entre la fréquence d'une macro et son utilité. Nous avons découvert que la fréquence seule peut ne pas fournir une sélection cohérente de macro-actions utiles (c.-à-d. des séquences d'actions avec des objets constants).Ensuite, nous avons discuté du problème de l'apprentissage des macro-opérateurs (c'est-à-dire des séquences d'actions avec des objets variables) en utilisant des algorithmes classiques de fouille de motifs dans la planification. Malgré les efforts, nous nous sommes trouvés dans une impasse dans le processus de sélection car les structures de filtrage de la fouille de motifs ne sont pas adaptées à la planification.Finalement, nous avons proposé une nouvelle approche appelée METEOR, qui permet de trouver les séquences fréquentes d'opérateurs d'un ensemble de plans sans perte d'information sur leurs caractéristiques. Cette approche a été conçue pour l'extraction des macro-opérateurs à partir de plans solutions connus, et pour la sélection d'un ensemble optimal de macro-opérateurs maximisant le gain en nœuds. Il s'est avéré efficace pour extraire avec succès des macro-opérateurs de différentes longueurs pour quatre domaines de référence différents. De plus, grâce à la phase de sélection l'approche a montré un impact positif sur le temps de recherche sans réduire drastiquement la qualité des plans
Intuitively, a system capable of exploiting its past experiences should be able to achieve better performance. One way to build on past experiences is to learn macros (i.e. routines). They can then be used to improve the performance of the solving process on new problems. In automated planning, the challenge remains to develop powerful planning techniques capable of effectively exploring the search space, which grows exponentially. Learning macros from previously acquired knowledge has proven to be beneficial for improving a planner's performance. This thesis contributes mainly to the field of automated planning, and more specifically to learning macros for classical planning. We focused on developing a domain-independent learning framework that identifies sequences of actions (even non-adjacent ones) from past solution plans and selects the most useful routines (i.e. macros), based on an a priori evaluation, to enhance the planning domain. First, we studied the possibility of using sequential pattern mining for extracting frequent sequences of actions from past solution plans, and the link between the frequency of a macro and its utility. We found that frequency alone may not provide a consistent selection of useful macro-actions (i.e. sequences of actions with constant objects). Second, we discussed the problem of learning macro-operators (i.e. sequences of actions with variable objects) using classic pattern mining algorithms in planning. Despite our efforts, we found ourselves at a dead end in the selection process, because the pattern mining filtering structures are not adapted to planning. Finally, we provided a novel approach called METEOR, which finds the frequent sequences of operators in a set of plans without loss of information about their characteristics. This framework was conceived for mining macro-operators from past solution plans, and for selecting the optimal set of macro-operators that maximises the node gain. It has proven able to successfully mine macro-operators of different lengths on four different benchmark domains and, thanks to the selection phase, to deliver a positive impact on the search time without drastically decreasing the quality of the plans.
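A minimal sketch of the first step, counting the support of (possibly non-adjacent) ordered action pairs across solution plans; as the abstract notes, frequency alone is not a sufficient utility criterion:

```python
from collections import Counter
from itertools import combinations

def frequent_action_pairs(plans, min_support=0.5):
    n = len(plans)
    counts = Counter()
    for plan in plans:
        seen = set()
        for i, j in combinations(range(len(plan)), 2):
            seen.add((plan[i], plan[j]))  # ordered; non-adjacency allowed
        counts.update(seen)               # count each pair once per plan
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}

plans = [["pick", "move", "drop"], ["pick", "move", "move", "drop"]]
print(frequent_action_pairs(plans))
```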
Стилі APA, Harvard, Vancouver, ISO та ін.
40

Klinczak, Marjori Naiele Mocelin. "Identificação e propagação de temas em redes sociais." Universidade Tecnológica Federal do Paraná, 2016. http://repositorio.utfpr.edu.br/jspui/handle/1/2304.

Повний текст джерела
Анотація:
Os últimos anos foram marcados pelo surgimento de diversas mídias sociais, desde o Orkut até o Facebook, assim como Twitter, Youtube, Google+ e tantos outros: cada um oferece novas funcionalidades como forma de atrair um maior número de usuários. Essas mídias sociais geram uma grande quantidade de dados, que se devidamente processados podem ser utilizados para se identificar tendências, padrões e mudanças. O objetivo deste trabalho é a descoberta dos principais temas abordados em uma rede social, caracterizados como agrupamentos de termos relevantes, restritos a determinado contexto e o estudo de sua evolução ao longo do tempo. Para tanto serão utilizados procedimentos fundamentados em Mineração de Dados e no Processamento de Textos. Em um primeiro momento são utilizadas técnicas de pré-processamento de textos com o objetivo de identificar os termos mais relevantes que aparecem nas mensagens textuais da rede social. Em seguida utilizam-se algoritmos clássicos de agrupamento - k-means, k-medoids, DBSCAN - e o recente NMF (Non-negative Matrix Factorization), para a identificação dos temas principais destas mensagens, caracterizados como agrupamentos de termos relevantes. A proposta foi avaliada sobre a rede Twitter, utilizando-se bases de tweets considerando diversos contextos. Os resultados obtidos evidenciam a viabilidade da proposta e sua aplicação na identificação de temas relevantes desta rede social.
Recent years have been marked by the emergence of various social media, from Orkut to Facebook, as well as Twitter, Youtube, Google+ and many others: each offers new features as a way to attract more users. These social media generate a large amount of data which, if processed properly, can be used to identify trends, patterns and changes. The objective of this work is the discovery of the key topics in a social network, characterized as groupings of relevant terms, restricted to a particular context, and the study of their evolution over time. For this purpose, procedures based on data mining and text processing are used. First, text preprocessing techniques are applied in order to identify the most relevant terms that appear in the text messages of the social network. Next, classical clustering algorithms - k-means, k-medoids, DBSCAN - and the more recent NMF (Non-negative Matrix Factorization) are used to identify the main topics of these messages, characterized as groupings of relevant terms. The proposal was evaluated on the Twitter network, using tweet datasets covering several contexts. The results show the feasibility of the proposal and its application in the identification of relevant topics in this social network.
Стилі APA, Harvard, Vancouver, ISO та ін.
41

Nguyen, Hoang Viet Tuan. "Prise en compte de la qualité des données lors de l’extraction et de la sélection d’évolutions dans les séries temporelles de champs de déplacements en imagerie satellitaire." Thesis, Université Grenoble Alpes (ComUE), 2018. http://www.theses.fr/2018GREAA011.

Повний текст джерела
Анотація:
Ce travail de thèse traite de la découverte de connaissances à partir de Séries Temporelles de Champs de Déplacements (STCD) obtenues par imagerie satellitaire. De telles séries occupent aujourd'hui une place centrale dans l'étude et la surveillance de phénomènes naturels tels que les tremblements de terre, les éruptions volcaniques ou bien encore le déplacement des glaciers. En effet, ces séries sont riches d'informations à la fois spatiales et temporelles et peuvent aujourd'hui être produites régulièrement à moindre coût grâce à des programmes spatiaux tels que le programme européen Copernicus et ses satellites phares Sentinel. Nos propositions s'appuient sur l'extraction de motifs Séquentiels Fréquents Groupés (SFG). Ces motifs, à l'origine définis pour l'extraction de connaissances à partir des Séries Temporelles d’Images Satellitaires (STIS), ont montré leur potentiel dans de premiers travaux visant à dépouiller une STCD. Néanmoins, ils ne permettent pas d'utiliser les indices de confiance intrinsèques aux STCD et la méthode de swap randomisation employée pour sélectionner les motifs les plus prometteurs ne tient pas compte de leurs complémentarités spatiotemporelles, chaque motif étant évalué individuellement. Notre contribution est ainsi double. Une première proposition vise tout d'abord à associer une mesure de fiabilité à chaque motif en utilisant les indices de confiance. Cette mesure permet de sélectionner les motifs portés par des données qui sont en moyenne suffisamment fiables. Nous proposons un algorithme correspondant pour réaliser les extractions sous contrainte de fiabilité. Celui-ci s'appuie notamment sur une recherche efficace des occurrences les plus fiables par programmation dynamique et sur un élagage de l'espace de recherche grâce à une stratégie de push partiel, ce qui permet de considérer des STCD conséquentes. Cette nouvelle méthode a été implémentée sur la base du prototype existant SITS-P2miner, développé au sein du LISTIC et du LIRIS pour extraire et classer des motifs SFG. Une deuxième contribution visant à sélectionner les motifs les plus prometteurs est également présentée. Celle-ci, basée sur un critère informationnel, permet de prendre en compte à la fois les indices de confiance et la façon dont les motifs se complètent spatialement et temporellement. Pour ce faire, les indices de confiance sont interprétés comme des probabilités, et les STCD comme des bases de données probabilistes dont les distributions ne sont que partielles. Le gain informationnel associé à un motif est alors défini en fonction de la capacité de ses occurrences à compléter/affiner les distributions caractérisant les données. Sur cette base, une heuristique est proposée afin de sélectionner des motifs informatifs et complémentaires. Cette méthode permet de fournir un ensemble de motifs faiblement redondants et donc plus faciles à interpréter que ceux fournis par swap randomisation. Elle a été implémentée au sein d'un prototype dédié. Les deux propositions sont évaluées à la fois quantitativement et qualitativement en utilisant une STCD de référence couvrant des glaciers du Groenland construite à partir de données optiques Landsat. Une autre STCD que nous avons construite à partir de données radar TerraSAR-X couvrant le massif du Mont-Blanc est également utilisée. 
Outre le fait d'être construites à partir de données et de techniques de télédétection différentes, ces séries se différencient drastiquement en termes d'indices de confiance, la série couvrant le massif du Mont-Blanc se situant à des niveaux de confiance très faibles. Pour les deux STCD, les méthodes proposées ont été mises en œuvre dans des conditions standards au niveau consommation de ressources (temps, espace), et les connaissances des experts sur les zones étudiées ont été confirmées et complétées
This PhD thesis deals with knowledge discovery from Displacement Field Time Series (DFTS) obtained by satellite imagery. Such series now occupy a central place in the study and monitoring of natural phenomena such as earthquakes, volcanic eruptions and glacier displacements. These series are indeed rich in both spatial and temporal information and can now be produced regularly at a lower cost thanks to spatial programs such as the European Copernicus program and its famous Sentinel satellites. Our proposals are based on the extraction of grouped frequent sequential patterns. These patterns, originally defined for the extraction of knowledge from Satellite Image Time Series (SITS), have shown their potential in early work to analyze a DFTS. Nevertheless, they cannot use the confidence indices coming along with DFTS, and the swap randomization method used to select the most promising patterns does not take into account their spatiotemporal complementarities, each pattern being evaluated individually. Our contribution is thus double. A first proposal aims to associate a measure of reliability with each pattern by using the confidence indices. This measure allows the selection of patterns whose occurrences in the data are on average sufficiently reliable. We propose a corresponding constraint-based extraction algorithm. It relies on an efficient search for the most reliable occurrences by dynamic programming and on a pruning of the search space provided by a partial push strategy. This new method has been implemented on the basis of the existing prototype SITS-P2miner, developed by the LISTIC and LIRIS laboratories to extract and rank grouped frequent sequential patterns. A second contribution, for the selection of the most promising patterns, is also made. Based on an informational criterion, it makes it possible to take into account both the confidence indices and the way the patterns complement each other spatially and temporally. To this end, the confidence indices are interpreted as probabilities, and the DFTS are seen as probabilistic databases whose distributions are only partial. The informational gain associated with a pattern is then defined according to the ability of its occurrences to complete/refine the distributions characterizing the data. On this basis, a heuristic is proposed to select informative and complementary patterns. This method provides a set of weakly redundant patterns that is therefore easier to interpret than those provided by swap randomization. It has been implemented in a dedicated prototype. Both proposals are evaluated quantitatively and qualitatively using a reference DFTS covering Greenland glaciers constructed from Landsat optical data. Another DFTS that we built from TerraSAR-X radar data covering the Mont-Blanc massif is also used. In addition to being constructed from different data and remote sensing techniques, these series differ drastically in terms of confidence indices, the series covering the Mont-Blanc massif being at very low levels of confidence. In both cases, the proposed methods operate under standard conditions of resource consumption (time, space), and experts’ knowledge of the studied areas is confirmed and complemented.
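A minimal sketch of the reliability constraint from the first contribution, assuming each occurrence of a pattern carries a mean confidence value; the dynamic-programming search and partial push pruning are omitted:

```python
import numpy as np

def reliable(patterns, occ_confidence, threshold=0.7):
    # occ_confidence[p]: one mean confidence value per occurrence of pattern p.
    # Keep patterns whose occurrences are, on average, sufficiently reliable.
    return [p for p in patterns if np.mean(occ_confidence[p]) >= threshold]

occ = {"A->B": [0.9, 0.8], "C->D": [0.4, 0.5]}
print(reliable(["A->B", "C->D"], occ))  # -> ['A->B']
```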
Стилі APA, Harvard, Vancouver, ISO та ін.
42

Braik, William. "Détection d'évènements complexes dans les flux d'évènements massifs." Thesis, Bordeaux, 2017. http://www.theses.fr/2017BORD0596/document.

Повний текст джерела
Анотація:
La détection d’évènements complexes dans les flux d’évènements est un domaine qui a récemment fait surface dans le e-commerce. Notre partenaire industriel Cdiscount, parmi les sites e-commerce les plus importants en France, vise à identifier en temps réel des scénarios de navigation afin d’analyser le comportement des clients. Les objectifs principaux sont la performance et la mise à l’échelle : les scénarios de navigation doivent être détectés en moins de quelques secondes, alors que des millions de clients visitent le site chaque jour, générant ainsi un flux d’évènements massif.Dans cette thèse, nous présentons Auros, un système permettant l’identification efficace et à grande échelle de scénarios de navigation conçu pour le eCommerce. Ce système s’appuie sur un langage dédié pour l’expression des scénarios à identifier. Les règles de détection définies sont ensuite compilées en automates déterministes, qui sont exécutés au sein d’une plateforme Big Data adaptée au traitement de flux. Notre évaluation montre qu’Auros répond aux exigences formulées par Cdiscount, en étant capable de traiter plus de 10,000 évènements par seconde, avec une latence de détection inférieure à une seconde
Pattern detection over streams of events is gaining more and more attention, especially in the field of eCommerce. Our industrial partner Cdiscount, which is one of the largest eCommerce companies in France, aims to use pattern detection for real-time customer behavior analysis. The main challenges to consider are efficiency and scalability, as the detection of customer behaviors must be achieved within a few seconds, while millions of unique customers visit the website every day, thus producing a large event stream. In this thesis, we present Auros, a system for large-scale and efficient pattern detection for eCommerce. It relies on a domain-specific language to define behavior patterns. Patterns are then compiled into deterministic finite automata, which are run on a Big Data streaming platform. Our evaluation shows that our approach is efficient and scalable, and fits the requirements of Cdiscount.
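A minimal sketch of the automaton idea, assuming a single linear navigation scenario compiled into a small deterministic automaton and run over a click stream; a real compiler would add failure transitions and handle overlapping matches:

```python
PATTERN = ["home", "product", "cart", "checkout"]  # illustrative scenario

def make_dfa(pattern):
    # state i = number of pattern steps already matched
    def step(state, event):
        if event == pattern[state]:
            return state + 1
        return 1 if event == pattern[0] else 0
    return step

def detect(events, pattern=PATTERN):
    step, state = make_dfa(pattern), 0
    for ts, event in events:
        state = step(state, event)
        if state == len(pattern):
            yield ts          # scenario completed at this timestamp
            state = 0

stream = [(1, "home"), (2, "product"), (3, "cart"), (4, "checkout")]
print(list(detect(stream)))   # -> [4]
```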
Стилі APA, Harvard, Vancouver, ISO та ін.
43

Aleksandrova, Marharyta. "Factorisation de matrices et analyse de contraste pour la recommandation." Thesis, Université de Lorraine, 2017. http://www.theses.fr/2017LORR0080/document.

Повний текст джерела
Анотація:
Dans de nombreux domaines, les données peuvent être de grande dimension. Ça pose le problème de la réduction de dimension. Les techniques de réduction de dimension peuvent être classées en fonction de leur but : techniques pour la représentation optimale et techniques pour la classification, ainsi qu'en fonction de leur stratégie : la sélection et l'extraction des caractéristiques. L'ensemble des caractéristiques résultant des méthodes d'extraction est non interprétable. Ainsi, la première problématique scientifique de la thèse est comment extraire des caractéristiques latentes interprétables? La réduction de dimension pour la classification vise à améliorer la puissance de classification du sous-ensemble sélectionné. Nous voyons le développement de la tâche de classification comme la tâche d'identification des facteurs déclencheurs, c'est-à-dire des facteurs qui peuvent influencer le transfert d'éléments de données d'une classe à l'autre. La deuxième problématique scientifique de cette thèse est comment identifier automatiquement ces facteurs déclencheurs? Nous visons à résoudre les deux problématiques scientifiques dans le domaine d'application des systèmes de recommandation. Nous proposons d'interpréter les caractéristiques latentes de systèmes de recommandation basés sur la factorisation de matrices comme des utilisateurs réels. Nous concevons un algorithme d'identification automatique des facteurs déclencheurs basé sur les concepts d'analyse par contraste. Au travers d'expérimentations, nous montrons que les motifs définis peuvent être considérés comme des facteurs déclencheurs
In many application areas, data elements can be high-dimensional. This raises the problem of dimensionality reduction. Dimensionality reduction techniques can be classified based on their aim, dimensionality reduction for optimal data representation versus dimensionality reduction for classification, as well as based on the adopted strategy: feature selection and feature extraction. The set of features resulting from feature extraction methods is usually uninterpretable. Hence, the first scientific problem of the thesis is how to extract interpretable latent features. Dimensionality reduction for classification aims to enhance the classification power of the selected subset of features. We see the development of the classification task as the task of trigger factor identification, that is, the identification of those factors that can influence the transfer of data elements from one class to another. The second scientific problem of this thesis is how to automatically identify these trigger factors. We aim at solving both scientific problems within the recommender systems application domain. We propose to interpret the latent features of matrix factorization-based recommender systems as real users. We design an algorithm for the automatic identification of trigger factors based on the concepts of contrast analysis. Through experimental results, we show that the defined patterns can indeed be considered as trigger factors.
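A minimal sketch of the latent-feature interpretation idea, assuming a non-negative factorization of a toy rating matrix and matching each latent feature to the most similar real user; the thesis's contrast-analysis algorithm is not shown:

```python
import numpy as np
from sklearn.decomposition import NMF

# Toy user-by-item rating matrix (4 users, 4 items).
R = np.array([[5, 4, 0, 1], [4, 5, 1, 0], [0, 1, 5, 4], [1, 0, 4, 5]], float)
model = NMF(n_components=2, init="nndsvd", random_state=0)
U = model.fit_transform(R)   # user-by-feature loadings
V = model.components_        # feature-by-item profiles

for k in range(V.shape[0]):
    # Cosine similarity between the feature's item profile and each user's ratings.
    sims = (R @ V[k]) / (np.linalg.norm(R, axis=1) * np.linalg.norm(V[k]) + 1e-12)
    print(f"latent feature {k} resembles user {sims.argmax()}")
```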
Стилі APA, Harvard, Vancouver, ISO та ін.
44

Valentin, Jérémie. "Usages géographiques du cyberespace : nouvelle appropriation de l'espace et l'essor d'une "néogéographie"." Thesis, Montpellier 3, 2010. http://www.theses.fr/2010MON30049.

Повний текст джерела
Анотація:
Cette recherche propose d’analyser les impacts et les enjeux géographiques d’un cyberespace omniprésent. Sous l’impulsion du web 2.0 et celle des globes virtuels (Google Earth, Virtual Earth, World Wind), la production et la diffusion du savoir géographique subissent d’amples transformations. Les espaces virtuels et autres services de géolocalisation (LBS) remplacent peu à peu la carte papier et le guide touristique. Ces usages participent à l’émergence d’un espace complexe où viennent se mêler des usages dans l’espace réel et des usages dans l’espace virtuel. Parallèlement, une production d’intérêt géographique en résulte, hors des milieux qui, jusqu’à ces dernières années, en étaient les initiateurs et les utilisateurs obligés : universités, organismes de recherche, géographes professionnels, Etats, ONG, militaires … Cette thèse éclairera donc le lecteur sur la réalité géographique des (nouveaux) usages du cyberespace, qu’ils soient liés à la production « amateur » de contenus géographiques (néogéographie) ou à la consommation « augmentée » de l’espace géographique
This research proposes to analyze the geographical impacts and challenges of an omnipresent cyberspace. Spurred on by Web 2.0 and by virtual globes (Google Earth, Virtual Earth, World Wind), the production and diffusion of geographical knowledge are undergoing far-reaching transformations. Virtual spaces and other location-based services (LBS) are gradually replacing the paper map and the tourist guide. These uses contribute to the emergence of a complex space where uses in real space and uses in virtual space mingle. Meanwhile, content of geographical interest is now produced outside the circles which, until recently, were its initiators and traditional users: universities, research organizations, professional geographers, states, NGOs, the military... This thesis will enlighten the reader on the geographical reality of the (new) uses of cyberspace, whether related to "amateur" production of geographical content (neogeography) or to "augmented" consumption of geographical space.
Стилі APA, Harvard, Vancouver, ISO та ін.
45

Dash, A., and L. R. George. "Web Usage Mining: An Implementation." Thesis, 2010. http://ethesis.nitrkl.ac.in/1689/1/Thesis.pdf.

Повний текст джерела
Анотація:
Web usage mining is the area of data mining which deals with the discovery and analysis of usage patterns from Web data, specifically web logs, in order to improve web-based applications. Web usage mining consists of three phases: preprocessing, pattern discovery, and pattern analysis. After the completion of these three phases the user can find the required usage patterns and use this information for specific needs. In this project, the DSpace log files have been preprocessed to convert the data stored in them into a structured format. Thereafter, the general procedures for bot removal and session identification from a web log file have been written down in algorithmic form, with certain modifications pertaining to the DSpace log files. Furthermore, an analysis of these log files using a subjective interpretation of the recently proposed algorithm EIN-WUM has also been conducted. This algorithm is based on the artificial immune system model and uses this model to learn and extract information present in the web data, i.e. server logs. The algorithm has been duly modified according to the DSpace@NITR website structure.
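A minimal sketch of the timeout-based session identification heuristic commonly used in this preprocessing phase; the 30-minute gap and the field layout are assumptions, not DSpace specifics:

```python
from collections import defaultdict
from datetime import datetime, timedelta

def sessionize(requests, gap=timedelta(minutes=30)):
    # requests: iterable of (ip, timestamp, url), sorted by timestamp.
    # A gap longer than `gap` between requests from the same IP starts
    # a new session.
    sessions, last_seen = defaultdict(list), {}
    for ip, ts, url in requests:
        key = (ip, 0)
        if ip in last_seen:
            prev_ts, sid = last_seen[ip]
            key = (ip, sid + 1) if ts - prev_ts > gap else (ip, sid)
        last_seen[ip] = (ts, key[1])
        sessions[key].append(url)
    return sessions

reqs = [("1.2.3.4", datetime(2024, 1, 1, 10, 0), "/home"),
        ("1.2.3.4", datetime(2024, 1, 1, 10, 50), "/docs")]
print(sessionize(reqs))   # two sessions: the 50-minute gap exceeds 30 minutes
```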
Стилі APA, Harvard, Vancouver, ISO та ін.
46

Bhalla, Karan, and Deepak Prasad. "Data preperation and pattern discovery for web usage mining." Thesis, 2007. http://ethesis.nitrkl.ac.in/4208/1/DATA_PREPERATION_AND_PATTERN.pdf.

Повний текст джерела
Анотація:
The World Wide Web (WWW) continues to grow at an astounding rate in both the sheer volume of traffic and the size and complexity of Web sites. The complexity of tasks such as Web site design, Web server design, and simply navigating through a Web site has increased along with this growth. An important input to these design tasks is the analysis of how a Web site is being used. Usage analysis includes straightforward statistics, such as page access frequency, as well as more sophisticated forms of analysis, such as finding the common traversal paths through a Web site. Web usage mining is the application of data mining techniques to usage logs of large Web data repositories in order to produce results that can be used in the design tasks mentioned above. However, these server logs cannot be used directly for pattern discovery and analysis purposes. Several preprocessing tasks must be performed before applying data mining algorithms to the data collected from server logs. The objective of this paper is to discuss several data preparation techniques used to identify unique users and user sessions, and new heuristics to identify user sessions have been proposed. The data mining algorithms that can be applied to this processed data to discover patterns and rules are also discussed. On the basis of an implementation of these algorithms, a comparative analysis among some of them is drawn on a two-dimensional graph.
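A minimal sketch of one common user identification heuristic discussed in such data preparation, grouping requests by the (IP address, user agent) pair when no login or cookie information is available:

```python
def unique_users(log_entries):
    # log_entries: iterable of dicts with "ip" and "agent" keys.
    # Two requests with the same IP but different user agents are treated
    # as different users (e.g. a browser and a script behind one proxy).
    users = {}
    for entry in log_entries:
        key = (entry["ip"], entry["agent"])
        users.setdefault(key, len(users))   # assign incremental user ids
        entry["user_id"] = users[key]
    return users

logs = [{"ip": "1.2.3.4", "agent": "Firefox"}, {"ip": "1.2.3.4", "agent": "curl"}]
print(unique_users(logs))   # two distinct users behind one IP
```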
Стилі APA, Harvard, Vancouver, ISO та ін.
47

GOEL, VIVEK. "EFFICIENT ALGORITHM FOR FREQUENT PATTERN MINING AND IT’S APPLICATION IN PREDICTING PATTERN IN WEB USAGE DATA." Thesis, 2012. http://dspace.dtu.ac.in:8080/jspui/handle/repository/13930.

Повний текст джерела
Анотація:
M.TECH
Frequent pattern mining, the task of finding sets of items that frequently occur together in a dataset, has been at the core of the field of data mining for many years. With the tremendous growth of data, users expect more relevant and sophisticated information, which may be lying hidden in the data. Data mining is often described as a discipline to find hidden information in a database. It involves different techniques and algorithms to discover useful knowledge lying hidden in the data. In this thesis, we propose an efficient algorithm for finding frequent patterns which is an extension of the IP tree algorithm, and we demonstrate its effectiveness over various previous algorithms such as Apriori, FP-Growth, CATS tree and CanTree. Apriori is the first popular algorithm for frequent patterns, but it makes multiple database scans, which makes it inefficient for large databases. To overcome the drawbacks of the Apriori algorithm, prefix-tree-based algorithms have become popular. However, most prefix-tree-based algorithms still suffer from either longer execution times or higher memory consumption. For example, the FP-Growth algorithm still requires two database scans, and the CanTree takes a large amount of memory. Our proposed algorithm constructs an FP-tree-like compact tree structure for only the frequent items in the database with a single database scan. It first stores transactions in a lexicographically ordered tree, then restructures the tree by sorting the frequent items in frequency-descending order and prunes the infrequent items from each path. We evaluate the performance of the algorithm using both synthetic and real datasets, and the results show that the proposed algorithm is much more time-efficient and takes less memory than the previous algorithms.
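A minimal sketch of the two core ingredients described above, namely frequency-descending ordering and insertion into a compact prefix tree with infrequent items pruned. For brevity it counts supports in memory first, whereas the thesis builds a lexicographic tree during the single database scan and restructures it afterwards:

```python
from collections import Counter

class Node(dict):                 # maps child item -> Node; count per node
    def __init__(self):
        super().__init__()
        self.count = 0

def build_tree(transactions, min_support):
    # One pass to count item supports, then insert each transaction with
    # its items sorted most-frequent first and infrequent items dropped.
    freq = Counter(i for t in transactions for i in set(t))
    keep = {i for i, c in freq.items() if c >= min_support}
    root = Node()
    for t in transactions:
        items = sorted((i for i in set(t) if i in keep),
                       key=lambda i: (-freq[i], i))
        node = root
        for i in items:
            node = node.setdefault(i, Node())
            node.count += 1
    return root, freq

root, freq = build_tree([["a", "b", "c"], ["a", "b"], ["b", "c"]], min_support=2)
print(freq)   # b:3, a:2, c:2 -- shared prefixes keep the tree compact
```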
Стилі APA, Harvard, Vancouver, ISO та ін.
48

"Improving opinion mining with feature-opinion association and human computation." 2009. http://library.cuhk.edu.hk/record=b5894009.

Повний текст джерела
Анотація:
Chan, Kam Tong.
Thesis (M.Phil.)--Chinese University of Hong Kong, 2009.
Includes bibliographical references (leaves [101]-113).
Abstracts in English and Chinese.
Contents: Abstract; Acknowledgement; Chapter 1 Introduction (Major Topic: Opinion Mining, Human Computation; Major Work and Contributions; Thesis Outline); Chapter 2 Literature Review (Opinion Mining: Feature Extraction, Sentiment Analysis; Social Computing: Social Bookmarking, Social Games); Chapter 3 Feature-Opinion Association for Sentiment Analysis (Motivation; Problem Definition; Closer Look at the Problem; Proposed Approach: Nearest Opinion Word (DIST), Co-Occurrence Frequency (COF), Co-Occurrence Ratio (COR), Likelihood-Ratio Test (LHR), Combined Method, Feature-Opinion Association Algorithm, Sentiment Lexicon Expansion; Evaluation: Corpus Data Set, Test Data Set, Feature-Opinion Association Accuracy; Summary); Chapter 4 Social Game for Opinion Mining (Motivation; Social Game Model: Definitions, Social Game Problem, Social Game Flow, Answer Extraction Procedure; Social Game Properties: Type of Information, Game Structure, Verification Method, Game Mechanism, Player Requirement; Design Guideline; Opinion Mining Game Design: OpinionMatch, FeatureGuess; Summary); Chapter 5 Tag Sentiment Analysis for Social Bookmark Recommendation System (Motivation; Problem Statement: Social Bookmarking Model, Social Bookmark Recommendation (SBR) Problem; Proposed Approach: Social Bookmark Recommendation Framework, Subjective Tag Detection (STD), Similarity Matrices, User-Website Matrix, User-Tag Matrix, Website-Tag Matrix; Pearson Correlation Coefficient; Social Network-based User Similarity; User-oriented Website Ranking; Evaluation: Bookmark Data, Social Network, Subjective Tag List, Subjective Tag Detection, Bookmark Recommendation Quality, System Evaluation; Summary); Chapter 6 Conclusion and Future Work; Appendix A List of Symbols and Notations; Appendix B List of Publications; Bibliography.
Стилі APA, Harvard, Vancouver, ISO та ін.
49

Aguiar, Daniel José Gomes. "IP network usage accounting: parte I." Master's thesis, 2015. http://hdl.handle.net/10400.13/1499.

Повний текст джерела
Анотація:
An Internet Service Provider (ISP) manages networks of thousands of customers, where bandwidth must be strictly controlled and congestion avoided. For that, ISPs need systems capable of analysing traffic data and assigning an appropriate Quality of Service (QoS) based on it. NOS Madeira is the leading ISP on Madeira Island, with thousands of customers throughout the region. The company's existing bandwidth control system was obsolete, which led to the need to build a new one. This new system, called IP Network Usage Accounting, consists of three subsystems: the IP Mapping System, the Accounting System and the Policy Server System. This report describes the design, implementation and testing of the first subsystem, the IP Mapping System. The IP Mapping System collects the traffic data generated by NOS Madeira's customers and provides it to the second subsystem (the Accounting System). This, in turn, analyses the data and sends the results to the third subsystem (the Policy Server System), which applies the corresponding QoS to each client's IP.
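To make the three-subsystem pipeline described in the abstract concrete, here is a minimal sketch in Python, assuming hypothetical flow records of the form (client IP, bytes); the tier thresholds and the name assign_qos_tier are illustrative, not taken from the thesis.

    # Minimal sketch of the three-stage pipeline described above.
    # The flow records and QoS thresholds are illustrative assumptions.
    from collections import defaultdict

    # 1. IP Mapping System: collect per-client traffic samples (ip, bytes).
    flow_records = [
        ("10.0.0.1", 1_200_000),
        ("10.0.0.2", 450_000),
        ("10.0.0.1", 800_000),
    ]

    # 2. Accounting System: aggregate usage per client IP.
    usage = defaultdict(int)
    for ip, nbytes in flow_records:
        usage[ip] += nbytes

    # 3. Policy Server System: map aggregate usage to a QoS tier.
    def assign_qos_tier(total_bytes: int) -> str:
        """Hypothetical thresholds; the real policy is operator-defined."""
        if total_bytes > 1_500_000:
            return "throttled"
        return "normal"

    for ip, total in usage.items():
        print(ip, total, assign_qos_tier(total))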
50

Oosthuizen, Ockmer Louren. "A multi-agent collaborative personalized web mining system model." Thesis, 2008. http://hdl.handle.net/10210/508.

Full text of the source
Abstract:
The Internet and World Wide Web (WWW) have grown exponentially in recent years, both in size and in the volume of information available on them. To deal effectively with the huge amount of information on the web, so-called web search engines have been developed for the task of retrieving useful and relevant information for their users. Unfortunately, these search engines have not kept pace with the web's rapid growth and commercialization. The main goal of this dissertation is the development of a model for a collaborative personalized meta-search agent (COPEMSA) system for the WWW. This model enables the personalization of web search for users. Furthermore, it aims to leverage current search engines on the web and to enable collaboration between users of the search system so that they can share useful resources. The model also employs multiple intelligent agents and web content mining techniques, which allows it to autonomously retrieve useful information for its user(s) and present this information effectively. To achieve the above, the COPEMSA model consists of five core components: a user agent, a query agent, a community agent, a content mining agent and a directed web spider. The user agent learns about the user in order to introduce personal preferences into user queries. The query agent is a scaled-down meta-search engine that submits the personalized queries it receives from the user agent to multiple search services on the WWW. The community agent enables the search system to communicate with, and leverage, the search experiences of a community of searchers. The content mining agent is responsible for analysing the results retrieved from the WWW and presenting them to the system user. Finally, a directed web spider is used by the content mining agent to retrieve from the WWW the actual web pages it analyses. The dissertation also presents an additional model to deal with a specific problem that all web spidering software must handle, namely content and link encapsulation.
Prof. E.M. Ehlers
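The abstract above describes a pipeline of cooperating agents. The following toy sketch shows one way such a flow could be wired together; the class names, the preference-term query biasing and the naive snippet scoring are assumptions for illustration, not the dissertation's actual design.

    # Illustrative sketch of a personalized meta-search agent flow:
    # user agent -> query agent -> content mining agent.
    class UserAgent:
        def __init__(self, preferences):
            self.preferences = preferences  # learned user interests

        def personalise(self, query: str) -> str:
            # Bias the query with stored preference terms.
            return query + " " + " ".join(self.preferences)

    class QueryAgent:
        def __init__(self, engines):
            self.engines = engines  # callables standing in for search services

        def search(self, query: str):
            # Fan the personalised query out to every engine, merge results.
            results = []
            for engine in self.engines:
                results.extend(engine(query))
            return results

    class ContentMiningAgent:
        def rank(self, results, preferences):
            # Naive content scoring: count preference terms in each snippet.
            def score(text):
                return sum(term in text.lower() for term in preferences)
            return sorted(results, key=score, reverse=True)

    # Toy "search engines" returning canned snippets (hypothetical data).
    engine_a = lambda q: ["agent-based web mining survey", "football scores"]
    engine_b = lambda q: ["personalised meta-search agents", "weather report"]

    user = UserAgent(preferences=["agent", "mining"])
    query = user.personalise("web search")
    hits = QueryAgent([engine_a, engine_b]).search(query)
    print(ContentMiningAgent().rank(hits, user.preferences))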