Dissertations on the topic "Mining Frequent Patterns"

To see other types of publications on this topic, follow the link: Mining Frequent Patterns.

Consult the top 50 dissertations for your research on the topic "Mining Frequent Patterns".

Next to every source in the list of references there is an "Add to bibliography" button. Press it, and we will automatically compose the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the scholarly publication as a .pdf file and read its abstract online, if these are available in the record's metadata.

Browse dissertations from a wide variety of disciplines and compile your bibliography correctly.

1

Soztutar, Enis. "Mining Frequent Semantic Event Patterns." Master's thesis, METU, 2009. http://etd.lib.metu.edu.tr/upload/12611007/index.pdf.

Abstract:
Especially with the wide use of dynamic page generation and richer user interaction on the Web, traditional web usage mining methods, which are based on the pageview concept, are of limited usability. To overcome the difficulty of capturing usage behaviour, we define the concept of semantic events. Conceptually, events are higher-level actions of a user in a web site that are technically independent of pageviews. Events are modelled as objects in the domain of the web site, with associated properties. A sample event from a video web site is the 'play video' event, with properties such as 'video', 'length of video', 'name of video', etc. When the event objects belong to the domain model of the web site's ontology, they are referred to as semantic events. In this work, we propose a new algorithm and an associated framework for mining patterns of semantic events from usage logs. We present a method for tracking and logging domain-level events of a web site, adding semantic information to events, an ordering of events with respect to their genericity, and an algorithm for computing sequences of frequent events.
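To make the kind of mining described above concrete, here is a minimal Python sketch that counts frequent fixed-length sequences of event types across user sessions. The event representation, session format and thresholds are assumptions for illustration only, not the framework proposed in the thesis.

```python
from collections import Counter

def frequent_event_sequences(sessions, length=2, min_support=2):
    """Count contiguous sequences of event types across sessions and keep
    those occurring in at least `min_support` sessions. A toy stand-in for
    mining frequent semantic event sequences from usage logs."""
    seq_counts = Counter()
    for session in sessions:
        types = [event["type"] for event in session]
        seen = set()
        for i in range(len(types) - length + 1):
            seen.add(tuple(types[i:i + length]))
        seq_counts.update(seen)          # count each sequence once per session
    return {seq: c for seq, c in seq_counts.items() if c >= min_support}

# Each event is an object with a type and domain-level properties.
sessions = [
    [{"type": "search", "query": "cats"}, {"type": "play_video", "video": "v1"}],
    [{"type": "search", "query": "dogs"}, {"type": "play_video", "video": "v7"},
     {"type": "rate_video", "stars": 5}],
    [{"type": "browse"}, {"type": "search"}, {"type": "play_video", "video": "v2"}],
]
print(frequent_event_sequences(sessions))   # ('search', 'play_video') occurs in all 3 sessions
```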
2

Jin, Ruoming. "New techniques for efficiently discovering frequent patterns." Connect to resource, 2005. http://rave.ohiolink.edu/etdc/view?acc%5Fnum=osu1121795612.

Abstract:
Thesis (Ph. D.)--Ohio State University, 2005.
Title from first page of PDF file. Document formatted into pages; contains xvii, 170 p.; also includes graphics. Includes bibliographical references (p. 160-170). Available online via OhioLINK's ETD Center
3

Zhang, Qi. "The Application of Sequential Pattern Mining in Healthcare Workflow System and an Improved Mining Algorithm Based on Pattern-Growth Approach." University of Cincinnati / OhioLINK, 2013. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1378113261.

4

Bifet, Albert. "Adaptive Learning and Mining for Data Streams and Frequent Patterns." Doctoral thesis, Universitat Politècnica de Catalunya, 2009. http://hdl.handle.net/10803/22738.

Abstract:
This thesis is devoted to the design of data mining algorithms for evolving data streams and for the extraction of closed frequent trees. First, we deal with each of these tasks separately, and then we deal with them together, developing classification methods for data streams containing items that are trees. In the data stream model, data arrive at high speed, and the algorithms that must process them have very strict constraints of space and time. In the first part of this thesis we propose and illustrate a framework for developing algorithms that can adaptively learn from data streams that change over time. Our methods are based on using change detectors and estimator modules at the right places. We propose an adaptive sliding window algorithm, ADWIN, for detecting change and keeping updated statistics from a data stream, and use it as a black box in place of counters or accumulators in algorithms initially not designed for drifting data. Since ADWIN has rigorous performance guarantees, this opens the possibility of extending such guarantees to learning and mining algorithms. We test our methodology with several learning methods such as Naïve Bayes, clustering, decision trees and ensemble methods. We build an experimental framework for data stream mining with concept drift, based on the MOA framework and similar to WEKA, so that it will be easy for researchers to run experimental data stream benchmarks. Trees are connected acyclic graphs and they are studied as link-based structures in many cases. In the second part of this thesis, we describe a rather formal study of trees from the point of view of closure-based mining. Moreover, we present efficient algorithms for subtree testing and for mining ordered and unordered frequent closed trees. We include an analysis of the extraction of association rules of full confidence out of the closed sets of trees, where we have found an interesting phenomenon: rules whose propositional counterpart is nontrivial are, however, always implicitly true in trees due to the peculiar combinatorics of the structures. Finally, using these results on evolving data stream mining and closed frequent tree mining, we present high-performance algorithms for mining closed unlabeled rooted trees adaptively from data streams that change over time. We introduce a general methodology to identify closed patterns in a data stream, using Galois Lattice Theory. Using this methodology, we then develop an incremental algorithm, a sliding-window based one, and finally one that mines closed trees adaptively from data streams. We use these methods to develop classification methods for tree data streams.
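To make the adaptive-window idea concrete, here is a minimal Python sketch of an ADWIN-style change detector. The Hoeffding-style cut test and the uncompressed window are simplifications chosen for illustration; the real ADWIN stores the window in exponential buckets and carries formal guarantees, so this is not the thesis's exact algorithm.

```python
import math
from collections import deque

class SimpleAdwin:
    """Toy ADWIN-style change detector: a sliding window over a stream of
    numbers that shrinks itself whenever its two halves look statistically
    different."""

    def __init__(self, delta=0.01):
        self.delta = delta            # confidence parameter of the cut test
        self.window = deque()

    def update(self, value):
        """Add one value; return True if older data was dropped (change)."""
        self.window.append(value)
        changed = False
        while self._has_cut():
            self.window.popleft()     # shed the oldest value and re-test
            changed = True
        return changed

    def _has_cut(self):
        n = len(self.window)
        if n < 2:
            return False
        total = sum(self.window)
        left_sum = 0.0
        for left_n, v in enumerate(self.window, start=1):
            left_sum += v
            right_n = n - left_n
            if right_n == 0:
                break
            mean_l = left_sum / left_n
            mean_r = (total - left_sum) / right_n
            m = 1.0 / (1.0 / left_n + 1.0 / right_n)           # harmonic mean of sizes
            eps = math.sqrt(math.log(4.0 * n / self.delta) / (2.0 * m))
            if abs(mean_l - mean_r) > eps:                     # Hoeffding-style bound
                return True
        return False

# After the mean shift at t=100 the detector repeatedly trims pre-change values,
# so the window ends up holding mostly post-change data.
detector = SimpleAdwin()
for t in range(200):
    detector.update(0.2 if t < 100 else 0.8)
print(len(detector.window))
```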
5

Bifet, Figuerol Albert Carles. "Adaptive Learning and Mining for Data Streams and Frequent Patterns." Doctoral thesis, Universitat Politècnica de Catalunya, 2009. http://hdl.handle.net/10803/22738.

6

Seyfi, Majid. "Mining discriminative itemsets in data streams using different window models." Thesis, Queensland University of Technology, 2018. https://eprints.qut.edu.au/120850/1/Majid_Seyfi_Thesis.pdf.

Abstract:
Big data availability in areas such as social networks, online marketing systems and stock markets is a good source for knowledge discovery. This thesis studies how discriminative itemsets can be discovered in data streams made of transactions drawn from user profiles. Discriminative itemsets are frequent in one data stream with much higher frequencies than the same itemsets in the other data streams of the application domain. This research uses heuristics to manage large and complex datasets by decreasing the number of candidate patterns, giving researchers a better understanding of pattern mining in multiple data streams.
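The brute-force Python sketch below illustrates what a discriminative itemset is, assuming just two static batches of transactions and a support-ratio criterion; the thesis's contribution is the heuristics and window models that make this feasible over continuous streams, which this sketch deliberately ignores.

```python
from itertools import combinations
from collections import Counter

def itemset_supports(transactions, max_len=2):
    """Relative support of all itemsets up to max_len items (brute force)."""
    counts = Counter()
    for t in transactions:
        items = sorted(set(t))
        for k in range(1, max_len + 1):
            counts.update(combinations(items, k))
    n = len(transactions)
    return {iset: c / n for iset, c in counts.items()}

def discriminative_itemsets(target, others, min_sup=0.2, min_ratio=3.0):
    """Itemsets frequent in `target` whose support is at least `min_ratio`
    times their support in the `others` stream."""
    sup_t = itemset_supports(target)
    sup_o = itemset_supports(others)
    result = {}
    for iset, s in sup_t.items():
        if s < min_sup:
            continue
        s_other = sup_o.get(iset, 0.0)
        ratio = s / s_other if s_other > 0 else float("inf")
        if ratio >= min_ratio:
            result[iset] = (s, s_other, ratio)
    return result

# Example: {'a', 'b'} appears often in stream1 but rarely in stream2.
stream1 = [["a", "b", "c"], ["a", "b"], ["a", "b", "d"], ["c", "d"]]
stream2 = [["c", "d"], ["b", "c"], ["d"], ["a", "c"]]
print(discriminative_itemsets(stream1, stream2))
```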
7

El-Sayed, Maged F. "An efficient and incremental system to mine contiguous frequent sequences." Link to electronic thesis, 2004. http://www.wpi.edu/Pubs/ETD/Available/etd-0130104-115506.

8

Meng, Jinghan. "Flexible and Feasible Support Measures for Mining Frequent Patterns in Large Labeled Graphs." Scholar Commons, 2017. http://scholarcommons.usf.edu/etd/6900.

Abstract:
In recent years, the popularity of graph databases has grown rapidly. This paper focuses on the single graph as an effective model to represent information and on its related graph mining techniques. In frequent pattern mining in a single-graph setting, there are two main problems: the support measure and the search scheme. In this paper, we propose a novel framework for constructing support measures that brings together existing minimum-image-based and overlap-graph-based support measures. Our framework is built on the concept of occurrence/instance hypergraphs. Based on that, we present two new support measures, the minimum instance (MI) measure and the minimum vertex cover (MVC) measure, that combine the advantages of existing measures. In particular, we show that the existing minimum-image-based support measure is an upper bound of the MI measure, which is also linear-time computable and results in counts that are close to the number of instances of a pattern. Although the MVC measure is NP-hard, it can be approximated to within a constant factor in polynomial time. We also provide polynomial-time relaxations for both measures and bounding theorems for all presented support measures in the hypergraph setting. We further show that the hypergraph-based framework can unify all support measures studied in this paper. This framework is also flexible in that more variants of support measures can be defined and profiled in it.
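For readers unfamiliar with support in the single-graph setting, the sketch below computes the classical minimum-image-based (MNI) support from a precomputed list of pattern embeddings. It shows the baseline measure the abstract compares against, not the MI or MVC measures proposed in the thesis, and the embedding format is an assumption.

```python
def minimum_image_support(embeddings):
    """Minimum-image-based (MNI) support of a pattern in a single graph.

    `embeddings` is a list of dicts mapping each pattern vertex to the graph
    vertex it is matched to in one occurrence of the pattern. The support is
    the smallest number of distinct graph vertices that any single pattern
    vertex is mapped to; this measure is anti-monotone, which is what makes
    it usable for pruning in single-graph mining."""
    if not embeddings:
        return 0
    images = {}
    for emb in embeddings:
        for pattern_v, graph_v in emb.items():
            images.setdefault(pattern_v, set()).add(graph_v)
    return min(len(s) for s in images.values())

# Three occurrences of an edge pattern (u, v) found in some graph:
occurrences = [{"u": 1, "v": 2}, {"u": 1, "v": 3}, {"u": 4, "v": 2}]
print(minimum_image_support(occurrences))   # u -> {1, 4}, v -> {2, 3}  =>  2
```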
9

Kilic, Sefa. "Clustering Frequent Navigation Patterns From Website Logs Using Ontology And Temporal Information." Master's thesis, METU, 2012. http://etd.lib.metu.edu.tr/upload/12613979/index.pdf.

Abstract:
Given a set of web pages labeled with ontological items, the level of similarity between two web pages is measured using the similarity between the ontological items the pages are labeled with. Using this similarity measure between two pages, the degree of similarity between two sequences of web page visits can be calculated as well. Using clustering algorithms, similar frequent sequences are grouped and representative sequences are selected from these groups. A new sequence is compared with all clusters and assigned to the most similar one. Representatives of the most similar cluster can be used in several real-world cases: for predicting and prefetching the next page the user will visit, for helping the user navigate the website, or for improving the structure of the website for easier navigation. In this study, the effect of the time spent on each web page during the session is also analyzed.
10

Tatavarty, Giridhar. "Finding Temporal Association Rules between Frequent Patterns in Multivariate Time Series." University of Cincinnati / OhioLINK, 2006. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1141325950.

11

Chino, Daniel Yoshinobu Takada. "Mineração de padrões frequentes em séries temporais para apoio à tomada de decisão em agrometereologia." Universidade de São Paulo, 2014. http://www.teses.usp.br/teses/disponiveis/55/55134/tde-04062014-142915/.

Abstract:
Dealing with large volumes of complex data is a challenging task that has motivated many researchers around the world. Time series are a type of complex data growing in importance due to the increasing use of sensors for surveillance and monitoring. Thus, mining information from large volumes of time series to support decision making is a valuable activity nowadays. This Master's dissertation goes in this direction, as it proposes new algorithms and methods to mine and index time series. The TrieMotif, a new algorithm to mine frequent patterns (motifs) from time series employing a trie structure that allows clever comparison between subsequences, and the Telesto index structure, based on suffix trees, are presented and discussed in the context of agrometeorological and climatological data; they are the two main contributions of this work. The dissertation shows that the proposed algorithms are scalable, being suitable for big data, and that when compared to the competitors they always presented the best results.
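As a rough illustration of why a trie helps when counting recurring subsequences, the sketch below discretizes a time series, inserts fixed-length windows into a prefix tree, and reports the frequent ones as motif candidates. The equal-width discretization and the parameters are assumptions; this is not the TrieMotif algorithm itself, which also verifies candidates against the raw series.

```python
def discretize(series, n_bins=4):
    """Map each value to a symbol by equal-width binning (a stand-in for
    the discretization step a motif-mining pipeline would use)."""
    lo, hi = min(series), max(series)
    width = (hi - lo) / n_bins or 1.0
    return [min(int((v - lo) / width), n_bins - 1) for v in series]

def frequent_subsequences(series, length=3, min_count=2, n_bins=4):
    """Count discretized subsequences with a prefix tree (trie), so that
    subsequences sharing a prefix share counting work, and return the
    frequent ones as candidate motifs."""
    symbols = discretize(series, n_bins)
    root = {}
    for start in range(len(symbols) - length + 1):
        node = root
        for sym in symbols[start:start + length]:
            node = node.setdefault(sym, {})
        node["count"] = node.get("count", 0) + 1

    motifs = []
    def walk(node, path):
        for key, child in node.items():
            if key == "count":
                if child >= min_count:
                    motifs.append((tuple(path), child))
            else:
                walk(child, path + [key])
    walk(root, [])
    return motifs

series = [1.0, 1.1, 3.9, 4.0, 1.0, 1.2, 4.1, 3.8, 1.1, 1.0, 4.0, 4.2]
print(frequent_subsequences(series, length=3, min_count=2))
```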
12

Legler, Thomas. "Datenzentrierte Bestimmung von Assoziationsregeln in parallelen Datenbankarchitekturen." Doctoral thesis, Saechsische Landesbibliothek- Staats- und Universitaetsbibliothek Dresden, 2009. http://nbn-resolving.de/urn:nbn:de:bsz:14-qucosa-23701.

Abstract:
The importance of data mining is widely acknowledged today. Mining for association rules and frequent patterns is a central activity in data mining. Three main strategies are available for such mining: APRIORI , FP-tree-based approaches like FP-GROWTH, and algorithms based on vertical data structures and depth-first mining strategies like ECLAT and CHARM. Unfortunately, most of these algorithms are only moderately suitable for many “real-world” scenarios because their usability and the special characteristics of the data are two aspects of practical association rule mining that require further work. All mining strategies for frequent patterns use a parameter called minimum support to define a minimum occurrence frequency for searched patterns. This parameter cuts down the number of patterns searched to improve the relevance of the results. In complex business scenarios, it can be difficult and expensive to define a suitable value for the minimum support because it depends strongly on the particular datasets. Users are often unable to set this parameter for unknown datasets, and unsuitable minimum-support values can extract millions of frequent patterns and generate enormous runtimes. For this reason, it is not feasible to permit ad-hoc data mining by unskilled users. Such users do not have the knowledge and time to define suitable parameters by trial-and-error procedures. Discussions with users of SAP software have revealed great interest in the results of association-rule mining techniques, but most of these users are unable or unwilling to set very technical parameters. Given such user constraints, several studies have addressed the problem of replacing the minimum-support parameter with more intuitive top-n strategies. We have developed an adaptive mining algorithm to give untrained SAP users a tool to analyze their data easily without the need for elaborate data preparation and parameter determination. Previously implemented approaches of distributed frequent-pattern mining were expensive and time-consuming tasks for specialists. In contrast, we propose a method to accelerate and simplify the mining process by using top-n strategies and relaxing some requirements on the results, such as completeness. Unlike such data approximation techniques as sampling, our algorithm always returns exact frequency counts. The only drawback is that the result set may fail to include some of the patterns up to a specific frequency threshold. Another aspect of real-world datasets is the fact that they are often partitioned for shared-nothing architectures, following business-specific parameters like location, fiscal year, or branch office. Users may also want to conduct mining operations spanning data from different partners, even if the local data from the respective partners cannot be integrated at a single location for data security reasons or due to their large volume. Almost every data mining solution is constrained by the need to hide complexity. As far as possible, the solution should offer a simple user interface that hides technical aspects like data distribution and data preparation. Given that BW Accelerator users have such simplicity and distribution requirements, we have developed an adaptive mining algorithm to give unskilled users a tool to analyze their data easily, without the need for complex data preparation or consolidation. 
For example, Business Intelligence scenarios often partition large data volumes by fiscal year to enable efficient optimizations for the data used in actual workloads. For most mining queries, more than one data partition is of interest, and therefore, distribution handling that leaves the data unaffected is necessary. The algorithms presented in this paper have been developed to work with data stored in SAP BW. A salient feature of SAP BW Accelerator is that it is implemented as a distributed landscape that sits on top of a large number of shared-nothing blade servers. Its main task is to execute OLAP queries that require fast aggregation of many millions of rows of data. Therefore, the distribution of data over the dedicated storage is optimized for such workloads. Data mining scenarios use the same data from storage, but reporting takes precedence over data mining, and hence, the data cannot be redistributed without massive costs. Distribution by special data semantics or user-defined selections can produce many partitions and very different partition sizes. The handling of such real-world distributions for frequent-pattern mining is an important task, but it conflicts with the requirement of balanced partition
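A tiny Python sketch of the top-n idea discussed above: instead of asking the user for a minimum-support threshold, return the n most frequent itemsets and let the count of the n-th one act as an implied threshold. This brute-force illustration on toy data is an assumption-laden stand-in, not the adaptive, distributed algorithm developed for SAP BW Accelerator.

```python
from collections import Counter
from itertools import combinations
import heapq

def top_n_itemsets(transactions, n=5, max_len=2):
    """Return the n most frequent itemsets (up to max_len items) together
    with the implied minimum support given by the n-th count."""
    counts = Counter()
    for t in transactions:
        items = sorted(set(t))
        for k in range(1, max_len + 1):
            counts.update(combinations(items, k))
    top = heapq.nlargest(n, counts.items(), key=lambda kv: kv[1])
    implied_minsup = top[-1][1] if top else 0   # acts like a derived threshold
    return top, implied_minsup

baskets = [["bread", "milk"], ["bread", "butter"], ["milk", "butter", "bread"],
           ["milk"], ["bread", "milk"]]
print(top_n_itemsets(baskets, n=3))
```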
13

Wang, Chao. "Exploiting non-redundant local patterns and probabilistic models for analyzing structured and semi-structured data." Columbus, Ohio : Ohio State University, 2008. http://rave.ohiolink.edu/etdc/view?acc%5Fnum=osu1199284713.

14

Singh, Shailendra. "Smart Meters Big Data : Behavioral Analytics via Incremental Data Mining and Visualization." Thesis, Université d'Ottawa / University of Ottawa, 2016. http://hdl.handle.net/10393/35244.

Abstract:
The big data framework applied to smart meters offers an exceptional platform for data-driven forecasting and decision making to achieve sustainable energy efficiency. Buying in consumer confidence by respecting occupants' energy consumption behavior and preferences, towards improved participation in various energy programs, is imperative but difficult to obtain. The key elements for understanding and predicting household energy consumption are the activities occupants perform, the appliances and the times at which they are used, and inter-appliance dependencies. This information can be extracted from the context-rich big data from smart meters, although this is challenging because: (1) it is not trivial to mine complex interdependencies between appliances from multiple concurrent data streams; (2) it is difficult to derive accurate relationships between interval-based events, where multiple appliance usages persist; (3) continuous generation of the energy consumption data can trigger changes in appliance associations with time and appliances. To overcome these challenges, we propose an unsupervised progressive incremental data mining technique using frequent pattern mining (appliance-appliance associations) and cluster analysis (appliance-time associations) coupled with a Bayesian network based prediction model. The proposed technique addresses the need to analyze temporal energy consumption patterns at the appliance level, which directly reflect consumers' behaviors and provide a basis for generalizing household energy models. Extensive experiments were performed on the model with real-world datasets and strong associations were discovered. The accuracy of the proposed model for predicting multiple appliance usage outperformed a support vector machine at every stage, attaining accuracies of 81.65%, 85.90% and 89.58% for 25%, 50% and 75% of the training dataset size, respectively. Moreover, accuracies of 81.89%, 75.88%, 79.23%, 74.74%, and 72.81% were obtained for short-term (hours) and long-term (day, week, month, and season) energy consumption forecasts, respectively.
15

Almuhisen, Feda. "Leveraging formal concept analysis and pattern mining for moving object trajectory analysis." Thesis, Aix-Marseille, 2018. http://www.theses.fr/2018AIXM0738/document.

Abstract:
This dissertation presents a trajectory analysis framework, which includes both a preprocessing phase and a trajectory mining process. Furthermore, the framework offers visual functions that reflect the evolution behavior of trajectory patterns. The originality of the mining process is to leverage frequent and emerging pattern mining and formal concept analysis for moving object trajectories. These methods detect and characterize pattern evolution behaviors bound to time in trajectory data. Three contributions are proposed: (1) a method for analyzing trajectories based on frequent formal concepts is used to detect different trajectory pattern evolutions over time. These behaviors are "latent", "emerging", "decreasing", "lost" and "jumping". They characterize the dynamics of mobility related to urban spaces and time. The detected behaviors are automatically visualized on generated maps with different spatio-temporal levels to refine the analysis of mobility in a given area of the city; (2) a second trajectory analysis framework, based on sequential concept lattice extraction, is also proposed to exploit the movement direction in the evolution detection process; and (3) a prediction method based on Markov chains is presented to predict the evolution behavior in a future period for a region. These three methods are evaluated on two real-world datasets. The experimental results obtained from these data show the relevance of the proposal and the utility of the generated maps.
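A hedged sketch of how such evolution labels could be assigned by comparing a pattern's support in two consecutive time periods. The threshold values and the exact mapping to the five behaviors are assumptions made here for illustration; in particular the "jumping" case, which involves spatial cells in the thesis, is omitted.

```python
def evolution_behavior(prev_support, curr_support, min_sup=0.1):
    """Label how a pattern evolves between two consecutive periods, loosely
    following the behavior names used in the abstract (assumed thresholds)."""
    before, now = prev_support >= min_sup, curr_support >= min_sup
    if not before and now:
        return "emerging"          # crossed the support threshold
    if before and not now:
        return "lost"              # dropped below the threshold
    if before and now:
        return "decreasing" if curr_support < prev_support else "latent"
    return "latent"                # below threshold in both periods

print(evolution_behavior(0.05, 0.25))   # -> 'emerging'
print(evolution_behavior(0.30, 0.12))   # -> 'decreasing'
```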
16

Lopez, Cueva Patricia. "Debugging Embedded Multimedia Application Execution Traces through Periodic Pattern Mining." Phd thesis, Université de Grenoble, 2013. http://tel.archives-ouvertes.fr/tel-01006213.

Abstract:
The design of embedded multimedia systems presents many challenges, such as the growing complexity of the software and of the underlying hardware, and time-to-market pressure. Optimizing the software debugging and validation process can significantly reduce development time. Among the debugging tools for embedded systems, a powerful and widely used one is execution trace analysis. However, the evolution of tracing techniques in embedded systems produces execution traces containing so much information that manual analysis becomes unmanageable. In this case, pattern mining techniques can help by finding interesting patterns in large amounts of information. Specifically, in this thesis we are interested in discovering periodic behaviors in multimedia applications. The contributions of this thesis therefore concern the analysis of execution traces of multimedia applications using frequent periodic pattern mining techniques. Regarding periodic pattern mining, we propose a definition of a periodic pattern adapted to the characteristics of parallel programming. We then propose a condensed representation of the set of frequent periodic patterns, called Core Periodic Concepts (CPC), adopting an approach based on triadic relations. In addition, we define connectivity properties between these patterns, which allows us to implement an efficient CPC mining algorithm, called PerMiner. To show the efficiency and scalability of PerMiner, we carry out a rigorous analysis showing that PerMiner is at least two orders of magnitude faster than the state of the art. Furthermore, we analyze the efficiency of PerMiner on an execution trace of a real multimedia application and present the speedup achieved by the parallel version of the algorithm. Regarding embedded systems, we propose a first step towards a methodology that explains how to use our approach for analyzing execution traces of multimedia applications. Before applying frequent pattern mining, execution traces must be preprocessed, and for this we propose several trace preprocessing techniques. In addition, for post-processing the periodic patterns, we propose two tools: a tool that finds pairs of competing patterns, and a CPC visualization tool called CPCViewer. Finally, we show that our approach can help in debugging multimedia applications through two case studies on execution traces of real multimedia applications.
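To give a feel for what periodic behavior in an execution trace looks like, here is a minimal Python sketch that inspects the gaps between occurrences of a single event type and reports the dominant period. The thesis's Core Periodic Concepts cover sets of events and parallel executions, which this deliberately ignores; the trace format is an assumption.

```python
from collections import Counter

def dominant_period(timestamps):
    """Look at the gaps between successive occurrences of one event in a
    trace and return the most common gap together with the fraction of
    gaps it explains (a crude indicator of periodicity)."""
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    if not gaps:
        return None, 0.0
    counts = Counter(gaps)
    period, hits = counts.most_common(1)[0]
    return period, hits / len(gaps)

# A frame-decoding event firing every 40 time units, with one hiccup.
trace = [0, 40, 80, 120, 161, 200, 240]
print(dominant_period(trace))   # -> (40, 0.666...)
```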
17

Almuhisen, Feda. "Leveraging formal concept analysis and pattern mining for moving object trajectory analysis." Electronic Thesis or Diss., Aix-Marseille, 2018. http://www.theses.fr/2018AIXM0738.

18

Giudice, Riccardo. "Analisi e applicazione dei processi di data mining al flusso informativo di sistemi real-time: Implementazione e analisi di un algoritmo autoadattivo per la ricerca di frequent patterns su macchine automatiche." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2015. http://amslaurea.unibo.it/9054/.

Abstract:
Analysis and application of data mining processes to the information flow of real-time systems. Implementation and analysis of a self-adaptive algorithm for finding frequent patterns on automatic machines.
19

Giavoli, Andrea. "Analisi e applicazione dei processi di data mining al flusso informativo di sistemi real-time: Adattamento di un algoritmo di apprendimento automatico per la caratterizzazione e la ricerca di frequent patterns su macchine automatiche." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2015. http://amslaurea.unibo.it/9055/.

Abstract:
The thesis work I carried out over the last six months was developed at the research laboratories of IMA S.p.A. IMA (Industria Macchine Automatiche) is an Italian company founded in 1961 in Bologna and today a world leader in the production of automatic machines for pharmaceutical packaging. It should be noted right away that in this application context the use of data mining algorithms is difficult because of the two environments involved. The first is that of automatic machines operating with real-time systems, which do not fully provide the resources such algorithms need. The second concerns pharmaceutical production, where a very restrictive international regulation imposes the tracking of all events that occur during packaging but does not allow these sensitive data to be exposed to the outside world. There is an immediate interest in using this information, which could reveal events attributable to a machine problem or to some kind of error, in order to improve the effectiveness and efficiency of IMA products. The greatest effort in devising an application strategy lay in understanding and interpreting the messages related to the software aspects. Since the data are voluminous and closed, and the machines have limited resources for properly applying data mining algorithms, I adopted different approaches in different application contexts: a system for automatic error identification, in order to reduce error-correction times, and a modification of an algorithm from the literature for characterizing the machine. The work is structured as follows. Chapter 1 describes the IMA Adapta automatic machine for which the various log files were provided; since it is the object of analysis for this work, the information flows it generates are also reported. Chapter 2 presents screenshots of the available data so that, through exploratory analysis, they can be interpreted and ideas and proposals applicable to machine learning algorithms known in the literature can be formulated. Chapter 3 (error identification) reports the application contexts I designed to implement an infrastructure that satisfies this requirement. Chapter 4 (machine characterization) defines the algorithm used, FP-Growth, and shows the modifications made so that it can be used inside automatic machines while respecting strict limits on CPU time, memory and I/O operations and, above all, the impossibility of having the entire dataset available, only portions of it. In addition, datasets are generated for testing the modified FP-Growth algorithm.
20

Shang, Xuequn. "SQL based frequent pattern mining." [S.l. : s.n.], 2005. http://deposit.ddb.de/cgi-bin/dokserv?idn=975449176.

21

Yun, Unil. "New approaches to weighted frequent pattern mining." Texas A&M University, 2005. http://hdl.handle.net/1969.1/5003.

Abstract:
Researchers have proposed frequent pattern mining algorithms that are more efficient than previous algorithms and generate fewer but more important patterns. Many techniques such as depth-first/breadth-first search, use of tree/other data structures, top-down/bottom-up traversal and vertical/horizontal formats for frequent pattern mining have been developed. Most frequent pattern mining algorithms use a support measure to prune the combinatorial search space. However, support-based pruning is not enough when taking into consideration the characteristics of real datasets. Additionally, after mining datasets to obtain the frequent patterns, there is no way to adjust the number of frequent patterns through user feedback, except for changing the minimum support. Alternative measures for mining frequent patterns have been suggested to address these issues. One of the main limitations of the traditional approach for mining frequent patterns is that all items are treated uniformly when, in reality, items have different importance. For this reason, weighted frequent pattern mining algorithms have been suggested that give different weights to items according to their significance. The main focus in weighted frequent pattern mining concerns satisfying the downward closure property. In this research, frequent pattern mining approaches with weight constraints are suggested. Our main approach is to push weight constraints into the pattern growth algorithm while maintaining the downward closure property. We develop WFIM (Weighted Frequent Itemset Mining with a weight range and a minimum weight), WLPMiner (Weighted frequent Pattern Mining with length decreasing constraints), WIP (Weighted Interesting Pattern mining with a strong weight and/or support affinity), WSpan (Weighted Sequential pattern mining with a weight range and a minimum weight) and WIS (Weighted Interesting Sequential pattern mining with a similar level of support and/or weight affinity). The extensive performance analysis shows that the suggested approaches are efficient and scalable in weighted frequent pattern mining.
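A brute-force Python sketch of the weighted-support idea: here the weighted support of an itemset is assumed to be its count times the average weight of its items, and an upper bound built from the maximum item weight serves as the anti-monotone filter that preserves downward closure. The concrete weight functions in WFIM and its successors differ, so treat this purely as an illustration of the pruning trick, not the thesis's algorithms.

```python
from itertools import combinations
from collections import Counter

def weighted_frequent_itemsets(transactions, weights, min_wsup, max_len=3):
    """Weighted frequent itemsets under an assumed weight scheme:
    weighted support = support count * average item weight."""
    max_w = max(weights.values())
    counts = Counter()
    for t in transactions:
        items = sorted(set(t))
        for k in range(1, max_len + 1):
            counts.update(combinations(items, k))

    result = {}
    for iset, cnt in counts.items():
        # count * max_w is an anti-monotone upper bound on weighted support;
        # a pattern-growth miner would use it to prune the search space,
        # here it just filters candidates.
        if cnt * max_w < min_wsup:
            continue
        avg_w = sum(weights[i] for i in iset) / len(iset)
        wsup = cnt * avg_w
        if wsup >= min_wsup:
            result[iset] = wsup
    return result

tx = [["a", "b"], ["a", "b", "c"], ["b", "c"], ["a", "c"]]
w = {"a": 0.9, "b": 0.4, "c": 0.6}
print(weighted_frequent_itemsets(tx, w, min_wsup=1.5))
```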
22

Liu, Guimei. "Supporting efficient and scalable frequent pattern mining /." View abstract or full-text, 2005. http://library.ust.hk/cgi/db/thesis.pl?COMP%202005%20LIUG.

23

Jiang, Fan. "Frequent pattern mining of uncertain data streams." Springer-Verlag, 2011. http://hdl.handle.net/1993/5233.

Abstract:
When dealing with uncertain data, users may not be certain about the presence of an item in the database. For example, due to inherent instrumental imprecision or errors, data collected by sensors are usually uncertain. In various real-life applications, uncertain databases are not necessarily static; new data may arrive continuously and at a rapid rate. These uncertain data can come in batches, which form a data stream. To discover useful knowledge in the form of frequent patterns from streams of uncertain data, algorithms have been developed that use the sliding window model for processing and mining data streams. However, for some applications, the landmark window model and the time-fading model are more appropriate. In this M.Sc. thesis, I propose tree-based algorithms that use the landmark window model or the time-fading model to mine frequent patterns from streams of uncertain data. Experimental results show the effectiveness of our algorithms.
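The sketch below illustrates the time-fading counting model for uncertain data: expected supports are computed from existential probabilities, and older batches are discounted by a decay factor. The decay value, the item-independence assumption and the brute-force enumeration are illustrative choices; the thesis's contribution is the tree-based algorithms that maintain such counts efficiently over a stream.

```python
from itertools import combinations
from collections import defaultdict

def decayed_expected_supports(batches, decay=0.8, max_len=2):
    """Time-fading expected supports over a stream of uncertain batches.

    Each batch is a list of transactions; each transaction maps an item to
    its existential probability. An itemset's expected support in one
    transaction is the product of its items' probabilities (independence
    assumed), and each older batch is discounted by `decay`."""
    supports = defaultdict(float)
    for age, batch in enumerate(reversed(batches)):   # age 0 = newest batch
        weight = decay ** age
        for transaction in batch:
            items = sorted(transaction)
            for k in range(1, max_len + 1):
                for iset in combinations(items, k):
                    prob = 1.0
                    for item in iset:
                        prob *= transaction[item]
                    supports[iset] += weight * prob
    return dict(supports)

old_batch = [{"a": 0.9, "b": 0.5}, {"a": 0.4}]
new_batch = [{"a": 0.8, "c": 0.7}, {"b": 0.6, "c": 0.9}]
print(decayed_expected_supports([old_batch, new_batch]))
```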
24

Vu, Lan. "High performance methods for frequent pattern mining." Thesis, University of Colorado at Denver, 2015. http://pqdtopen.proquest.com/#viewpdf?dispub=3667246.

Abstract:

The current Big Data era is generating a tremendous amount of data in most fields, such as business, social media, engineering, and medicine. The demand to process and handle the resulting "big data" has led to the need for fast data mining methods for developing powerful and versatile analysis tools that can turn data into useful knowledge. Frequent pattern mining (FPM) is an important task in data mining with numerous applications such as recommendation systems, consumer market analysis, web mining, network intrusion detection, etc. We develop efficient, high-performance FPM methods for large-scale databases on different computing platforms, including personal computers (PCs), multi-core multi-socket servers, clusters and graphics processing units (GPUs). At the core of our research is a novel self-adaptive approach that performs efficiently and fast on both sparse and dense databases, and outperforms its sequential counterparts. This approach applies multiple mining strategies and dynamically switches among them based on the data characteristics detected at runtime. The research results include two sequential FPM methods (FEM and DFEM) and three parallel ones (ShaFEM, SDFEM and CGMM). These methods are applicable to developing powerful and scalable mining tools for big data analysis. We have tested, analysed and demonstrated their efficacy on selected representative real databases publicly available at the Frequent Itemset Mining Implementations Repository.

25

Tseng, Fan-Chen (曾繁鎮). "Mining Frequent Patterns with the Frequent Pattern List." Thesis, 2002. http://ndltd.ncl.edu.tw/handle/45333441171833013337.

Abstract:
Doctoral dissertation
National Taiwan University
Graduate Institute of Computer Science and Information Engineering
ROC academic year 90
The mining of frequent patterns is an essential and time-consuming step in many data mining tasks. Therefore, algorithms for efficient mining of frequent patterns are in urgent demand. In its original definition, a frequent pattern is a set of items (called an itemset) whose occurrence (called support or frequency) in the database exceeds some user-defined threshold. However, mining the complete set of frequent itemsets (denoted as FIS) often results in a huge solution space, and the effectiveness of the association rules derived from them is decreased. To solve this problem, some researchers suggested mining only the complete set of frequent closed itemsets (denoted as FCIS), which is a small yet representative portion of FIS. Quite a few methods have been developed for mining FIS and FCIS, with strengths and weaknesses in various situations. Nevertheless, for databases containing many long patterns, it is still prohibitive to enumerate all the frequent closed itemsets. In these situations, one can only generate the complete set of maximal frequent itemsets (denoted as MFIS), which is a subset of FCIS. In this dissertation, we propose the Frequent Pattern List (FPL) as an efficient data structure for mining frequent patterns. We define the FPL and explore its properties. An algorithm for constructing the FPL is given. We then apply the FPL to the mining of frequent patterns. The correctness of our method is proven, and its performance is thoroughly evaluated. Besides, in real applications, transactional databases abound with duplicated transactions, i.e., data redundancies, that could be eliminated. In view of this, we refine the FPL into a more compact data structure, called the Transaction Pattern List (TPL), for eliminating data redundancies and thus improving both space and time efficiency in mining frequent patterns. For the mining of FCIS and MFIS, the search space can be pruned to accelerate the mining process. This, however, requires efficiently accessing relevant itemsets for superset checking. We therefore propose three-dimensional indexing (3D indexing) for indexing the solution space and for selecting the itemsets to be involved in the checking. Moreover, for application to large databases, we use the partition-based nature of the FPL and TPL to develop a partition-based approach for mining frequent patterns. Experimental results show that our methods outperform other previously proposed ones, and confirm the power of the TPL in eliminating data redundancies and the effectiveness of the partition-based approach in mining patterns in large databases.
26

Chiu, Chui-huang (邱垂煌). "Mining Discriminative Frequent Patterns." Thesis, 2010. http://ndltd.ncl.edu.tw/handle/92292055003596097985.

Abstract:
Master's thesis
Shih Hsin University
Graduate Institute of Information Management (including the in-service master's program)
ROC academic year 98
The rapid development of information technology enables companies to accumulate large amounts of data in a short time. Many enterprises nowadays pursue ways to utilize these data and convert them into information or knowledge. In retail sales, a huge amount of operational data is accumulated every day, so these sales records are good sources to analyse with data mining techniques to discover useful information. In this study, our goal is to mine discriminative patterns in transaction databases. A discriminative pattern is defined as a combination of product items such that people buying this pattern exhibit quite different buying behaviors from those who do not. In other words, discriminative patterns are important for identifying customers belonging to different segments, and store managers can use such information to configure their promotion activities. To mine discriminative patterns from transaction data, we propose the RDP (recommended discriminative pattern) method to recommend potential discriminative patterns. Several experiments based on data generated by the IBM synthetic data generator were conducted to test the proposed method. The experimental results show that the RDP method can recommend discriminative patterns both effectively and efficiently.
27

Chen, Yi-An (陳怡安). "Mining Frequent Trajectory Patterns." Thesis, 2011. http://ndltd.ncl.edu.tw/handle/29743026116836899524.

Abstract:
Doctoral dissertation
National Taiwan University
Graduate Institute of Information Management
ROC academic year 99
In this dissertation, we propose three algorithms, GBM, FTM and LTM, for mining trajectory patterns. GBM focuses on finding frequent trajectory patterns consisting of consecutively adjacent points, where the time spent between two consecutive points in a frequent trajectory pattern is represented by a timespan. FTM mines frequent flexible trajectory patterns, where the consecutive points in a flexible pattern are not necessarily adjacent and the time spent between two consecutive points is denoted by a time interval. Although representing a trajectory pattern by a sequence of points is ideal for reducing the effect of noise and easing the mining process, these approaches may lead to generating long patterns and requiring a tremendous amount of mining time. Therefore, LTM models trajectories and patterns as consecutive line segments rather than discrete points, so that the memory consumption and the lengths and number of frequent patterns can be effectively reduced. All three algorithms mine frequent patterns in a depth-first search (DFS) manner. GBM utilizes the adjacency property to effectively reduce the search space, while FTM employs frequent edges to prune unnecessary patterns. LTM uses two pruning strategies, CU-Bound and FU-Bound, to speed up the mining process. Extensive experiments are conducted to evaluate the performance of GBM, FTM and LTM. The experimental results show that GBM significantly outperforms Apriori-G and PrefixSpan-G. FTM also gains considerable improvement in efficiency in comparison with Apriori-F and PrefixSpan-F. LTM effectively speeds up the mining process by using both the CU-Bound and FU-Bound pruning strategies.
Стилі APA, Harvard, Vancouver, ISO та ін.
28

Liu, Yu-mei, and 劉佑玫. "Mining of Frequent Subgraph Patterns." Thesis, 2012. http://ndltd.ncl.edu.tw/handle/2mfm6y.

Повний текст джерела
Анотація:
Master's thesis
國立臺灣科技大學
資訊管理系
100
In recent years, data mining has been extensively applied to different domains, and how to find frequent patterns efficiently in large data sets is a popular research topic in the data mining community. In addition, the power of graphs to model complex data sets has long been recognized, so using graphs to represent data and developing corresponding mining techniques has become a main trend. The purpose of graph mining is to find frequent subgraphs in graph data sets, that is, to discover all structures whose occurrence frequency is no less than a user-specified threshold. Mining techniques that represent data as graphs can find more complicated relations or structures among data; for this reason, graph mining has been widely used in many domains such as chemistry, biology, and computer networks. The main challenge in graph mining is how to handle the graph/subgraph isomorphism testing problems. This research proposes the algorithm MFG (Mining of Frequent subGraph Patterns), which combines several previous graph mining techniques to mine all frequent subgraph patterns efficiently. MFG uses a global order on the vertices of frequent patterns to reduce part of the duplicate enumeration and utilizes an effective embedded mapping structure to store the information of subgraph patterns, so it can avoid the subgraph isomorphism checking problem completely. Finally, MFG adopts graph signatures and canonical forms to solve the graph isomorphism testing problem.
Стилі APA, Harvard, Vancouver, ISO та ін.
29

Yu, Tsui-Fen, and 余翠芬. "Incremental Mining of Frequent Subgraph Patterns." Thesis, 2005. http://ndltd.ncl.edu.tw/handle/56099182798144564817.

Повний текст джерела
Анотація:
Master's thesis
國立臺灣科技大學
資訊管理系
93
How to find frequent patterns efficiently in large data sets is an active research topic in the data mining community, and in recent years frequent pattern mining has been extensively applied to different domains. In the past, frequent pattern mining focused on itemsets and path patterns. Nevertheless, as data mining is applied to more domains and the data sets in new domains become more complex, traditional frequent pattern mining techniques are no longer sufficient, so developing efficient and suitable mining techniques has become a main trend. The power of graphs to model complex data sets has long been recognized; for this reason, mining techniques that represent data as graphs can be applied not only to complex data sets but also to traditional ones. To the best of our knowledge, existing algorithms for mining frequent subgraph patterns have been designed for static datasets: once the dataset or the user-assigned thresholds change, the whole dataset must be re-mined, which, given the high complexity of subgraph patterns, places a heavy burden on the mining process. This thesis proposes the algorithm IMFG (Incremental Mining of Frequent subGraph Patterns), which uses a different data structure to store the information of the graph dataset and removes the limitation of previous algorithms that can only deal with static datasets. When the graph dataset or the threshold changes, we can obtain the new information we want without re-mining the whole dataset.
Стилі APA, Harvard, Vancouver, ISO та ін.
30

蘇仁鑫. "Mining frequent patterns by transaction tree." Thesis, 2004. http://ndltd.ncl.edu.tw/handle/79464139456448545300.

Повний текст джерела
Стилі APA, Harvard, Vancouver, ISO та ін.
31

Huang, Ko-Wei, and 黃科瑋. "Mining Frequent Patterns with Heterogeneous Constraints." Thesis, 2008. http://ndltd.ncl.edu.tw/handle/61039732706261007215.

Повний текст джерела
Анотація:
Master's thesis
國立高雄大學
電機工程學系碩士班
96
Recently, the topic of constraint-based association mining has received increasing attention within the data mining research community. By allowing user-specified constraints beyond the traditional rule measurements, e.g., minimum support and confidence, work on this topic endeavors to reflect the real interests of analysts, relieve them from an overabundance of rules, and ultimately provide an interactive environment for association analysis. So far, most work on constraint-based frequent pattern (itemset) mining has been single-constraint oriented, i.e., only one specific type of constraint is considered; surprisingly little research has dealt with multiple types of constraints. This thesis investigates this problem. Specifically, three different types of constraints are considered: item constraints, aggregation constraints, and cardinality constraints. We propose two efficient algorithms, MCApriori and MCFPTree, to discover the frequent patterns (itemsets) that satisfy all three types of constraints. Experimental results show that our algorithms are significantly faster than the intuitive approach, i.e., post-processing the frequent patterns generated by leading algorithms such as Apriori and FP-Growth against the user-specified constraints.
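The "intuitive approach" this abstract benchmarks against can be pictured with a short sketch: mine frequent itemsets first with any standard algorithm, then post-process them against the three constraint types. The item prices, the concrete constraints, and the input itemsets below are invented for illustration; this is not the MCApriori/MCFPTree machinery itself.

```python
price = {"a": 10, "b": 25, "c": 40, "d": 5}  # assumed item attribute used by the aggregation constraint

# Pretend these itemsets and supports came from Apriori or FP-growth.
frequent_itemsets = {
    frozenset("a"): 5, frozenset("ab"): 4, frozenset("abc"): 3,
    frozenset("bd"): 4, frozenset("abcd"): 2,
}

def item_constraint(itemset):
    return "a" in itemset                        # must contain item "a"

def aggregation_constraint(itemset):
    return sum(price[i] for i in itemset) <= 80  # total price at most 80

def cardinality_constraint(itemset):
    return 2 <= len(itemset) <= 3                # between 2 and 3 items

answers = {x: s for x, s in frequent_itemsets.items()
           if item_constraint(x) and aggregation_constraint(x)
           and cardinality_constraint(x)}
print(answers)  # keeps {'a','b'} (support 4) and {'a','b','c'} (support 3)
```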
Стилі APA, Harvard, Vancouver, ISO та ін.
32

Huang, Yu-Chun, and 黃郁君. "Mining Frequent Spatial Co-relation Patterns." Thesis, 2004. http://ndltd.ncl.edu.tw/handle/78069947906207351654.

Повний текст джерела
Анотація:
Master's thesis
國立政治大學
資訊科學學系
92
With the growth of data, a variety of databases are used in many applications. Spatial data mining is one example: it is the process of discovering interesting, previously unknown, but potentially useful patterns or spatial relations from large spatial databases. In this thesis, we explore the problem of spatial sequential pattern mining and discuss two issues: spatial co-relation patterns and approximate spatial co-relation patterns. We utilize an Apriori-based method and a depth-first-based method to solve the problem of mining spatial co-relation patterns. For approximate spatial co-relation patterns, we propose two algorithms, named AP-mine and AS-mine. In AP-mine, we propose a data structure, named the AP-tree, for efficiently mining approximate spatial co-relation patterns. Lastly, we perform experiments to evaluate our spatial co-relation pattern mining algorithms.
Стилі APA, Harvard, Vancouver, ISO та ін.
33

Jung, Lo Mei, and 羅美榮. "Efficient Algorithms for Mining Frequent Patterns." Thesis, 2002. http://ndltd.ncl.edu.tw/handle/76819700828878428867.

Повний текст джерела
Анотація:
Master's thesis
大葉大學
電機工程研究所
90
Data mining is a very important database research issue; in particular, the generation of frequent patterns in large databases has been widely studied. Most studies take the Apriori-based approach, which requires great effort to generate candidate frequent patterns and needs multiple database scans. FP-tree-based approaches have been proposed to avoid candidate generation and scan the transaction database only twice, but they work with a more complicated data structure. Recently, the Frequent Pattern List (FPL) algorithm, which uses a simple linear list to store all transactions, was proposed to improve on the FP-tree algorithm; however, the FPL algorithm still needs to scan the database twice. In this thesis, an efficient frequent pattern generation algorithm, called FPLI, is proposed to improve the FPL algorithm. FPLI scans the database only once and, like FPL, uses a simple linear list to store all transactions. By performing simple operations on the list, we can discover the frequent patterns quickly. FPLI also does not need to rescan the database or reconstruct its data structure when the transaction database is updated or the minimum support is varied. Experimental results show that the FPLI algorithm performs much better than the FPL algorithm.
Стилі APA, Harvard, Vancouver, ISO та ін.
34

Kun-Ta, Chuang. "On Feasibility-Oriented Mining of Frequent Patterns." 2006. http://www.cetd.com.tw/ec/thesisdetail.aspx?etdun=U0001-2007200609451500.

Повний текст джерела
Стилі APA, Harvard, Vancouver, ISO та ін.
35

Chuang, Kun-Ta, and 莊坤達. "On Feasibility-Oriented Mining of Frequent Patterns." Thesis, 2006. http://ndltd.ncl.edu.tw/handle/11475725010556815161.

Повний текст джерела
Анотація:
Doctoral dissertation
國立臺灣大學
電信工程學研究所
94
Since the early work on the Apriori algorithm, a broad spectrum of topics in mining frequent patterns has been studied. While the proposed techniques are important results toward the integration of association mining with other real-life requirements, how to provide feasibility-oriented models for mining frequent patterns, enabling easy-to-use, low-cost, high-efficiency, and realistic mining applications, remains a challenging issue. In view of this, we explore in this dissertation a novel algorithm for mining top-k (closed) itemsets in the presence of a memory constraint. As opposed to most previous works, which concentrate on improving the mining efficiency or on reducing the memory size by best effort, we first attempt to specify the upper bound on the memory that can be utilized for mining frequent itemsets. While complying with this memory constraint, two efficient algorithms, called MTK and MTK_Close, are devised for mining frequent itemsets and closed itemsets, respectively, without specifying the subtle minimum support; instead, users only need to give a more human-understandable parameter, namely the desired number of frequent (closed) itemsets k. Furthermore, a sampling model, called feature preserved sampling (FPS), which sequentially generates a high-quality sample over sliding windows, is developed. The sampling quality we consider refers to the degree of consistency between the sample proportion and the population proportion of each attribute value in a window. FPS has several advantages: (1) it sequentially generates a sample from a time-variant data source over sliding windows; (2) its execution time is linear with respect to the database size; (3) the relative proportional differences between the sample proportions and population proportions of most distinct attribute values are guaranteed to be below a specified error threshold, ε, while the relative proportion differences of the remaining attribute values are as close to ε as possible, which ensures that the generated sample is of high quality; (4) the sample rate is close to the user-specified rate, so that a high-quality sampling result can be obtained without increasing the sample size; (5) FPS excellently preserves the population proportions of multivariate statistics in the sample; and (6) FPS can be applied to infinite streams and finite datasets equally, and the generated samples can be used for various applications. We next investigate an important characteristic of real datasets, namely the itemset support distribution, to provide a better understanding of real data. The itemset support distribution refers to the distribution of the count of itemsets versus the itemset support. Importantly, from observations on various retail datasets, as validated by our empirical studies, we find that a power-law relationship indeed appears in the itemset support distribution, and we can characterize it as a Zipf distribution. Since it is prohibitively expensive to retrieve many itemsets before the characteristics of the itemset support distribution in the targeted data are identified, we also propose a valid and cost-effective algorithm, called algorithm PPL, to extract the characteristics of the itemset support distribution.
Furthermore, to fully explore the advantages of our discovery, we also propose novel mechanisms, with the help of PPL, to solve two important problems: (1) determining a subtle parameter for mining approximate frequent itemsets over data streams; and (2) determining a sufficient sample size for mining frequent patterns. In this dissertation, we also attempt to answer an important question: "What patterns will be frequent in the future?" Such patterns, referred to as prospective frequent patterns, are very informative to end users, because many cross-selling strategies in practice rely on the precise prediction of the frequent patterns that will appear. Since no naive extension of previous works can effectively obtain the desired result, we propose the PFP framework to precisely predict prospective frequent patterns as well as their supports.
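The top-k formulation advocated in this abstract can be illustrated with a small brute-force sketch in which the user supplies only k instead of a minimum support. The data and k are toy values, and the memory-constrained MTK/MTK_Close algorithms themselves are not reproduced here.

```python
from itertools import combinations

transactions = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}, {"b", "c"}, {"a"}]
k = 4  # desired number of frequent itemsets (assumed)

def support(itemset):
    return sum(1 for t in transactions if itemset <= t)

items = sorted(set().union(*transactions))
all_itemsets = [frozenset(c) for n in range(1, len(items) + 1)
                for c in combinations(items, n)]

# Rank every itemset by support and keep the k best.
ranked = sorted(all_itemsets, key=support, reverse=True)
top_k = [(set(x), support(x)) for x in ranked[:k]]
print(top_k)
```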
Стилі APA, Harvard, Vancouver, ISO та ін.
36

Chen, Chun-Hung, and 陳春宏. "Mining Frequent Patterns in 9DLT Video Databases." Thesis, 2008. http://ndltd.ncl.edu.tw/handle/66720803474083859747.

Повний текст джерела
Анотація:
Master's thesis
國立臺灣大學
資訊管理學研究所
96
Multimedia database systems are becoming increasingly popular owing to the widespread use of audio-video equipment, digital cameras, CD-ROMs, and the Internet; therefore, mining frequent patterns from video databases has attracted increasing attention in recent years. In this thesis, we propose a novel algorithm, FVP-Miner (Frequent Video Pattern Miner), to mine frequent patterns in a video database. The proposed algorithm consists of two phases. First, we transform every video into 9DLT strings. Second, we find all frequent image 2-patterns in the database and then recursively mine the frequent patterns in the spatial and temporal dimensions. We employ three pruning strategies to eliminate many impossible candidates, and use the concept of a projected database to localize support counting, pattern joining, and candidate pruning to the projected database. Therefore, the proposed algorithm can efficiently mine the frequent patterns in a video database. The experimental results show that the proposed method is efficient and scalable, and outperforms the modified Apriori algorithm by several orders of magnitude.
Стилі APA, Harvard, Vancouver, ISO та ін.
37

Chen, Shih-Sheng, and 陳仕昇. "The Research of Mining Frequent Sequential Patterns." Thesis, 2003. http://ndltd.ncl.edu.tw/handle/00571165628123445673.

Повний текст джерела
Анотація:
Doctoral dissertation
國立中央大學
資訊管理研究所
91
Mining sequential patterns in databases is an important issue with many applications in commercial and scientific domains. For example, finding patterns in DNA sequences and analyzing users' web-browsing patterns can help discover important knowledge in genetic evolution and consumer behavior, respectively. Existing studies on finding sequential patterns can be classified into two categories, namely continuous and discontinuous patterns. In the first category, patterns are composed of elements in consecutive positions of a sequence. In the second category, patterns can be composed of elements separated by wild cards, which can denote zero or more elements. Although much research has been published on finding either kind of pattern, no existing method can find both, nor can it find discontinuous patterns formed of several continuous sub-patterns. This dissertation defines hybrid patterns as the combination of continuous and discontinuous patterns and proposes a novel algorithm to mine hybrid patterns; the algorithm is as fast as PrefixSpan for mining sequential patterns. Algorithms such as PrefixSpan require the data volume to be small enough to fit in main memory to attain full speed. In this dissertation, we also propose a sampling-based approach to find discontinuous and continuous patterns. This approach has three advantages. First, it can mine frequent patterns from huge data, as Apriori-like algorithms do, but does not need to scan the database many times. Second, it is as efficient as pattern-growth algorithms such as PrefixSpan and does not need to compress the database into memory. Third, it can work with any known algorithm for mining discontinuous or continuous patterns. The algorithms developed in this dissertation are important because they can be applied to mine knowledge from the sequential data that are generated constantly in our daily lives.
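A hybrid pattern as defined in this abstract can be viewed as continuous segments separated by wildcards. The sketch below only checks containment of such a pattern in a sequence and counts its support; the pattern and toy sequences are invented and the dissertation's mining algorithm is not reproduced.

```python
def contains(sequence, segments):
    """True if the continuous segments occur in order, with arbitrary gaps between them."""
    pos = 0
    for seg in segments:
        n = len(seg)
        for i in range(pos, len(sequence) - n + 1):
            if sequence[i:i + n] == seg:
                pos = i + n          # the next segment must start after this one
                break
        else:
            return False
    return True

sequences = [list("abxxcdy"), list("abcdz"), list("acbd")]
pattern = [list("ab"), list("cd")]   # hybrid pattern "ab * cd": two continuous segments

support = sum(contains(s, pattern) for s in sequences)
print(support)  # 2: only the first two sequences contain "ab" ... "cd"
```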
Стилі APA, Harvard, Vancouver, ISO та ін.
38

Chun-Hung, Chen. "Mining Frequent Patterns in 9DLT Video Databases." 2008. http://www.cetd.com.tw/ec/thesisdetail.aspx?etdun=U0001-2507200819465400.

Повний текст джерела
Стилі APA, Harvard, Vancouver, ISO та ін.
39

Vimieiro, Renato. "Mining disjunctive patterns in biomedical data sets." Thesis, 2012. http://hdl.handle.net/1959.13/936341.

Повний текст джерела
Анотація:
Research Doctorate - Doctor of Philosophy (PhD)
Frequent itemset mining is one of the most studied problems in data mining. Since Agrawal et al. (1993) introduced the problem, several theoretical and practical advances have been achieved. In spite of that, there are still many unresolved issues to be tackled before frequent pattern mining can be claimed a cornerstone approach in data mining (Han et al., 2007). Here, we investigate issues related to: (1) the (un)suitability of frequent itemset mining algorithms for identifying patterns in biomedical data sets; and (2) the limited expressiveness of such patterns, since, in the vast majority of cases, frequent itemsets are exclusively conjunctions. Our ultimate goal in this thesis is to improve frequent pattern mining methods so that they provide alternative, insightful solutions for mining biomedical data sets. Specifically, we provide efficient tools for mining disjunctive patterns in biomedical data sets. We tackle the problem of mining disjunctive patterns on three different fronts: (1) disjunctive minimal generators; (2) disjunctive closed patterns; and (3) quasi-CNF emerging patterns. We then propose three different algorithms, one for each task above: TitanicOR, Disclosed, and QCEP. While the first two aim for more descriptive patterns, the third is more predictive. These algorithms are proposed as an attempt to cover different sources of data sets coming from biomedical research. TitanicOR is more suitable for identifying patterns in data sets containing physiological, biochemical, or medical record information. Disclosed was designed to exploit the characteristics of microarray gene expression data sets, which usually contain many features but only a few samples. Finally, QCEP is the only algorithm that considers data sets with class label information. We conducted experiments with both synthetic and real-world data sets to assess the performance of our algorithms. Our experiments show that our algorithms outperform the state-of-the-art algorithms in each of those categories of patterns.
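The gain in expressiveness from disjunctions can be seen by contrasting conjunctive and disjunctive support. The sketch below uses invented toy transactions and does not reproduce TitanicOR, Disclosed or QCEP; it only illustrates the support notions involved.

```python
transactions = [{"g1", "g3"}, {"g2"}, {"g1", "g2", "g4"}, {"g4"}]

def conjunctive_support(items):
    """Transactions containing every item of the pattern."""
    return sum(1 for t in transactions if set(items) <= t)

def disjunctive_support(items):
    """Transactions containing at least one item of the pattern."""
    return sum(1 for t in transactions if set(items) & t)

pattern = {"g1", "g2"}
print(conjunctive_support(pattern))  # 1: only the third transaction has both items
print(disjunctive_support(pattern))  # 3: three transactions have g1 or g2
```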
Стилі APA, Harvard, Vancouver, ISO та ін.
40

Teng, Wei-Guang. "Mining of Frequent Temporal Patterns on Data Streams." 2004. http://www.cetd.com.tw/ec/thesisdetail.aspx?etdun=U0001-3007200403225800.

Повний текст джерела
Стилі APA, Harvard, Vancouver, ISO та ін.
41

Hayduk, Yaroslav. "Mining frequent patterns from uncertain data with MapReduce." 2012. http://hdl.handle.net/1993/5250.

Повний текст джерела
Анотація:
Frequent pattern mining from uncertain data allows data analysts to mine frequent patterns from probabilistic databases, within which each item is associated with an existential probability representing the likelihood of the presence of the item in the transaction. When compared with precise data, the solution space for mining uncertain data is often much larger due to the probabilistic nature of uncertain databases. Thus, uncertain data mining algorithms usually take substantially more time to execute. Recent studies show that the MapReduce programming model yields significant performance gains for data mining algorithms, which can be mapped to the map and reduce execution phases of MapReduce. An attractive feature of MapReduce is fault-tolerance, which permits detecting and restarting failed jobs on working machines. In this M.Sc. thesis, I explore the feasibility of applying MapReduce to frequent pattern mining of uncertain data. Specifically, I propose two algorithms for mining frequent patterns from uncertain data with MapReduce.
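A common way to define frequency over such probabilistic databases is the expected support, which sums over the transactions the product of the existential probabilities of the itemset's items (assuming independence). The sketch below illustrates that measure on an invented toy database; the MapReduce decomposition proposed in the thesis is not shown here.

```python
from math import prod

# Each transaction maps an item to its existential probability (toy values).
uncertain_db = [
    {"a": 0.9, "b": 0.6},
    {"a": 0.5, "c": 0.8},
    {"a": 0.7, "b": 0.9, "c": 0.4},
]

def expected_support(itemset):
    """Sum over transactions of the product of the item probabilities."""
    return sum(prod(t[i] for i in itemset) for t in uncertain_db
               if all(i in t for i in itemset))

print(expected_support({"a"}))       # 0.9 + 0.5 + 0.7
print(expected_support({"a", "b"}))  # 0.9*0.6 + 0.7*0.9
```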
Стилі APA, Harvard, Vancouver, ISO та ін.
42

Ip, Weng Chong, and 葉榮忠. "Mining Frequent Trajectory Patterns in Spatial-temporal Databases." Thesis, 2007. http://ndltd.ncl.edu.tw/handle/91535254525268212114.

Повний текст джерела
Анотація:
Master's thesis
國立臺灣大學
資訊管理學研究所
95
With advances in tracking technologies and the great diffusion of location-based services, large amounts of data have been collected in spatial-temporal databases. The implicit knowledge in a spatial-temporal database can be used in many application areas, and mining frequent trajectories can help us understand the movements of objects. Therefore, in this thesis, we propose a novel algorithm to mine frequent trajectory patterns in a spatial-temporal database. The proposed method consists of two phases. First, we transform all trajectories in the database into a mapping graph; for each vertex in the mapping graph, we record the information of the trajectories passing through the vertex in a data structure called Trajectories Information lists (TI-lists). Second, we mine all frequent patterns from the mapping graph and TI-lists in a depth-first search manner. The proposed method does not generate unnecessary candidates, needs fewer database scans, and utilizes the consecutive property of trajectories to reduce the search space; it is therefore more efficient than the PrefixSpan-based method. The experimental results show that the proposed method outperforms the PrefixSpan-based method by one order of magnitude on both synthetic and real data.
Стилі APA, Harvard, Vancouver, ISO та ін.
43

許士俊. "Mining Frequent Tree-like Patterns in Large Datasets." Thesis, 2004. http://ndltd.ncl.edu.tw/handle/12047818237781806568.

Повний текст джерела
Анотація:
Master's thesis
長榮大學
經營管理研究所
92
Frequent sequential pattern mining is an important area of data mining. In this thesis, we present a new data mining scheme to explore tree-like hierarchical structures, named tree-like patterns, that represent the relationships among the items of sequences. With tree-like patterns, we can clearly identify cause-and-effect relations among items. For support counting, we propose a scheme that counts the support of tree-like patterns with a queue structure and computes the support values efficiently. We also present an efficient dynamic-programming scheme to count the frequency of tree-like patterns in a sequence, so that the importance of tree-like patterns in a sequence can be assessed. In addition, we present two formulas to compute the significance of sequences, which describes the degree of coupling between items and tree-like patterns in a sequence; a higher significance value means the tree-like pattern is more tightly coupled with the sequence. Finally, we compare the characteristics of different kinds of patterns with ours; our patterns have richer characteristics and wider applications.
Стилі APA, Harvard, Vancouver, ISO та ін.
44

Hong, Ruey-Wen, and 洪瑞文. "Mining Frequent Patterns in Image and Video Databases." Thesis, 2009. http://ndltd.ncl.edu.tw/handle/41015641340616459030.

Повний текст джерела
Анотація:
Doctoral dissertation
國立臺灣大學
資訊管理學研究所
97
Because of the fast growth in the volume of image and video data, how to extract useful information from image and video databases has attracted more and more attention in recent years. In this dissertation, we propose three algorithms: 9DLT-Miner, 2DZ-Miner, and 3DZ-Closed. The 9DLT-Miner algorithm finds the frequent spatial patterns in 9DLT image databases, the 2DZ-Miner algorithm finds the frequent spatial patterns in 2DZ image databases, and the 3DZ-Closed algorithm finds the frequent closed spatial-temporal patterns in 3DZ video databases. In the 9DLT-Miner and 2DZ-Miner algorithms, in addition to using the anti-monotone pruning strategy to prune impossible candidate patterns, we utilize the characteristics of the 9DLT and 2DZ-string representations to design relation inference matrices; by using these inference matrices, we prune most impossible candidate patterns. The 3DZ-Closed algorithm uses the pattern index and the pattern index tree to mine all frequent closed spatial-temporal patterns in 3DZ video databases. In the 3DZ-Closed algorithm, we not only use the 2DZ relation inference matrix to prune impossible candidate patterns but also propose a "one-level-ahead checking" pruning strategy, which marks the non-expandable nodes in the pattern index tree. Therefore, the 3DZ-Closed algorithm can effectively prune the unnecessary branch nodes in the pattern index tree and avoid costly candidate generation. The experimental results show that the 9DLT-Miner, 2DZ-Miner and 3DZ-Closed algorithms outperform the Apriori-like algorithms.
Стилі APA, Harvard, Vancouver, ISO та ін.
45

Teng, Wei-Guang, and 鄧維光. "Mining of Frequent Temporal Patterns on Data Streams." Thesis, 2004. http://ndltd.ncl.edu.tw/handle/15580676667580181508.

Повний текст джерела
Анотація:
Doctoral dissertation
國立臺灣大學
電機工程學研究所
92
In recent years, several query problems and mining capabilities have been explored for the data stream environment. Among the various data mining capabilities, one receiving a significant amount of research attention is mining frequent patterns over market basket data. In this dissertation, we first explore the model of frequent itemsets in static transaction databases and generalize the relevant concepts to the discovery of temporal relationships from online transaction flows. We then investigate resource utilization issues in a data stream environment, and finally study the problem of quality guarantees when tracking online data streams. For the problem of mining frequent itemsets to derive association rules, a new mining capability, called mining of substitution rules, is first developed by extending the concepts of association rule mining. Substitution refers to the choice made by a customer to replace the purchase of some items with that of others. The discovery of substitution rules, like that of association rules, leads to very valuable knowledge in various aspects, including market prediction, user behavior analysis and decision support. Specifically, we first derive theoretical properties for the model of substitution rule mining and devise a technique for inducing positive itemset supports to improve the efficiency of support counting for negative itemsets. Then, in light of these properties, algorithm SRM (standing for substitution rule mining) is designed and implemented to discover substitution rules efficiently while attaining good statistical significance. To mine frequent temporal patterns on data streams, a regression-based algorithm, called FTP-DS (Frequent Temporal Patterns of Data Streams), is devised. While providing a general framework for pattern frequency counting, algorithm FTP-DS has two major features, namely one data scan for online statistics collection and a regression-based compact pattern representation. To attain the one-data-scan feature, data segmentation and pattern growth scenarios are explored for frequency counting. Algorithm FTP-DS scans online transaction flows and generates candidate frequent patterns in real time. The second important feature of algorithm FTP-DS is its regression-based compact pattern representation. In addition, we develop the techniques of segmentation tuning and segment relaxation to enhance the functionality of FTP-DS. With these features, algorithm FTP-DS is able not only to conduct mining with variable time intervals but also to perform trend detection effectively. The fundamental problem of how limited resources, e.g., memory space and computation power, can be well utilized to produce accurate estimates in a data stream environment is also addressed. Two important features for tracking mined patterns with properly utilized resources are examined. The first is temporal granularity, which refers to the phenomenon that, as time advances, people are more interested in recent events, meaning that more resources can be utilized to explore more recent data at finer granularities. Second, for the mining task of discovering frequent temporal patterns, more resources are expected to be allocated to the processing of those borderline patterns whose statistics, e.g., occurrence frequencies, are close to the specified threshold, so as to identify frequent itemsets properly. This feature is called mining with support count granularity.
Consequently, a wavelet-based algorithm, called RAM-DS (Resource-Aware Mining for Data Streams), is devised to perform general pattern mining tasks for data streams by exploring both temporal and support count granularities. Algorithm RAM-DS is designed not only to reduce the memory required for data storage but also to retain a good approximation of the target time series. In addition, algorithm RAM-DS can support a varying number of data streams by allocating memory space adaptively when tracking patterns generated from online transactions. For tracking online time series data, which is either collected directly from sensors or generated by stream mining algorithms, we explore the energy preservation property of the wavelet transform. The commonly used L1- and L2-error metrics are theoretically guaranteed when insignificant coefficients are discarded to save precious resources in our framework. In addition, to handle infinite online data flows, an enhanced data structure, the RAID-tree, which is based on the error tree, is proposed for dynamic synopsis maintenance over data streams. Specifically, an algorithm, RAID, with resolution adaptability for incremental decomposition is developed. Experimental results show that the memory required for storing significant features of time series data is very small and that the quality of approximation remains stable when performing incremental data updates.
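The regression-based compact representation attributed to FTP-DS in this abstract can be pictured as fitting a line to per-segment support ratios. The sketch below does exactly that on invented per-segment counts with a plain least-squares fit; it is only an illustration of the idea, not the actual FTP-DS bookkeeping.

```python
# (pattern occurrences, transactions) per time segment -- assumed toy counts
segments = [(12, 100), (18, 110), (25, 120), (31, 115)]

ratios = [occ / total for occ, total in segments]   # per-segment support ratios
xs = list(range(len(ratios)))                       # segment indices

# Ordinary least squares for ratio ~ slope * segment_index + intercept.
n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ratios) / n
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ratios))
         / sum((x - mean_x) ** 2 for x in xs))
intercept = mean_y - slope * mean_x

print([round(r, 3) for r in ratios])
print(f"slope={slope:.4f}, intercept={intercept:.4f}")  # positive slope -> upward trend
```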
Стилі APA, Harvard, Vancouver, ISO та ін.
46

Wang, Sheng-Shun, and 王聖舜. "Mining Fault-Tolerant Frequent Patterns in Large Databases." Thesis, 2002. http://ndltd.ncl.edu.tw/handle/84272332857502794908.

Повний текст джерела
Анотація:
Master's thesis
國立交通大學
資訊工程系
90
Real-world data may be interfered with by noise, which causes the data to contain faults, and the data mining methods proposed previously may then not be applicable. Moreover, we may hope that the discovered knowledge is more general and can be applied to find more interesting information. Hence, FT-Apriori was proposed for fault-tolerant data mining over large real-world data. However, FT-Apriori, which generates and tests candidates based on the Apriori property, is not very efficient. In this thesis, we develop the memory-based algorithm FTP-mine, which is based on the concept of pattern growth, to mine fault-tolerant frequent patterns efficiently. In FTP-mine, a table, STable, is designed to count the item support and FT-support of the k-length patterns that share the same prefix of length k-1 while each transaction is compared only once. For mining a database that is too large to fit in memory, FTP-mine can also be adapted by means of database partitioning. In addition, since there might exist a large number of fault-tolerant frequent patterns, some of which are contained in others, we also address the discovery of maximal FT-frequent patterns by extending the FTP-mine algorithm. Our study shows that FTP-mine performs better than FT-Apriori under all kinds of parameter settings, such as various supports and tolerances, and in scalability. The empirical evaluations show that the proposed method has good linear scalability and outperforms FT-Apriori in the discovery of FT-frequent patterns.
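The fault-tolerant containment notion behind FT-frequent patterns can be sketched as follows: with a tolerance of delta items, a transaction FT-contains an itemset if it holds all but at most delta of its items, and the FT-support counts such transactions. The toy data, delta, and the comparison with exact support below are illustrative assumptions, not the FTP-mine STable mechanism.

```python
transactions = [{"a", "b", "c"}, {"a", "b"}, {"b", "c", "d"}, {"a", "c"}]
delta = 1           # number of missing items tolerated (assumed)

def ft_support(itemset):
    """Transactions containing at least len(itemset) - delta items of the itemset."""
    need = len(itemset) - delta
    return sum(1 for t in transactions if len(itemset & t) >= need)

pattern = frozenset({"a", "b", "c"})
print(ft_support(pattern))                            # 4: every transaction misses at most one item
print(sum(1 for t in transactions if pattern <= t))   # 1: exact (fault-free) support
```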
Стилі APA, Harvard, Vancouver, ISO та ін.
47

Hsia, Liou Ming, and 劉明霞. "Algorithms and Applications for Mining Frequent Price Patterns." Thesis, 2006. http://ndltd.ncl.edu.tw/handle/31660522086612595806.

Повний текст джерела
Анотація:
Master's thesis
南台科技大學
資訊管理系
94
In real life, the sales volume of supermarkets and department stores often varies with the seasons. To attract consumers and increase sales, the proprietors of supermarkets and department stores often hold all kinds of promotions that offer customers good discounts on merchandise. Therefore, price is one of the important factors affecting customer behavior during these promotions. Traditional association mining stresses the relations between merchandise items: its purpose is to learn which items are often bought together by customers, for example, diapers and beer. However, it cannot show at what discount the merchandise was bought, so traditional association mining cannot reveal the potential relation between merchandise and discounts, even though association rules capturing such a relationship could possibly increase sales volume. Price has long been one of the main factors affecting consumer behavior. In this thesis, we propose an algorithm, EFI_FP (An Efficient Approach For Filtering Frequent Price Itemsets), with the following properties: first, it uses a two-phase filtering mechanism to eliminate a huge number of infrequent itemsets; second, it mines frequent patterns that are combinations of merchandise sold at different price levels; third, when a specified item is on sale, it can determine which other items are bought at the same time. Because there are level relationships among merchandise items, we propose another algorithm, MLEFI_FP (Multiple Level Efficient Approach for Filtering Frequent Price Itemsets), which extends EFI_FP and mines multi-level price patterns that capture the multi-level relationships among merchandise during promotions. Adding these relationships makes the mined association rules more significant.
Стилі APA, Harvard, Vancouver, ISO та ін.
48

Ip, Weng Chong. "Mining Frequent Trajectory Patterns in Spatial-temporal Databases." 2007. http://www.cetd.com.tw/ec/thesisdetail.aspx?etdun=U0001-1607200715350100.

Повний текст джерела
Стилі APA, Harvard, Vancouver, ISO та ін.
49

邱紹禎. "A tree-projection pattern growing method for mining frequent XML query patterns." Thesis, 2005. http://ndltd.ncl.edu.tw/handle/11776357668944572557.

Повний текст джерела
Стилі APA, Harvard, Vancouver, ISO та ін.
50

Lien, Yu-Chieh, and 連育傑. "An Efficient Algorithm for Incremental Mining of Frequent Patterns." Thesis, 2008. http://ndltd.ncl.edu.tw/handle/8pt2cy.

Повний текст джерела
Анотація:
Master's thesis
銘傳大學
資訊工程學系碩士班
96
Traditional association rule mining finds association rules in a given transaction database. In many real applications, however, new data are generated continuously, and the user wants to obtain the association rules of the updated transaction database. Hence, how to find association rules efficiently when data are added continuously has become an important practical research issue; it is called incremental mining of association rules, or mining association rules over data streams. Many recently proposed incremental mining and data stream mining algorithms adopt a tree-based structure: they construct a tree to store transactions in memory and then use an algorithm similar to FP-growth to find frequent itemsets from the tree. When incremental transactions arrive and we want to find the frequent itemsets of the updated transaction database, such methods add the incremental transactions to the previously constructed tree and then mine frequent itemsets from the updated tree again. However, the original transaction database is often much larger than the incremental transactions, so this can be inefficient. In this thesis, we propose an efficient algorithm for incremental mining of frequent patterns. When mining the updated transaction database, it mines frequent itemsets from the incremental transactions only and then integrates the mining result with the previous result to obtain the new frequent itemsets of the updated transaction database. Finally, these new frequent itemsets can be used to generate new association rules. Experimental results show that our approach is more efficient.
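The merge step described in this abstract can be sketched roughly as follows: mine the small increment, combine its counts with the counts retained from the original database, and keep the itemsets that are frequent over the union. The brute-force miner, toy data and threshold below are assumptions made for illustration, not the thesis algorithm itself.

```python
from itertools import combinations

def mine(db, min_ratio):
    """Brute-force frequent-itemset miner for a tiny database (illustration only)."""
    items = sorted(set().union(*db))
    result = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            x = frozenset(combo)
            count = sum(1 for t in db if x <= t)
            if count / len(db) >= min_ratio:
                result[x] = count
    return result

original = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}]   # already-mined part
increment = [{"b", "c"}, {"a", "b", "c"}]              # newly arrived transactions
min_ratio = 0.5

old_counts = mine(original, min_ratio)   # assumed to be kept from the previous run
inc_counts = mine(increment, min_ratio)  # only the small increment is mined anew

# An itemset infrequent in both parts cannot be frequent in the union,
# so the union of the two frequent sets is a complete candidate set.
total = len(original) + len(increment)
updated = {}
for x in set(old_counts) | set(inc_counts):
    # Re-count against the other part only when a count was not retained.
    c = (old_counts.get(x, sum(1 for t in original if x <= t))
         + inc_counts.get(x, sum(1 for t in increment if x <= t)))
    if c / total >= min_ratio:
        updated[x] = c
print(sorted((tuple(sorted(x)), c) for x, c in updated.items()))
```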
Стилі APA, Harvard, Vancouver, ISO та ін.
