Dissertations / Theses: 'Rule mining'

1

Wong, Wai-kit. "Security in association rule mining." Click to view the E-thesis via HKUTO, 2007. http://sunzi.lib.hku.hk/HKUTO/record/B39558903.

2

Wong, Wai-kit, and 王偉傑. "Security in association rule mining." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2007. http://hub.hku.hk/bib/B39558903.

3

Vithal, Kadam Omkar. "Novel applications of Association Rule Mining- Data Stream Mining." AUT University, 2009. http://hdl.handle.net/10292/826.

Abstract:

Since its inception, association rule mining has become one of the most researched areas of data exploration. In recent years, applying association rule mining methods to extract rules from a continuous flow of voluminous data, known as a data stream, has generated immense interest due to emerging applications such as network-traffic analysis and sensor-network data analysis. For such application domains, the ability to process an enormous amount of stream data in a single pass is critical.
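
To make the single-pass constraint in the abstract above concrete, here is a minimal sketch in Python. It is not the method of the thesis: the function names `count_pairs_single_pass` and `rules_from_counts` are hypothetical, and the sketch assumes that exact counts of items and item pairs fit in memory, which real stream miners usually avoid by using approximate counting.

```python
from collections import Counter
from itertools import combinations


def count_pairs_single_pass(stream):
    """Read each transaction exactly once, counting items and item pairs."""
    item_counts = Counter()
    pair_counts = Counter()
    n_transactions = 0
    for transaction in stream:
        items = sorted(set(transaction))  # canonical order so (a, b) == (b, a)
        n_transactions += 1
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    return n_transactions, item_counts, pair_counts


def rules_from_counts(n, item_counts, pair_counts, min_support, min_confidence):
    """Derive one-to-one rules lhs -> rhs from the counts gathered above."""
    rules = []
    for (a, b), count in pair_counts.items():
        pair_support = count / n
        if pair_support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            conf = count / item_counts[lhs]
            if conf >= min_confidence:
                rules.append((lhs, rhs, pair_support, conf))
    return sorted(rules)


if __name__ == "__main__":
    # A generator stands in for a stream: it can only be consumed once.
    stream = (t for t in [["a", "b"], ["a", "b", "c"], ["a", "c"], ["b"]])
    n, items, pairs = count_pairs_single_pass(stream)
    for rule in rules_from_counts(n, items, pairs, 0.5, 0.6):
        print(rule)
```

The generator in the usage example can only be iterated once, which mirrors the constraint that stream data cannot be revisited.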

4

Zhang, Ya. "Association rule mining in cooperative research." Diss., Columbia, Mo. : University of Missouri--Columbia, 2009. http://hdl.handle.net/10355/6540.

Abstract:

The entire thesis text is included in the research.pdf file; the official abstract appears in the short.pdf file; a non-technical public abstract appears in the public.pdf file. Title from PDF of title page (University of Missouri--Columbia, viewed January 26, 2010). Thesis advisor: Dr. Cerry M. Klein. Includes bibliographical references.

5

Icev, Aleksandar. "DARM: distance-based association rule mining." Link to electronic thesis, 2003. http://www.wpi.edu/Pubs/ETD/Available/etd-0506103-132405.

6

HajYasien, Ahmed. "Preserving Privacy in Association Rule Mining." Thesis, Griffith University, 2007. http://hdl.handle.net/10072/365286.

Abstract:

With the development and penetration of data mining within different fields and disciplines, security and privacy concerns have emerged. Data mining technology, which reveals patterns in large databases, could compromise information that an individual or an organization regards as private. The aim of privacy-preserving data mining is to find the right balance between maximizing analysis results (that are useful for the common good) and keeping the inferences that disclose private information about organizations or individuals to a minimum. In this thesis we present a new classification of privacy-preserving data mining problems and propose a new heuristic algorithm, called the QIBC algorithm, that improves the privacy of sensitive knowledge (as itemsets) by blocking more inference channels; we demonstrate the efficiency of the algorithm. We propose two techniques (item count and increasing cardinality) based on item-restriction that hide sensitive itemsets, and we perform experiments to compare the two techniques. We propose an efficient protocol that allows parties to share data in a private way with no restrictions and without loss of accuracy, and we demonstrate the efficiency of the protocol. Finally, we review the software engineering literature related to the association rule mining domain and suggest a list of considerations to achieve better privacy in software.
Thesis (PhD Doctorate)
Doctor of Philosophy (PhD)
School of Information and Communication Technology
Faculty of Engineering and Information Technology

7

Rahman, Sardar Muhammad Monzurur. "Data Mining Using Neural Networks." RMIT University. Electrical & Computer Engineering, 2006. http://adt.lib.rmit.edu.au/adt/public/adt-VIT20080813.094814.

Abstract:

Data mining is the search for relationships and global patterns in large databases that are increasing in size. Data mining is beneficial for anyone who has a huge amount of data, for example customer and business data, or transaction, marketing, financial, manufacturing and web data. The results of data mining are also referred to as knowledge in the form of rules, regularities and constraints. Rule mining is one of the popular data mining methods since rules provide concise statements of potentially important information that are easily understood by end users, as well as actionable patterns. Rule mining has received a good deal of attention and enthusiasm from data mining researchers since it is capable of solving many data mining problems such as classification, association, customer profiling, summarization, segmentation and many others. This thesis makes several contributions by proposing rule mining methods using genetic algorithms and neural networks. The thesis first proposes rule mining methods using a genetic algorithm. These methods are based on an integrated framework but are capable of mining three major classes of rules. Moreover, the rule mining processes in these methods are controlled by tuning two data mining measures, support and confidence. The thesis shows how to build data mining predictive models using the resultant rules of the proposed methods. Another key contribution of the thesis is the proposal of rule mining methods using supervised neural networks. The thesis mathematically analyses the Widrow-Hoff learning algorithm of a single-layered neural network, which results in a foundation for rule mining algorithms using single-layered neural networks. Three rule mining algorithms using single-layered neural networks are proposed for the three major classes of rules on the basis of the proposed theorems. The thesis also looks at the problem of rule mining where user guidance is absent and proposes a guided rule mining system to overcome this problem. The thesis extends this work further by comparing the performance of the algorithm used in the proposed guided rule mining system with the Apriori data mining algorithm. Finally, the thesis studies the Kohonen self-organizing map as an unsupervised neural network for rule mining algorithms. Two approaches are adopted, based on how self-organizing maps are applied in rule mining models. In the first approach, a self-organizing map is used for clustering, which provides class information to the rule mining process. In the second approach, automated rule mining takes the place of trained neurons as it grows in a hierarchical structure.
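
As background for the Widrow-Hoff learning algorithm mentioned in the abstract above, the following is a minimal Python sketch of the standard Widrow-Hoff (least mean squares) update for a single linear unit. It illustrates only the learning rule, not the rule mining algorithms the thesis builds on it; the function name `widrow_hoff_train`, the learning rate, and the toy data are assumptions made for illustration.

```python
def widrow_hoff_train(samples, targets, learning_rate=0.1, epochs=500):
    """Train one linear unit with the Widrow-Hoff (LMS) update.

    After each sample the weights move along the input vector in
    proportion to the error between the target and the linear output.
    """
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, target in zip(samples, targets):
            output = bias + sum(w * xi for w, xi in zip(weights, x))
            error = target - output
            weights = [w + learning_rate * error * xi
                       for w, xi in zip(weights, x)]
            bias += learning_rate * error
    return weights, bias


if __name__ == "__main__":
    # Toy binary data in which the target simply copies the first attribute.
    samples = [[0, 0], [0, 1], [1, 0], [1, 1]]
    targets = [0, 0, 1, 1]
    weights, bias = widrow_hoff_train(samples, targets)
    print(weights, bias)
```

On this toy data the learned weight on the first attribute approaches 1 while the second weight and the bias approach 0, which hints at why the weights of a single-layer network can carry information about attribute-to-target relationships.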

8

Pray, Keith A. "Apriori Sets And Sequences: Mining Association Rules from Time Sequence Attributes." Link to electronic thesis, 2004. http://www.wpi.edu/Pubs/ETD/Available/etd-0506104-150831/.

Abstract:

Thesis (M.S.) -- Worcester Polytechnic Institute.
Keywords: mining complex data; temporal association rules; computer system performance; stock market analysis; sleep disorder data. Includes bibliographical references (p. 79-85).

9

Palanisamy, Senthil Kumar. "Association rule based classification." Link to electronic thesis, 2006. http://www.wpi.edu/Pubs/ETD/Available/etd-050306-131517/.

Abstract:

Thesis (M.S.)--Worcester Polytechnic Institute.
Keywords: Itemset Pruning, Association Rules, Adaptive Minimal Support, Associative Classification, Classification. Includes bibliographical references (p.70-74).

10

Lin, Weiyang. "Association rule mining for collaborative recommender systems." Link to electronic version, 2000. http://www.wpi.edu/Pubs/ETD/Available/etd-0515100-145926.

11

Toprak, Serkan. "Data Mining For Rule Discovery In Relational Databases." Master's thesis, METU, 2004. http://etd.lib.metu.edu.tr/upload/12605356/index.pdf.

Abstract:

Data is mostly stored in relational databases today. However, most data mining algorithms are not capable of working directly on data stored in relational databases. Instead, they require a preprocessing step that transforms relational data into an algorithm-specific form. Moreover, several data mining algorithms provide solutions for single relations only. Therefore, valuable hidden knowledge involving multiple relations remains undiscovered. In this thesis, an implementation is developed for discovering multi-relational association rules in relational databases. The implementation is based on a framework providing a representation of patterns in relational databases, refinement methods for patterns, and primitives for obtaining the necessary record counts from the database to calculate measures for patterns. The framework exploits the meta-data of relational databases to prune the search space of patterns. The implementation extends the framework by employing the Apriori algorithm to further prune the search space and to discover relational recursive patterns. The Apriori algorithm is used for finding large itemsets of tables, which are used to refine patterns. The Apriori algorithm is modified by changing the support calculation method for itemsets. A method for determining recursive relations is described, and a solution is provided for handling recursive patterns using aliases. Additionally, continuous attributes of tables are discretized using equal-depth partitioning. The implementation is tested on the gene localization prediction task of KDD Cup 2001, and the results are compared to those of the winning approach.
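
The equal-depth partitioning mentioned in the abstract above can be illustrated with a short Python sketch. This shows the general technique (bins holding roughly equal numbers of values), not the thesis implementation; the function name `equal_depth_bins` is hypothetical, and this simple version may split equal values across neighbouring bins.

```python
def equal_depth_bins(values, n_bins):
    """Split values into n_bins groups holding (almost) equal counts.

    Unlike equal-width binning, the interval widths vary so that each
    bin receives about len(values) / n_bins of the sorted values.
    """
    ordered = sorted(values)
    n = len(ordered)
    bins = []
    for i in range(n_bins):
        start = i * n // n_bins
        end = (i + 1) * n // n_bins
        if start < end:  # skip empty bins when n_bins > len(values)
            bins.append(ordered[start:end])
    return bins


if __name__ == "__main__":
    for group in equal_depth_bins([7, 1, 9, 3, 5, 11], 3):
        print(group[0], "to", group[-1], "->", group)
```

Each resulting bin can then be treated as one categorical item (for example, the interval from its smallest to its largest value) when building itemsets.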

12

Ahmed, Shakil. "Strategies for partitioning data in association rule mining." Thesis, University of Liverpool, 2004. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.415661.

13

Hahsler, Michael, Kurt Hornik, and Thomas Reutterer. "Implications of probabilistic data modeling for rule mining." Institut für Statistik und Mathematik, WU Vienna University of Economics and Business, 2005. http://epub.wu.ac.at/764/1/document.pdf.

Abstract:

Mining association rules is an important technique for discovering meaningful patterns in transaction databases. In the current literature, the properties of algorithms to mine associations are discussed in great detail. In this paper we investigate properties of transaction data sets from a probabilistic point of view. We present a simple probabilistic framework for transaction data and its implementation using the R statistical computing environment. The framework can be used to simulate transaction data when no associations are present. We use such data to explore the ability to filter noise of confidence and lift, two popular interest measures used for rule mining. Based on the framework we develop the measure hyperlift and we compare this new measure to lift using simulated data and a real-world grocery database.
Series: Research Report Series / Department of Statistics and Mathematics
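
The interest measures named in the abstract above, confidence and lift, have standard definitions that can be sketched in a few lines of Python. The function names and the toy transactions are assumptions for illustration; the hyperlift measure developed in the report is not reproduced here.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, lhs, rhs):
    """Estimate of P(rhs | lhs): support(lhs and rhs) / support(lhs)."""
    both = set(lhs) | set(rhs)
    return support(transactions, both) / support(transactions, lhs)


def lift(transactions, lhs, rhs):
    """Confidence relative to the baseline support of rhs.

    Values near 1 suggest that lhs and rhs occur independently.
    """
    return confidence(transactions, lhs, rhs) / support(transactions, rhs)


if __name__ == "__main__":
    baskets = [
        {"milk", "bread"},
        {"milk"},
        {"bread"},
        {"milk", "bread", "eggs"},
    ]
    print(confidence(baskets, {"milk"}, {"bread"}))
    print(lift(baskets, {"milk"}, {"bread"}))
```

Simulating transactions with no associations, as the report describes, makes it possible to see how far such measures drift from their independence values purely by chance.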

14

Bogorny, Vania. "Enhancing spatial association rule mining in geographic databases." reponame:Biblioteca Digital de Teses e Dissertações da UFRGS, 2006. http://hdl.handle.net/10183/7841.

Abstract:

The association rule mining technique emerged with the objective to find novel, useful, and previously unknown associations from transactional databases, and a large amount of association rule mining algorithms have been proposed in the last decade. Their main drawback, which is a well known problem, is the generation of large amounts of frequent patterns and association rules. In geographic databases the problem of mining spatial association rules increases significantly. Besides the large amount of generated patterns and rules, many patterns are well known geographic domain associations, normally explicitly represented in geographic database schemas. The majority of existing algorithms do not warrant the elimination of all well known geographic dependences. The result is that the same associations represented in geographic database schemas are extracted by spatial association rule mining algorithms and presented to the user. The problem of mining spatial association rules from geographic databases requires at least three main steps: compute spatial relationships, generate frequent patterns, and extract association rules. The first step is the most effort demanding and time consuming task in the rule mining process, but has received little attention in the literature. The second and third steps have been considered the main problem in transactional association rule mining and have been addressed as two different problems: frequent pattern mining and association rule mining. Well known geographic dependences which generate well known patterns may appear in the three main steps of the spatial association rule mining process. Aiming to eliminate well known dependences and generate more interesting patterns, this thesis presents a framework with three main methods for mining frequent geographic patterns using knowledge constraints. Semantic knowledge is used to avoid the generation of patterns that are previously known as non-interesting. 
The first method reduces the input problem: all well known dependences that can be eliminated without losing information are removed during data preprocessing. The second method eliminates combinations of pairs of geographic objects with dependences during frequent set generation. The third method presents a new approach to generating non-redundant frequent sets, the maximal generalized frequent sets without dependences. This method reduces the number of frequent patterns very significantly and, as a consequence, the number of association rules.
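
To illustrate why maximal frequent sets shrink the output so much, here is a minimal Python sketch of the plain notion of maximality: a frequent set is maximal when no other frequent set strictly contains it. This is a simplification, not the thesis method (which additionally handles generalization and geographic dependences); the function name `maximal_itemsets` and the example item names are hypothetical.

```python
def maximal_itemsets(frequent_itemsets):
    """Keep only the itemsets that have no strict superset in the input."""
    sets = [frozenset(s) for s in frequent_itemsets]
    return [s for s in sets if not any(s < other for other in sets)]


if __name__ == "__main__":
    frequent = [
        {"river"}, {"road"}, {"slum"},
        {"river", "road"}, {"river", "slum"}, {"road", "slum"},
        {"river", "road", "slum"},
        {"school"},
    ]
    for itemset in maximal_itemsets(frequent):
        print(sorted(itemset))
```

Here eight frequent sets collapse to two maximal ones, since every subset of a frequent set is itself frequent and therefore implied by its maximal superset.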

15

Shrestha, Anuj. "Association Rule Mining of Biological Field Data Sets." Thesis, North Dakota State University, 2017. https://hdl.handle.net/10365/28394.

Abstract:

Association rule mining is an important data mining technique, yet its use in association analysis of biological data sets has been limited. This mining technique was applied to two biological data sets, a genome data set and a damselfly data set. The raw data sets were pre-processed, and then association analysis was performed with various configurations. The pre-processing task involves minimizing the number of association attributes in the genome data and creating the association attributes in the damselfly data. The configurations include generation of single/maximal rules and handling of single/multiple tier attributes. Both data sets have a binary class label and, using association analysis, attributes of importance to each of these class labels are found. The results (rules) from association analysis are then visualized using graph networks by incorporating association attributes such as support and confidence, differential color schemes and features from the pre-processed data.
Bioinformatics Seed Grant Program NIH/UND
National Science Foundation (NSF) Grant IIA-1355466

16

Chudán, David. "Association rule mining as a support for OLAP." Doctoral thesis, Vysoká škola ekonomická v Praze, 2010. http://www.nusl.cz/ntk/nusl-201130.

Abstract:

The aim of this work is to identify the possibilities for the complementary use of two analytical methods of data analysis: OLAP analysis and data mining, represented by GUHA association rule mining. Using these two methods together on one dataset, in the context of the proposed scenarios, is expected to produce a synergistic effect, surpassing the knowledge acquired by the two methods independently. This is the main contribution of the work. Another contribution is the original use of GUHA association rules where the mining is performed on aggregated data. In their capabilities, GUHA association rules outperform the classic association rules described in the literature. The experiments on real data demonstrate the discovery of unusual trends in data that would be very difficult to find using standard methods of OLAP analysis, that is, the time-consuming manual browsing of an OLAP cube. On the other hand, using association rules alone loses the general overview of the data. It can be stated that these two methods complement each other very well. Part of the solution is also the use of the LMCL scripting language, which automates selected parts of the data mining process. The proposed recommender system would shield the user from association rules, thereby enabling ordinary analysts unfamiliar with association rules to exploit their possibilities. The thesis combines quantitative and qualitative research. Quantitative research is represented by experiments on a real dataset, the proposal of a recommender system and the implementation of selected parts of the association rule mining process in the LISp-Miner Control Language. Qualitative research is represented by structured interviews with selected experts from the fields of data mining and business intelligence, who confirm the meaningfulness of the proposed methods.

17

Rantzau, Ralf. "Extended concepts for association rule discovery." [S.l. : s.n.], 1997. http://www.bsz-bw.de/cgi-bin/xvms.cgi?SWB8937694.

18

Wu, Jingtong. "Interpretation of association rules with multi-tier granule mining." Thesis, Queensland University of Technology, 2014. https://eprints.qut.edu.au/71455/1/Jing_Wu_Thesis.pdf.

Abstract:

This study was a step forward to improve the performance for discovering useful knowledge – especially, association rules in this study – in databases. The thesis proposed an approach to use granules instead of patterns to represent knowledge implicitly contained in relational databases; and multi-tier structure to interpret association rules in terms of granules. Association mappings were proposed for the construction of multi-tier structure. With these tools, association rules can be quickly assessed and meaningless association rules can be justified according to the association mappings. The experimental results indicated that the proposed approach is promising.

19

Mahmood, Qazafi. "LC - an effective classification based association rule mining algorithm." Thesis, University of Huddersfield, 2014. http://eprints.hud.ac.uk/id/eprint/24274/.

Abstract:

Classification using association rules is a research field in data mining that primarily uses association rule discovery techniques in classification benchmarks. Many research studies in the literature have confirmed that classification using association tends to generate more predictive classification systems than traditional classification data mining techniques such as probabilistic, statistical and decision tree methods. In this thesis, we introduce a novel data mining algorithm based on classification using association, called "Looking at the Class" (LC), which can be used for mining a range of classification data sets. Unlike known algorithms in the classification using association approach, such as the Classification based on Association rule (CBA) system and the Classification based on Predictive Association (CPAR) system, which merge disjoint items in the rule learning step without anticipating class label similarity, the proposed algorithm merges only items with identical class labels. This avoids many unnecessary item combinations during the rule learning step, and consequently results in large savings in computational time and memory. Furthermore, the LC algorithm uses a novel prediction procedure that employs multiple rules, instead of a single rule, to make the prediction decision. The proposed algorithm has been evaluated thoroughly on real world security data sets collected using an automated tool developed at Huddersfield University. The security application considered in this thesis is the categorization of websites, based on their features, as legitimate or fake, which is a typical binary classification problem. Experiments have also been conducted on a number of UCI data sets, and the measures used for evaluation are classification accuracy, memory usage, and others. The results show that the LC algorithm outperformed traditional classification algorithms such as C4.5, PART and Naïve Bayes, as well as known classification based association algorithms like CBA, with respect to classification accuracy, memory usage, and execution time on most of the data sets considered.

20

Baez, Monroy Vicente Oswaldo. "Neural networks as artificial memories for association rule mining." Thesis, University of York, 2006. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.437620.

21

Fjällström, Peter. "A way to compare measures in association rule mining." Thesis, Umeå universitet, Statistik, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-124903.

22

Cai, Chun Hing. "Mining association rules with weighted items." Hong Kong : Chinese University of Hong Kong, 1998. http://www.cse.cuhk.edu.hk/%7Ekdd/assoc%5Frule/thesis%5Fchcai.pdf.

Abstract:

Thesis (M. Phil.)--Chinese University of Hong Kong, 1998.
Description based on contents viewed Mar. 13, 2007; title from title screen. Includes bibliographical references (p. 99-103). Also available in print.

23

Mahamaneerat, Wannapa Kay. "Domain-concept mining: an efficient on-demand data mining approach." Diss., Columbia, Mo. : University of Missouri--Columbia, 2008. http://hdl.handle.net/10355/7195.

Abstract:

Title from PDF of title page (University of Missouri--Columbia, viewed on February 24, 2010). The entire thesis text is included in the research.pdf file; the official abstract appears in the short.pdf file; a non-technical public abstract appears in the public.pdf file. Dissertation advisor: Dr. Chi-Ren Shyu. Vita. Includes bibliographical references.

24

Li, Jiuyong. "Optimal and Robust Rule Set Generation." Thesis, Griffith University, 2002. http://hdl.handle.net/10072/366394.

Abstract:

The rapidly growing volume and complexity of modern databases makes the need for technologies to describe and summarise the information they contain increasingly important. Data mining is a process of extracting implicit, previously unknown and potentially useful patterns and relationships from data, and is widely used in industry and business applications. Rules characterise relationships among patterns in databases, and rule mining is one of the central tasks in data mining. There are fundamentally two categories of rules, namely association rules and classification rules. Traditionally, association rules are connected with transaction databases for market basket problems and classification rules are associated with relational databases for predictions. In this thesis, we will mainly focus on the use of association rules for predictions. An optimal rule set is a rule set that satisfies given optimality criteria. In this thesis we study two types of optimal rule sets, the informative association rule set and the optimal class association rule set, where the informative association rule set is used for market basket predictions and the class association rule set is used for the classification. A robust classification rule set is a rule set that is capable of providing more correct predictions than a traditional classification rule set on incomplete test data. Mining transaction databases for association rules usually generates a large number of rules, most of which are unnecessary when used for subsequent prediction. We define a rule set for a given transaction database that is significantly smaller than an association rule set but makes the same predictions as the complete association rule set. We call this rule set the informative rule set. The informative rule set is not constrained to particular target items; and it is smaller than the non-redundant association rule set. 
We characterise the relationships between the informative rule set and the non-redundant association rule set. We present an algorithm to directly generate the informative rule set without generating all frequent itemsets first, and that accesses databases less often than other direct methods. We show experimentally that the informative rule set is much smaller than both the association rule set and the non-redundant association rule set for a given database, and that it can be generated more efficiently. In addition, we discuss a new unsupervised discretization method to deal with numerical attributes in general association rule mining without target specification. Based on the analysis of the strengths and weaknesses of two commonly used unsupervised numerical attribute discretization methods, we present an adaptive numerical attribute merging algorithm that is shown better than both methods in general association rule mining. Relational databases are usually denser than transaction databases, so mining on them for class association rules, which is a set of association rules whose consequences are classes, may be difficult due to the combinatorial explosion. Based on the analysis of the prediction mechanism, we define an optimal class association rule set to be a subset of the complete class association rule set containing all potentially predictive rules. Using this rule set instead of the complete class association rule set we can avoid redundant computation that would otherwise be required for mining predictive association rules and hence improve the efficiency of the mining process significantly. We present an efficient algorithm for mining optimal class association rule sets using upward closure properties to prune weak rules before they are actually generated. 
We show theoretically that the efficiency of the proposed algorithm will be greater than that of Apriori on dense databases, and confirm experimentally that it generates an optimal class association rule set, which is very much smaller than a complete class association rule set, in significantly less time than Apriori takes to generate the complete class association rule set. Traditional classification rule sets perform badly on test data that are not as complete as the training data. We study the problem of discovering more robust rule sets, where a rule set is more robust than another rule set if it is able to make more accurate predictions on test data with missing attribute values. We reveal a hierarchy of k-optimal rule sets, where a k-optimal rule set with a larger k is more robust, and all of them are more robust than a traditional classification rule set. We introduce two methods to find k-optimal rule sets: an optimal association rule mining approach and a heuristic approximate approach. We show experimentally that a k-optimal rule set generated by the optimal association rule mining approach performs better than one generated by the heuristic approximate approach, and that both rule sets perform significantly better than a typical classification rule set (C4.5Rules) on incomplete test data. Finally, we summarise the work discussed in this thesis, and suggest some future research directions.
Thesis (PhD Doctorate)
Doctor of Philosophy (PhD)
School of Computing and Information Technology
Science, Environment, Engineering and Technology

25

Marinica, Claudia. "Association Rule Interactive Post-processing using Rule Schemas and Ontologies - ARIPSO." Phd thesis, Université de Nantes, 2010. http://tel.archives-ouvertes.fr/tel-00912580.

Abstract:

This thesis is concerned with the merging of two active research domains: Knowledge Discovery in Databases (KDD), more precisely the Association Rule Mining technique, and Knowledge Engineering (KE), with a main interest in knowledge representation languages developed around the Semantic Web. In Data Mining, the usefulness of the association rule technique is strongly limited by the huge number and the low quality of delivered rules. Experiments show that rules become almost impossible to use when their number exceeds 100. At the same time, nuggets are often represented by those rare (low support) unexpected association rules which are surprising to the user. Unfortunately, the lower the support is, the larger the volume of rules becomes. Thus, it is crucial to help the decision maker with an efficient technique to reduce the number of rules. To overcome this drawback, several methods have been proposed in the literature, such as itemset concise representations, redundancy reduction, filtering, ranking and post-processing. Even though rule interestingness strongly depends on user knowledge and goals, most of the existing methods are generally based on data structure. For instance, if the user looks for unexpected rules, all the already known rules should be pruned. Or, if the user wants to focus on a specific family of rules, only this subset of rules should be selected. In this context, we address two main issues: the integration of user knowledge in the discovery process and the interactivity with the user. The first issue requires defining an adapted formalism to express user knowledge with accuracy and flexibility, such as ontologies in the Semantic Web. Second, the interactivity with the user allows a more iterative mining process where the user can successively test different hypotheses or preferences and focus on interesting rules. The main contributions of this work can be summarized as follows: (i) A model to represent user knowledge. First, we propose a new rule-like formalism, called Rule Schema, which allows the user to define his/her expectations regarding the rules through ontology concepts. Second, ontologies allow the user to express his/her domain knowledge by means of a high semantic model. Last, the user can choose, among a set of Operators for interactive processing, the one to be applied over each Rule Schema (i.e. pruning, conforming, unexpectedness, ...). (ii) A new post-processing approach, called ARIPSO (Association Rule Interactive Post-processing using rule Schemas and Ontologies), which helps the user to reduce the volume of the discovered rules and to improve their quality. It consists in an interactive process integrating user knowledge and expectations by means of the proposed model. At each step of ARIPSO, the interactive loop allows the user to change the provided information and to reiterate the post-processing phase, which produces new results. (iii) The implementation in post-processing of the proposed approach. The developed tool is complete and operational, and it implements all the functionalities described in the approach. Also, it makes the connection between different elements like the set of rules and rule schemas stored in PMML/XML files, and the ontologies stored in OWL files and inferred by the Pellet reasoner. (iv) An adapted implementation without post-processing, called ARLIUS (Association Rule Local mining Interactive Using rule Schemas), consisting in an interactive local mining process guided by the user. It allows the user to focus on interesting rules without the necessity to extract all of them, and without a minimum support limit. In this way, the user may explore the rule space incrementally, a small amount at each step, starting from his/her own expectations and discovering their related rules. (v) The experimental study analyzing the approach efficiency and the discovered rule quality. For this purpose, we used a large real-life questionnaire database concerning customer satisfaction. For ARIPSO, the experimentation was carried out in complete cooperation with the domain expert. For different scenarios, from an input set of nearly 400 thousand association rules, ARIPSO filtered between 3 and 200 rules validated by the expert. Clearly, ARIPSO allows the user to significantly and efficiently reduce the input rule set. For ARLIUS, we experimented with different scenarios over the same questionnaire database and we obtained reduced sets of rules (fewer than 100) with very low support.
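
The abstract's remark that a lower support threshold produces a larger volume of rules can be illustrated with a small brute-force sketch. The transactions, the `count_rules` helper, and the thresholds below are hypothetical illustrations and are not taken from the thesis, which works on a questionnaire database with far more efficient tooling.

```python
from itertools import combinations

# Hypothetical toy transactions (not from the thesis).
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "bread"},
    {"butter", "milk"},
    {"beer", "butter", "bread", "milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def count_rules(transactions, min_support, min_confidence=0.0):
    """Count rules A -> B (A and B non-empty, disjoint) meeting both thresholds."""
    items = sorted(set().union(*transactions))
    n_rules = 0
    for size in range(2, len(items) + 1):
        for combo in combinations(items, size):
            s = support(frozenset(combo), transactions)
            if s == 0 or s < min_support:
                continue
            # Every non-empty proper subset can serve as an antecedent.
            for k in range(1, size):
                for antecedent in combinations(combo, k):
                    conf = s / support(frozenset(antecedent), transactions)
                    if conf >= min_confidence:
                        n_rules += 1
    return n_rules


if __name__ == "__main__":
    for threshold in (0.6, 0.4, 0.2):
        print(threshold, count_rules(TRANSACTIONS, threshold))
```

Even on five baskets the rule count grows quickly as the threshold drops, which is the effect that motivates post-processing or local mining.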

APA, Harvard, Vancouver, ISO, and other styles

26

Laxminarayan, Parameshvyas. "Exploratory analysis of human sleep data." Worcester, Mass. : Worcester Polytechnic Institute, 2004. http://www.wpi.edu/Pubs/ETD/Available/etd-0119104-120134/.

Full text

Abstract:

Thesis (M.S.)--Worcester Polytechnic Institute.
Keywords: association rule mining; logistic regression; statistical significance of rules; window-based association rule mining; data mining; sleep data. Includes bibliographical references (leaves 166-167).

APA, Harvard, Vancouver, ISO, and other styles

27

Weitl, Harms Sherri K. "Temporal association rule methodologies for geo-spatial decision support /." free to MU campus, to others for purchase, 2002. http://wwwlib.umi.com/cr/mo/fullcit?p3091989.

Full text

APA, Harvard, Vancouver, ISO, and other styles

28

Unal, Calargun Seda. "Fuzzy Association Rule Mining From Spatio-temporal Data: An Analysis Of Meteorological Data In Turkey." Master's thesis, METU, 2008. http://etd.lib.metu.edu.tr/upload/12609308/index.pdf.

Full text

Abstract:

Data mining is the extraction of interesting, non-trivial, implicit, previously unknown and potentially useful information or patterns from data in large databases. Association rule mining is a data mining method that seeks to discover associations among transactions encoded within a database. Data mining on spatio-temporal data takes into consideration the dynamics of spatially extended systems for which large amounts of spatial data exist, given that all real-world spatial data exists in some temporal context. We need fuzzy sets in mining association rules from spatio-temporal databases, since fuzzy sets handle numerical data better by softening the sharp boundaries of data, which models the uncertainty embedded in the meaning of data. In this thesis, fuzzy association rule mining is performed on spatio-temporal data using data cubes and the Apriori algorithm. A methodology is developed for fuzzy spatio-temporal data cube construction. Besides the performance criteria, interpretability, precision, utility, novelty, direct-to-the-point and visualization are defined as the metrics for comparing association rule mining techniques. The fuzzy association rule mining approaches using spatio-temporal data cubes and the Apriori algorithm that were carried out within the scope of this thesis are compared using these metrics. Real meteorological data (precipitation and temperature) for Turkey recorded between 1970 and 2007 are analyzed using the data cube and the Apriori algorithm in order to generate the fuzzy association rules.
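
How fuzzy sets soften sharp numeric boundaries can be sketched as follows. The triangular membership functions, the breakpoints, the sample records, and the use of the minimum as the conjunction are all assumptions chosen for illustration; the thesis may define its fuzzy sets and fuzzy support differently.

```python
def triangular(x, left, peak, right):
    """Triangular membership: 0 at or outside left/right, 1 at the peak."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


# Hypothetical fuzzy sets: (attribute, label) -> (left, peak, right).
FUZZY_SETS = {
    ("temperature", "warm"): (15.0, 25.0, 35.0),
    ("precipitation", "low"): (-1.0, 0.0, 50.0),
}

# Hypothetical measurements (degrees Celsius, millimetres).
RECORDS = [
    {"temperature": 12.0, "precipitation": 80.0},
    {"temperature": 19.0, "precipitation": 35.0},
    {"temperature": 24.0, "precipitation": 10.0},
    {"temperature": 30.0, "precipitation": 0.0},
]


def membership(record, attribute, label):
    """Degree to which the record's attribute value belongs to the fuzzy set."""
    left, peak, right = FUZZY_SETS[(attribute, label)]
    return triangular(record[attribute], left, peak, right)


def fuzzy_support(records, fuzzy_items):
    """Mean over records of the minimum membership across the fuzzy items."""
    total = 0.0
    for record in records:
        total += min(membership(record, a, label) for a, label in fuzzy_items)
    return total / len(records)


if __name__ == "__main__":
    items = [("temperature", "warm"), ("precipitation", "low")]
    print(fuzzy_support(RECORDS, items))
```

A record at 24 degrees counts mostly, but not fully, as "warm", instead of falling on one side of a hard cut-off.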

APA, Harvard, Vancouver, ISO, and other styles

29

Koukal, Bohuslav. "OLAP Recommender: Supporting Navigation in Data Cubes Using Association Rule Mining." Master's thesis, Vysoká škola ekonomická v Praze, 2017. http://www.nusl.cz/ntk/nusl-359132.

Full text

Abstract:

Manual exploration of data cubes in search of potentially interesting and useful information becomes time-consuming and ineffective beyond a certain volume of data. In my thesis, I designed, implemented and tested a system that automates data cube exploration and offers potentially interesting views on OLAP data to the end user. The system is based on the integration of two data analytics methods: OLAP analysis with data visualisation, and data mining, represented by GUHA association rule mining. Another contribution of my work is an investigation of how to resolve the differences between OLAP analysis and association rule mining. The implemented solutions include data discretization, dimension commensurability, the design of an algorithm that automatically defines data mining tasks based on the data structure, and the definition of a mapping between mined association rules and the corresponding OLAP visualisation. The system was tested with real retail sales data and with EU structural funds data. The experiments proved that using association rule mining together with OLAP analysis identifies relationships in the data with a higher success rate than using either technique in isolation.

APA, Harvard, Vancouver, ISO, and other styles

30

Abar, Orhan. "Rule Mining and Sequential Pattern Based Predictive Modeling with EMR Data." UKnowledge, 2019. https://uknowledge.uky.edu/cs_etds/85.

Full text

Abstract:

Electronic medical record (EMR) data is collected on a daily basis at hospitals and other healthcare facilities to track patients’ health situations including conditions, treatments (medications, procedures), diagnostics (labs) and associated healthcare operations. Besides being useful for individual patient care and hospital operations (e.g., billing, triaging), EMRs can also be exploited for secondary data analyses to glean discriminative patterns that hold across patient cohorts for different phenotypes. These patterns in turn can yield high level insights into disease progression with interventional potential. In this dissertation, using a large scale realistic EMR dataset of over one million patients visiting University of Kentucky healthcare facilities, we explore data mining and machine learning methods for association rule (AR) mining and predictive modeling with mood and anxiety disorders as use-cases. Our first work involves analysis of existing quantitative measures of rule interestingness to assess how they align with a practicing psychiatrist’s sense of novelty/surprise corresponding to ARs identified from EMRs. Our second effort involves mining causal ARs with depression and anxiety disorders as target conditions through matching methods accounting for computationally identified confounding attributes. Our final effort involves efficient implementation (via GPUs) and application of contrast pattern mining to predictive modeling for mental conditions using various representational methods and recurrent neural networks. Overall, we demonstrate the effectiveness of rule mining methods in secondary analyses of EMR data for identifying causal associations and building predictive models for diseases.

APA, Harvard, Vancouver, ISO, and other styles

31

Zang, Hao. "Non-redundant sequential association rule mining based on closed sequential patterns." Thesis, Queensland University of Technology, 2010. https://eprints.qut.edu.au/46166/1/Hao_Zang_Thesis.pdf.

Full text

Abstract:

In many applications, e.g., bioinformatics, web access traces, and system utilisation logs, the data is naturally in the form of sequences. People have taken great interest in analysing sequential data and finding the inherent characteristics or relationships within it. Sequential association rule mining is one of the possible methods used to analyse this data. As conventional sequential association rule mining very often generates a huge number of association rules, many of which are redundant, it is desirable to find a way to get rid of those unnecessary association rules. Because of the complexity and temporally ordered characteristics of sequential data, current research on sequential association rule mining is limited. Although several sequential association rule prediction models using either sequence constraints or temporal constraints have been proposed, none of them considered the redundancy problem in rule mining. The main contribution of this research is to propose a non-redundant association rule mining method based on closed frequent sequences and minimal sequential generators. We also give a definition of non-redundant sequential rules, which are sequential rules with minimal antecedents but maximal consequents. A new algorithm called CSGM (closed sequential and generator mining) for generating closed sequences and minimal sequential generators is also introduced. A further experiment compares the performance of generating non-redundant sequential rules with that of generating full sequential rules; the performance of CSGM has also been evaluated against other closed sequential pattern mining and generator mining algorithms. We also use the generated non-redundant sequential rules for query expansion in order to improve recommendations for infrequently purchased products.

APA, Harvard, Vancouver, ISO, and other styles

32

Padhye, Manoday D. "Use of data mining for investigation of crime patterns." Morgantown, W. Va. : [West Virginia University Libraries], 2006. https://eidr.wvu.edu/etd/documentdata.eTD?documentid=4836.

Full text

Abstract:

Thesis (M.S.)--West Virginia University, 2006.
Title from document title page. Document formatted into pages; contains viii, 108 p. : ill. (some col.). Includes abstract. Includes bibliographical references (p. 80-81).

APA, Harvard, Vancouver, ISO, and other styles

33

Delpisheh, Elnaz, and University of Lethbridge Faculty of Arts and Science. "Two new approaches to evaluate association rules." Thesis, Lethbridge, Alta. : University of Lethbridge, Dept. of Mathematics and Computer Science, c2010, 2010. http://hdl.handle.net/10133/2530.

Full text

Abstract:

Data mining aims to discover interesting and unknown patterns in large-volume data. Association rule mining is one of the major data mining tasks, which attempts to find inherent relationships among data items in an application domain, such as supermarket basket analysis. An essential post-process in an association rule mining task is the evaluation of association rules by measures for their interestingness. Different interestingness measures have been proposed and studied. Given an association rule mining task, measures are assessed against a set of user-specified properties. However, in practice, given the subjectivity and inconsistencies in property specifications, it is a non-trivial task to make appropriate measure selections. In this work, we propose two novel approaches to assess interestingness measures. Our first approach utilizes the analytic hierarchy process to capture quantitatively domain-dependent requirements on properties, which are later used in assessing measures. This approach not only eliminates any inconsistencies in an end user’s property specifications through consistency checking but also is invariant to the number of association rules. Our second approach dynamically evaluates association rules according to a composite and collective effect of multiple measures. It interactively snapshots the end user’s domain-dependent requirements in evaluating association rules. In essence, our approach uses neural networks along with back-propagation learning to capture the relative importance of measures in evaluating association rules. Case studies and simulations have been conducted to show the effectiveness of our two approaches.
viii, 85 leaves : ill. ; 29 cm
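
The consistency checking that the analytic hierarchy process (AHP) provides can be sketched as below. This is a generic AHP illustration, not the thesis's implementation: the pairwise comparison matrices are hypothetical, the priority vector is approximated by row geometric means rather than an exact eigenvector, and the random index values are Saaty's commonly tabulated ones for small matrices.

```python
import math

# Saaty's random consistency index for matrix sizes 3 to 5.
RANDOM_INDEX = {3: 0.58, 4: 0.90, 5: 1.12}


def ahp_weights(matrix):
    """Approximate the priority vector by normalised row geometric means."""
    n = len(matrix)
    geo_means = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(geo_means)
    return [g / total for g in geo_means]


def consistency_ratio(matrix):
    """Consistency ratio CR = CI / RI, with CI = (lambda_max - n) / (n - 1)."""
    n = len(matrix)
    weights = ahp_weights(matrix)
    # Estimate the principal eigenvalue as the mean of (A w)_i / w_i.
    lambda_max = sum(
        sum(a * w for a, w in zip(row, weights)) / weights[i]
        for i, row in enumerate(matrix)
    ) / n
    consistency_index = (lambda_max - n) / (n - 1)
    return consistency_index / RANDOM_INDEX[n]


# Hypothetical pairwise comparisons of three properties of a measure.
CONSISTENT = [
    [1.0, 2.0, 4.0],
    [0.5, 1.0, 2.0],
    [0.25, 0.5, 1.0],
]
# Deliberately circular preferences: A > B, B > C, but C > A.
INCONSISTENT = [
    [1.0, 2.0, 0.25],
    [0.5, 1.0, 2.0],
    [4.0, 0.5, 1.0],
]

if __name__ == "__main__":
    print(ahp_weights(CONSISTENT), consistency_ratio(CONSISTENT))
    print(consistency_ratio(INCONSISTENT))
```

By the usual convention, a ratio above 0.1 signals that the user's pairwise judgments should be revised before the weights are used.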

APA, Harvard, Vancouver, ISO, and other styles

34

Marinica, Claudia. "Association Rule Interactive Post-processing using Rule Schemas and Ontologies : aripso." Phd thesis, Nantes, 2010. https://archive.bu.univ-nantes.fr/pollux/show/show?id=90a57cc4-245f-420d-ac2b-f9ad7929e0f7.

Full text

Abstract:

This thesis is concerned with the merging of two active research domains: Knowledge Discovery in Databases, more precisely the Association Rule Mining technique, and Knowledge Engineering, more precisely the representation languages of the Semantic Web. The usefulness of the association rule technique is strongly limited by the huge number and the low quality of delivered rules. To overcome this drawback, several methods have been proposed in the literature, such as itemset concise representations, redundancy reduction, filtering, ranking and post-processing, and most of them are based on data structure. However, rule interestingness strongly depends on user knowledge and goals. In this context, it is crucial to help the user with an efficient technique to reduce the number of rules while keeping interesting ones. This work addresses two main issues: the integration of user knowledge in the discovery process and the interactivity with the user. The first issue requires an accurate and flexible formalism to express user knowledge, such as ontologies in the Semantic Web. The second one proposes a more iterative mining process allowing the user to explore the rule space incrementally, focusing on interesting rules. The main contributions of this work can be summarized as follows: (i) A model to represent user knowledge. First, we propose to represent user domain knowledge by means of ontologies. Second, we develop a new formalism, called "Rule Schema", which allows the user to define his/her expectations through ontology concepts. Last, we offer the user a set of "mining Operators" to be applied over Rule Schemas. (ii) A new post-processing approach, ARIPSO. It allows the user to reduce the volume of the discovered rules by keeping only the interesting rules. ARIPSO is an interactive process integrating user knowledge by means of the proposed model. At each step, the interactive loop allows the user to change the provided information and to reiterate the post-processing phase. (iii) The implementation in post-processing of ARIPSO. The developed tool is complete and operational, and it implements all the functionalities described in the approach. An alternative implementation, without post-processing, was proposed (ARLIUS). It consists in an interactive local mining process. (iv) An experimental study analyzing the approach efficiency and the discovered rule quality. For this purpose, we used a large real-life database; for ARIPSO, the experimentation was carried out in complete cooperation with the domain expert. From an input set of nearly 400 thousand rules, for different scenarios, ARIPSO filtered between 3 and 200 rules validated by the expert.

APA, Harvard, Vancouver, ISO, and other styles

35

Isik, Narin. "Fuzzy Spatial Data Cube Construction And Its Use In Association Rule Mining." Master's thesis, METU, 2005. http://etd.lib.metu.edu.tr/upload/12606056/index.pdf.

Full text

Abstract:

The popularity of spatial databases is increasing, since the amount of spatial data that needs to be handled has grown through the use of digital maps, images from satellites, video cameras, medical equipment, sensor networks, etc. Spatial data are difficult to examine and to extract interesting knowledge from; hence, applications that assist decision-making about spatial data, such as weather forecasting, traffic supervision, and mobile communication, have been introduced. In this thesis, more natural and precise knowledge is generated from spatial data by constructing a fuzzy spatial data cube and extracting fuzzy association rules from it, in order to improve decision-making about spatial data. This involves extensive research into spatial knowledge discovery and how fuzzy logic can be used to develop it. It is stated that incorporating fuzzy logic into spatial data cube construction necessitates a new method for the aggregation of fuzzy spatial data. We illustrate how this method also enhances the meaning of fuzzy spatial generalization rules and fuzzy association rules with a case study about weather pattern searching. This study contributes to spatial knowledge discovery by generating more understandable and interesting knowledge from spatial data: it extends spatial generalization with fuzzy memberships, extends the spatial aggregation in spatial data cube construction by utilizing weighted measures, and generates fuzzy association rules from the constructed fuzzy spatial data cube.

APA, Harvard, Vancouver, ISO, and other styles

36

Abu, Mansour Hussein Y. "Rule pruning and prediction methods for associative classification approach in data mining." Thesis, University of Huddersfield, 2012. http://eprints.hud.ac.uk/id/eprint/17476/.

Full text

Abstract:

Recent studies in data mining have revealed that the Associative Classification (AC) approach builds classifiers that are competitive in accuracy with classic classification approaches, including decision trees and rule-based methods. Nevertheless, AC algorithms suffer from a number of known defects, such as the generation of a large number of rules, which makes it hard for the end user to maintain and understand the outcome, and the possible over-fitting caused by the confidence-based rule evaluation used by AC. This thesis attempts to deal with these problems by presenting five new pruning methods and a prediction method, and by employing them in an AC algorithm that significantly reduces the number of generated rules without a large impact on the prediction rate of the classifiers. In particular, the new pruning methods discard redundant and insignificant rules while the classifier is being built. These pruning procedures remove any rule that either has no training case coverage or covers a training case without the requirement of class similarity between the rule class and that of the training case. This enables large coverage for each rule, reduces over-fitting, and yields accurate, moderately sized classifiers. In addition, a novel class assignment method based on multiple rules is proposed, which employs a group of rules to make the prediction decision. The pruning and prediction procedures have been integrated to enhance a known AC algorithm called Multiple-class Classification based on Association Rules (MCAR), resulting in a model that is competitive in accuracy and classifier size, called "Multiple-class Classification based on Association Rules 2 (MCAR2)". Experimental results on different datasets from the UCI data repository showed that the predictive power of the classifiers produced by MCAR2 increases slightly and that the classifier size is reduced compared with other AC algorithms such as MCAR.

APA, Harvard, Vancouver, ISO, and other styles

37

Liao, Yuan-Fong, and 廖原豐. "Causal Association Rule Mining." Thesis, 2006. http://ndltd.ncl.edu.tw/handle/sy5ufc.

Full text

Abstract:

Master's
National Central University
Graduate Institute of Information Management
94
This thesis takes stock market investment as its experimental subject and examines the causality underlying it. We focus on how to improve investment performance. To do so, we must understand the causal relationship between the factors that influence performance and the observed performance values. We use association rule mining to look for rules that capture the causality between the technical indicators that influence performance and the observed performance values (e.g., the reversal point of the stock price). We call these rules Causal Association Rules; they can be assembled into securities trading strategies. Many association rule methods have been proposed in the past, but they produce a large number of large itemsets, so that there are too many rules, their interestingness is difficult to assess, and mining is relatively inefficient. We therefore propose the CFP algorithm, which improves on the FP-Growth algorithm by avoiding the mining of unnecessary large itemsets and efficiently producing only the interesting causal association rules. The common data discretization methods are equal-width intervals and equal-frequency intervals. However, when investors enter and leave the stock market to buy or sell stocks, they usually refer to the aggregate values of technical indicators. We therefore propose equal-width aggregate intervals and equal-frequency aggregate intervals. These two discretization methods also support mining cross-level causal association rules, so that more interesting rules can be mined. According to a t-test, our algorithm performs significantly better than the FP-Growth algorithm. We also find that the CFP algorithm is suitable for mining large-scale databases. We rank the causal association rules from different points of view for analysis, to assist investors in arranging their investment strategies and to serve as a reference for avoiding losses.

APA, Harvard, Vancouver, ISO, and other styles

38

Chien, Peng Wang, and 王建鵬. "Find the General Rule of Data Mining Association Rules." Thesis, 2011. http://ndltd.ncl.edu.tw/handle/08735074145658888662.

Full text

Abstract:

Master's
Vanung University
Graduate Institute of Information Management
99
Current applications of and research on association rule mining mostly take the traded products as the mining target, and there is no general representation for the mining process and its output, which are usually expressed in an ad hoc manner or as a textual description. This study proposes the concept of taking the participants in a transaction as the objects of association rule mining. To make association rule mining applications more flexible, it extends the entity-relationship methodology with a graphical representation, so that, regardless of the implementation method, the mining task can be expressed simply and clearly and the various restrictions on association rule mining can be fully described. Whether the data follow an entity-relationship structure, a star structure, or a snowflake structure, they can be described as generalizable classes, and the relationships between different generalization levels can be described. In addition, by specifying the mining object, mining with different transaction participants gives the results richer meaning.

APA, Harvard, Vancouver, ISO, and other styles

39

Li, Shenzhi. "Higher order association rule mining." 2010. http://gateway.proquest.com/openurl?url_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&res_dat=xri:pqdiss&rft_dat=xri:pqdiss:3389963.

Full text

APA, Harvard, Vancouver, ISO, and other styles

40

LIN, MING-HUNG, and 林銘泓. "Exploring the Distribution Rules of Aggregate Using Data Mining Association Rule." Thesis, 2016. http://ndltd.ncl.edu.tw/handle/00708958833560184595.

Full text

Abstract:

Master's
Vanung University
In-service Master's Program, Graduate Institute of Information Management
104
Aggregate for ready-mixed concrete is shipped to the dock as bulk cargo and then distributed by vehicle to the various ready-mix plants and temporary storage yards. However, because there are no effective distribution rules to refer to, loaded vehicles at the pier often wait for dispatch, which causes congestion. This study uses association rule mining to retrieve the schedules, delivered goods, and distribution locations, forms market baskets from them, and finds the association rules among them for reference. The aggregate distribution rules obtained in this study serve only as reference rules for goods loaded and distributed under the same timetable: if mainland three-fen stone is delivered, mainland six-fen stone is also delivered; and if Hualien sand is delivered, Hualien six-fen stone or three-fen stone must also be delivered. These rules can help dispatchers quickly draw up a correct and efficient delivery schedule. Owing to time constraints, the study did not examine the delivery order in depth.

APA, Harvard, Vancouver, ISO, and other styles

41

Lin, Shih Hsiang, and 林士翔. "DARM: Doughnut-shaped Association Rule Mining." Thesis, 2009. http://ndltd.ncl.edu.tw/handle/54386438560648611106.

Full text

Abstract:

Master's
Chang Gung University
Graduate Institute of Information Management
97
This is the age of the “Information Explosion”: it is ever easier to obtain more and more information. Information visualization research is valuable for presenting this vast amount of information conveniently. Information visualization products such as maps, signs, and graphs are common in daily life. Information visualization can also be used in data mining. Data mining is often called knowledge discovery, and association rule mining is the best-known data mining method. Association rule mining is used to discover all associations among items. However, users cannot grasp the important items quickly and accurately from text alone. We propose an association rule algorithm that uses doughnut shapes to present association rules. DARM (Doughnut-shaped Association Rule Mining) includes an overview circle and many detail circles produced from items. DARM lets users understand the mining steps easily, and users can draw on their own knowledge and experience to participate in the process. Most importantly, the simple and clear doughnut shapes let users quickly grasp an overview of the database and all associations among items.

APA, Harvard, Vancouver, ISO, and other styles

42

Yan, Chen Shih, and 陳世彥. "Rule Induction on Mining Large Database." Thesis, 2008. http://ndltd.ncl.edu.tw/handle/16757750258799678342.

Full text

Abstract:

Master's
Tunghai University
Department of Computer Science and Information Engineering
97
A great deal of valuable information is hidden in medical databases; however, discovering useful knowledge from them is often too tedious or too complicated. How to extract information from large medical records effectively has therefore become an important issue. The principle of data mining is to sort through large amounts of data and pick out the relevant information. It has been described as “the nontrivial extraction of implicit, previously unknown, and potentially useful information from data” and “the science of extracting useful information from large data sets or databases.” To date, data mining techniques have been widely used in many fields, such as education and e-commerce. By applying data mining techniques, we propose the Computer-aided Disease Diagnostic System (CDDS), which evaluates the relationship between diagnostic items and diagnoses in a large medical database in order to induce valuable information and rules and to predict diagnoses. CDDS works in three stages: (1) it reduces the database size by calculating the correlation coefficients between the diagnostic items and the diagnostic decision and pruning items whose correlation coefficients are small; (2) it finds the best-fit probability distribution and generates random variates to fill in the missing values in the records; (3) it applies AND operations to diagnostic items to generate rules, calculates the J-Information of each rule, retains the rules with higher J-Information, and uses them to predict the diagnosis. In our experiment, the ratio of correct predictions is 95%. By applying CDDS, we can not only extract valuable information from medical databases but also assist medical professionals in diagnosing diseases.
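
The rule-scoring idea in stage (3) can be sketched as follows, assuming that "J-Information" refers to the J-measure of Smyth and Goodman; the abstract does not define it, so this reading, the function names, and the probabilities in the example are assumptions rather than details of CDDS.

```python
import math


def _term(p, q):
    """p * log2(p / q), using the convention that 0 * log(0) equals 0."""
    if p == 0.0:
        return 0.0
    return p * math.log2(p / q)


def j_measure(p_antecedent, p_consequent, p_consequent_given_antecedent):
    """J-measure of the rule IF antecedent THEN consequent.

    p_antecedent: probability that the rule's conditions hold.
    p_consequent: prior probability of the predicted diagnosis.
    p_consequent_given_antecedent: the rule's confidence.
    """
    if not 0.0 < p_consequent < 1.0:
        raise ValueError("p_consequent must lie strictly between 0 and 1")
    p = p_consequent_given_antecedent
    # Relative entropy between posterior and prior of the consequent.
    information = _term(p, p_consequent) + _term(1.0 - p, 1.0 - p_consequent)
    return p_antecedent * information


if __name__ == "__main__":
    # Hypothetical rule: conditions hold in 20% of records, the diagnosis has
    # a 30% prior, and the rule is right 90% of the time when it fires.
    print(j_measure(0.2, 0.3, 0.9))
```

A rule whose confidence equals the prior of the diagnosis carries no information and scores zero, while frequent and sharply predictive rules score highest, which is why such a score is a plausible retention criterion.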

APA, Harvard, Vancouver, ISO, and other styles

43

Lin, Ming-Yen, and 林明言. "Efficient Algorithms for Association Rule Mining and Sequential Pattern Mining." Thesis, 2003. http://ndltd.ncl.edu.tw/handle/m8z62p.

Full text

Abstract:

Doctoral
National Chiao Tung University
Department of Computer Science and Information Engineering
92
The tremendous amount of data collected by computerized applications around the world is growing rapidly. Hidden in this vast data, valuable information is attracting researchers from multiple disciplines to study effective approaches for deriving useful knowledge from it. Among various data mining objectives, the mining of frequent patterns has been the focus of knowledge discovery in databases. This thesis aims to investigate efficient algorithms for mining frequent patterns, including association rules and sequential patterns. We propose the LexMiner algorithm to deal with frequent itemset discovery for association rules. To alleviate the drawbacks of hash-tree placement of candidates, some algorithms store candidate patterns according to the prefix order of itemsets. LexMiner utilizes lexicographic features and lexicographic comparisons to further speed up the kernel operation of mining algorithms. A memory indexing approach called MEMISP is proposed for fast sequential pattern mining using a find-then-index technique. MEMISP mines databases of any size, with respect to any support threshold, in just two passes of database scanning. MEMISP outperforms other algorithms in that neither candidate patterns nor intermediate databases are generated. Mining sequential patterns with time constraints, such as time gaps and sliding time windows, may reinforce the accuracy of mining results. However, the capability to mine time-constrained patterns was previously available only within the Apriori framework. Recent studies indicate that the pattern-growth methodology could speed up sequence mining. We integrate the constraints into a divide-and-conquer strategy of sub-database projection and propose the pattern-growth based DELISP algorithm, which outperforms other algorithms in mining time-constrained sequential patterns. In practice, knowledge discovery is an iterative process. Thus, reducing the response time during user interactions for the desired outcome is crucial. The proposed KISP algorithm utilizes the knowledge acquired from each individual mining process, accumulates the counting information to facilitate efficient counting of patterns, and accelerates the whole interactive sequence mining process. Current approaches to sequential pattern mining usually assume that the mining is performed with respect to a static sequence database. However, databases are not static because of updates, so the discovered patterns might become invalid and new patterns could be created. Instead of re-mining from scratch, the proposed IncSP algorithm solves the incremental update problem through effective implicit merging and efficient separate counting over appended sequences. Patterns found in prior stages are incrementally updated rather than re-mined. Comprehensive experiments have been conducted to assess the performance of the proposed algorithms. The empirical results show that these algorithms outperform state-of-the-art algorithms with respect to various mining parameters and datasets of different characteristics. The scale-up experiments also verify that our algorithms successfully mine frequent patterns with good linear scalability.

44

Chen, Hung-Jen, and 陳宏任. "Algorithms for Negative Sequential Pattern Mining and Fuzzy Correlation Rule Mining." Thesis, 2008. http://ndltd.ncl.edu.tw/handle/35030898529324940135.

Abstract:

Doctoral degree
Tamkang University
Department of Computer Science and Information Engineering, Doctoral Program
96
Due to rapid developments in information technology and automatic data collection tools, a large amount of data has been collected and stored in various data repositories. Extracting valuable information from these data is the key to improving business competitiveness. Data mining offers ways to automatically find nontrivial, previously unknown, and potentially useful knowledge in large databases. The mining of frequent patterns plays an essential role in data mining. Many methods have been proposed for discovering various types of frequent patterns, such as frequent itemsets, association rules, correlation rules, and sequential patterns. This dissertation introduces three types of frequent patterns: negative sequential patterns, negative fuzzy sequential patterns, and fuzzy correlation rules. We propose an algorithm for mining negative sequential patterns, which considers not only the occurrence of itemsets in database transactions but also their absence. In this algorithm, we design a candidate generation procedure that employs the Apriori principle to eliminate many redundant candidates during mining. We also define a function based on conditional probability to measure the interestingness of sequences in order to find more interesting negative sequential patterns. Additionally, most transaction data in real-world applications consist of quantitative values. To handle the various types of data in quantitative databases and discover negative sequential patterns in them, we propose an algorithm that combines fuzzy-set theory with the negative sequential pattern concept to mine negative fuzzy sequential patterns from quantitative databases. Furthermore, we propose a method for mining fuzzy correlation rules, which applies fuzzy correlation analysis to determine whether two fuzzy sub-itemsets of a fuzzy itemset are dependent and then extracts more interesting fuzzy correlation rules from quantitative databases. Experiments on the three proposed algorithms show that they prune many redundant candidates during mining and effectively extract frequent patterns that are actually interesting.

45

Cheng, Yung-Hsiung, and 鄭永雄. "A study of association rule mining algorithms." Thesis, 2007. http://ndltd.ncl.edu.tw/handle/12205682895999423189.

Abstract:

Master's degree
I-Shou University
Department of Information Management, Master's Program
95
In recent years, data mining has become a popular research subject. Its purpose is to mine meaningful information from databases and provide it to administrators for decision making. Prior research has proposed many algorithms to improve the performance of association rule mining. These methods reduce the computation spent on uncorrelated itemsets to save CPU time, reduce the number of data scans to save I/O cost, or improve the storage structure and access method to raise overall performance. Each algorithm has its own advantages, but a comprehensive comparison among them is lacking. When a user mines an unfamiliar database, it is difficult to determine which algorithm performs best; therefore, the applicability of association rule mining algorithms must be considered in order to mine data more effectively and obtain useful information. This research examines five current association rule algorithms and applies each of them to several real databases. We then analyze the experimental data to identify each algorithm's strengths and weaknesses and the database characteristics to which it is best suited. The five algorithms are the Apriori algorithm, the Frequent-Pattern growth (FP-growth) algorithm, the Dynamic Itemset Counting (DIC) algorithm, the Direct Hashing and Pruning (DHP) algorithm, and the LCM-freq algorithm; we run them according to the characteristics of each database and collect and organize the results. Finally, we recommend more effective association rule mining algorithms to users.
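
As a point of reference for the first of the five algorithms named above, here is a minimal Python sketch of the level-wise Apriori idea: count frequent single items, join frequent (k-1)-itemsets into k-item candidates, prune candidates that have an infrequent subset, and count the survivors. It is a textbook-style illustration, not the implementation evaluated in the thesis; the function name `apriori`, the absolute `min_support` count, and the sample transactions are assumptions.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: count} for every itemset that occurs in at least
    `min_support` transactions (an absolute count, assumed for simplicity)."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        # Join step: unite pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            # Prune step: every (k-1)-subset must itself be frequent.
            if len(union) == k and all(
                frozenset(sub) in current for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
        # Count step: one pass over the transactions per level.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical market-basket data, for illustration only.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]
for itemset, count in apriori(baskets, min_support=3).items():
    print(sorted(itemset), count)
```

The repeated pass over the transactions at each level is the I/O cost that the abstract mentions; FP-growth, DIC, and DHP each attack that cost in a different way.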

46

Jin, Weiqing. "Fuzzy classification based on fuzzy association rule mining." 2004. http://www.lib.ncsu.edu/theses/available/etd-12072004-130619/unrestricted/etd.pdf.

47

Chaudhary, Umang Kamalakar. "Flow classification using clustering and associative rule mining." 2010. http://www.lib.ncsu.edu/resolver/1840.16/6012.

48

Wu, Chin-Wei, and 吳靜薇. "Association Rule Mining For Enrollment Grade And Graduate." Thesis, 2013. http://ndltd.ncl.edu.tw/handle/48273824059652510501.

Abstract:

Master's degree
National Kaohsiung Normal University
Graduate Institute of Information Education
101
The objective of this research is to study the relationship among entrance scores, admission types, and school achievement at vocational education schools. The research data come from the results of the 98, 99, and 100 academic years at one private vocational school in Tainan. Association rules from data mining are used to analyze school achievement, admission type, entrance score, gender, department, entrance identity, and junior high school of graduation over the three academic years. The findings improve understanding of these factors and can serve as a decision reference for schools when recruiting students.

49

Chen, Kun-Hsien, and 陳昆賢. "Using Fuzzy Rule Induction for Mining Classification Knowledge." Thesis, 2000. http://ndltd.ncl.edu.tw/handle/61832007585491374318.

Abstract:

Master's degree
National Sun Yat-sen University
Department of Information Management, Graduate Program
88
With the computerization of businesses, more and more data are generated and stored in databases for many business applications. Finding interesting patterns in those data may lead to useful knowledge that provides a competitive advantage in business. Knowledge discovery in databases has thus become an important means of helping businesses acquire knowledge that assists managerial and operational work. Among the many types of knowledge, classification knowledge is widely used. Most classification rules learned by induction algorithms are in crisp form. A fuzzy linguistic representation of rules, however, is much closer to the way humans reason. The objective of this research is to propose a method for mining classification knowledge from databases with fuzzy descriptions. The procedure contains five steps, from data preparation to rule pruning. A rule induction algorithm, RITIO, is employed to generate the classification rules. A fuzzy inference mechanism that includes fuzzy matching and output reasoning is specified to yield the output class. An experiment using several databases is conducted to show the advantages of this work. The proposed method is justified by good system performance and can be easily implemented in various business classification tasks.
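
To make the contrast between crisp and fuzzy rules concrete, the following Python sketch computes how strongly a record matches a fuzzy rule. The abstract does not state which membership functions or matching operator the thesis uses, so the triangular membership function, the minimum operator, and the attribute names and breakpoints below are all assumptions chosen because they are common defaults; this is not RITIO.

```python
def triangular(x, left, peak, right):
    """Degree (0.0 to 1.0) to which x belongs to a triangular fuzzy set
    that is 0 at `left` and `right` and 1 at `peak`."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


def rule_strength(memberships):
    """Matching degree of a conjunctive rule: the weakest condition
    bounds the whole rule (minimum operator, an assumed choice)."""
    return min(memberships)


# Hypothetical rule: IF income is medium AND age is young THEN class = ...
income_is_medium = triangular(45, 20, 50, 80)  # 25/30, about 0.83
age_is_young = triangular(30, 0, 25, 50)       # 20/25 = 0.8
print(rule_strength([income_is_medium, age_is_young]))
```

A crisp rule would either fire or not fire for this record; the fuzzy version reports a graded match, which an output-reasoning step can then combine across rules to choose a class.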

50

Liu, Po-Ting, and 劉柏廷. "Association Rule Based Relational Mining for Stock Trading." Thesis, 2007. http://ndltd.ncl.edu.tw/handle/99939684843737594402.

Abstract:

Master's degree
National Central University
Graduate Institute of Information Management
95
When analyzing numerical data with association rules, the numerical data must be discretized before they can serve as source data for data mining. The common discretization methods are “equal width interval” and “equal frequency interval”. We categorize these two methods as “absolute” because both of them classify values into intervals of the same length. In practice, equal width and equal frequency intervals are not necessarily suitable for all kinds of data. For example, many popular technical analysis indicators are used by “relative comparison” rather than “absolute comparison”. Therefore, if we treat all data as “absolute-comparison” data without considering whether they are “relative-comparison” in nature, we may lose information because some important features of the data are ignored. For this reason, we propose the concept of a “relative-type comparative relation” as an alternative to “equal width interval” and “equal frequency interval” for data preprocessing. Through “relative comparison”, numerical data can be transformed into data mining source data that stay closer in meaning to the original values, which reduces information loss and improves the mining results. After applying “relative comparison” to association rule mining, we use CBA (Classification Based on Associations) to classify and predict the target data. CBA is divided into two steps: “rule simplification” and “collective evaluation”. “Rule simplification” eliminates redundant rules and consolidates general rules for classification. “Collective evaluation” uses the total confidence of the screened rules to classify and predict the target data, improving the accuracy of classification and prediction. The experimental data are extracted from American stock trading data from 2003 to 2006. The experimental results show that applying “relative comparison” improves the precision of stock price estimation. Implementing “rule simplification” and “collective evaluation” in the experiments raises the precision further.
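
The difference between the two “absolute” discretization methods and a “relative” one can be sketched in Python as follows. The abstract does not define the thesis's relative-type comparative relation, so the rule used here (compare each price with the mean of the preceding `window` prices) is an assumption chosen to resemble a moving-average indicator; the function names and the price series are likewise hypothetical.

```python
def equal_width_bins(values, n_bins):
    """Assign each value to one of n_bins intervals of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        return [0] * len(values)
    # The maximum value would land in bin n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def equal_frequency_bins(values, n_bins):
    """Assign each value to one of n_bins bins holding about equally many values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [0] * len(values)
    for rank, i in enumerate(order):
        labels[i] = rank * n_bins // len(values)
    return labels


def relative_labels(prices, window):
    """Label each price relative to the mean of the previous `window` prices
    (an assumed stand-in for a relative-comparison indicator)."""
    labels = []
    for i in range(window, len(prices)):
        average = sum(prices[i - window:i]) / window
        if prices[i] > average:
            labels.append("above")
        elif prices[i] < average:
            labels.append("below")
        else:
            labels.append("equal")
    return labels


prices = [10, 12, 11, 30, 29, 20]  # hypothetical closing prices
print(equal_width_bins(prices, 3))      # [0, 0, 0, 2, 2, 1]
print(equal_frequency_bins(prices, 3))  # [0, 1, 0, 2, 2, 1]
print(relative_labels(prices, 2))       # ['equal', 'above', 'above', 'below']
```

In this toy series the final price of 20 falls in the middle bin under both absolute schemes, yet it is well below its recent average; the relative label preserves that information, which is the kind of feature the abstract argues absolute binning can lose.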
