Dissertations / Theses on the topic 'Data Mining Approaches'

To see the other types of publications on this topic, follow the link: Data Mining Approaches.

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the top 50 dissertations / theses for your research on the topic 'Data Mining Approaches.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Press it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse dissertations / theses on a wide variety of disciplines and organise your bibliography correctly.

1

Liu, Xiao. "Health Data Analytics: Data and Text Mining Approaches for Pharmacovigilance." Diss., The University of Arizona, 2016. http://hdl.handle.net/10150/620913.

Full text
Abstract:
Pharmacovigilance is defined as the science and activities relating to the detection, assessment, understanding, and prevention of adverse drug events (WHO 2004). Post-approval adverse drug events are a major health concern. They account for about 700,000 emergency department visits, 120,000 hospitalizations, and $75 billion in medical costs annually (Yang et al. 2014). However, certain adverse drug events are preventable if detected early. Timely and accurate pharmacovigilance in the post-approval period is an urgent goal of the public health system. The availability of various sources of healthcare data for analysis in recent years opens new opportunities for data-driven pharmacovigilance research. In attempting to leverage this emerging healthcare big data, pharmacovigilance research faces a few challenges. Most studies in pharmacovigilance focus on structured and coded data, and therefore miss important textual data from patient social media and clinical documents in EHRs. Most prior studies develop drug safety surveillance systems using a single data source with only one data mining algorithm. The performance of such systems is hampered by bias in the data and the pitfalls of the data mining algorithms adopted. In my dissertation, I address two broad research questions: 1) How do we extract rich adverse drug event-related information from textual data for active drug safety surveillance? 2) How do we design an integrated pharmacovigilance system to improve the decision-making process for drug safety regulatory intervention? To these ends, the dissertation comprises three essays. The first essay examines how to develop a high-performance information extraction framework for patient reports of adverse drug events in health social media. I found that medical entity extraction, drug-event relation extraction, and report source classification are necessary components for this task. In the second essay, I address the scalability issue of using social media for pharmacovigilance by proposing a distant supervision approach for information extraction. In the last essay, I develop a MetaAlert framework for pharmacovigilance with advanced text mining and data mining techniques to provide timely and accurate detection of adverse drug reactions. The models, frameworks, and design principles proposed in these essays advance not only pharmacovigilance research, but also contribute more broadly to health IT, business analytics, and design science research.
2

Ma, Yao. "Financial market predictions using Web mining approaches." View abstract or full-text, 2009. http://library.ust.hk/cgi/db/thesis.pl?CSED%202009%20MAY.

Full text
3

Otaki, Keisuke. "Algorithmic Approaches to Pattern Mining from Structured Data." 京都大学 (Kyoto University), 2016. http://hdl.handle.net/2433/215673.

Full text
Abstract:
The contents of Chapter 6 are based on work published in IPSJ Transactions on Mathematical Modeling and Its Applications, vol.9(1), pp.32-42, 2016.
Kyoto University (京都大学)
Doctor of Informatics (博士(情報学)), doctoral degree by coursework; degree no. 甲第19846号, report no. 情博第597号
Graduate School of Informatics, Department of Intelligence Science and Technology, Kyoto University (京都大学大学院情報学研究科知能情報学専攻)
Examining committee: Prof. Akihiro Yamamoto (chair), Prof. Hisashi Kashima, Prof. Tatsuya Akutsu
Conferred under Article 4, Paragraph 1 of the Degree Regulations
4

Yang, L. "Optimisation approaches for data mining in biological systems." Thesis, University College London (University of London), 2016. http://discovery.ucl.ac.uk/1473809/.

Full text
Abstract:
The advances in data acquisition technologies have generated massive amounts of data that present considerable challenges for analysis. How to efficiently and automatically mine through the data and extract the maximum value by identifying hidden patterns is an active research area, called data mining. This thesis tackles several problems in data mining, including data classification, regression analysis and community detection in complex networks, with considerable applications in various biological systems. First, the problem of data classification is investigated. An existing classifier has been adopted from the literature and two novel solution procedures have been proposed, which are shown to improve the predictive accuracy of the original method and significantly reduce the computational time. Disease classification using high-throughput genomic data is also addressed. To tackle the problem of analysing a large number of genes against a small number of samples, a new approach of incorporating extra biological knowledge and constructing higher-level composite features for classification has been proposed. A novel model has been introduced to optimise the construction of composite features. Subsequently, regression analysis is considered, where two piece-wise linear regression methods are presented. The first method partitions one feature into multiple complementary intervals and fits each with a distinct linear function. The other method is a more generalised variant of the previous one and performs recursive binary partitioning that permits partitioning of multiple features. Lastly, community detection in complex networks is investigated, where a new optimisation framework is introduced to identify the modular structure hidden in directed networks via optimisation of modularity. A non-linear model is first proposed before its linearised variant is presented. The optimisation framework consists of two major steps: solving the non-linear model to identify a coarse initial partition, and then repeatedly solving the linearised models to refine the network partition.
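The community-detection part of this thesis rests on optimising modularity in directed networks. As orientation only, the sketch below evaluates the standard directed-modularity score of a given partition on a toy graph; it is not the thesis's non-linear or linearised optimisation models, and the example network and labels are invented.

```python
import numpy as np

def directed_modularity(A, communities):
    """Standard directed modularity of a node partition (illustrative only)."""
    A = np.asarray(A, dtype=float)
    m = A.sum()                               # total number of directed edges
    k_out = A.sum(axis=1)                     # out-degrees
    k_in = A.sum(axis=0)                      # in-degrees
    c = np.asarray(communities)
    same = c[:, None] == c[None, :]           # delta(c_i, c_j)
    expected = np.outer(k_out, k_in) / m      # null-model expectation
    return float(((A - expected) * same).sum() / m)

# Toy directed network with two obvious groups {0,1,2} and {3,4,5}
A = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3), (2, 3)]:
    A[i, j] = 1

print(directed_modularity(A, [0, 0, 0, 1, 1, 1]))   # about 0.37 for this split
```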
5

Yun, Unil. "New approaches to weighted frequent pattern mining." Texas A&M University, 2005. http://hdl.handle.net/1969.1/5003.

Full text
Abstract:
Researchers have proposed frequent pattern mining algorithms that are more efficient than previous algorithms and generate fewer but more important patterns. Many techniques such as depth-first/breadth-first search, use of tree/other data structures, top-down/bottom-up traversal and vertical/horizontal formats for frequent pattern mining have been developed. Most frequent pattern mining algorithms use a support measure to prune the combinatorial search space. However, support-based pruning is not enough when taking into consideration the characteristics of real datasets. Additionally, after mining datasets to obtain the frequent patterns, there is no way to adjust the number of frequent patterns through user feedback, except for changing the minimum support. Alternative measures for mining frequent patterns have been suggested to address these issues. One of the main limitations of the traditional approach for mining frequent patterns is that all items are treated uniformly when, in reality, items have different importance. For this reason, weighted frequent pattern mining algorithms have been suggested that give different weights to items according to their significance. The main focus in weighted frequent pattern mining concerns satisfying the downward closure property. In this research, frequent pattern mining approaches with weight constraints are suggested. Our main approach is to push weight constraints into the pattern growth algorithm while maintaining the downward closure property. We develop WFIM (Weighted Frequent Itemset Mining with a weight range and a minimum weight), WLPMiner (Weighted frequent Pattern Mining with length decreasing constraints), WIP (Weighted Interesting Pattern mining with a strong weight and/or support affinity), WSpan (Weighted Sequential pattern mining with a weight range and a minimum weight) and WIS (Weighted Interesting Sequential pattern mining with a similar level of support and/or weight affinity). The extensive performance analysis shows that the suggested approaches are efficient and scalable in weighted frequent pattern mining.
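For readers new to the weighted setting, here is a minimal sketch of the underlying idea: combine support with item weights, and use the maximum weight as an anti-monotone bound so that downward-closure-style pruning remains valid. It is not the WFIM/WSpan code developed in the thesis; the transactions, weights and threshold are invented.

```python
from itertools import combinations

# Invented transactions, item weights and threshold
transactions = [
    {"bread", "milk"},
    {"bread", "diaper", "beer"},
    {"milk", "diaper", "beer"},
    {"bread", "milk", "diaper"},
    {"bread", "milk", "beer"},
]
weights = {"bread": 0.4, "milk": 0.6, "diaper": 0.9, "beer": 1.0}
min_wsup = 0.3                     # minimum weighted support
max_w = max(weights.values())      # used as an anti-monotone bound

def support(itemset):
    return sum(itemset <= t for t in transactions) / len(transactions)

def weighted_support(itemset):
    avg_w = sum(weights[i] for i in itemset) / len(itemset)
    return support(itemset) * avg_w

results = {}
for k in range(1, len(weights) + 1):
    for cand in combinations(sorted(weights), k):
        s = frozenset(cand)
        # support * max_weight never underestimates weighted support and is
        # anti-monotone, so candidates failing it (and, in a real miner,
        # all of their supersets) can be pruned safely.
        if support(s) * max_w < min_wsup:
            continue
        if weighted_support(s) >= min_wsup:
            results[s] = round(weighted_support(s), 3)

for s, ws in sorted(results.items(), key=lambda kv: -kv[1]):
    print(set(s), ws)
```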
6

Shao, Huijuan. "Temporal Mining Approaches for Smart Buildings Research." Diss., Virginia Tech, 2017. http://hdl.handle.net/10919/84349.

Full text
Abstract:
With the advent of modern sensor technologies, significant opportunities have opened up to help conserve energy in residential and commercial buildings. Moreover, the rapid urbanization we are witnessing requires optimized energy distribution. This dissertation focuses on two sub-problems in improving energy conservation: energy disaggregation and occupancy prediction. Energy disaggregation attempts to separate the energy usage of each circuit or each electric device in a building using only aggregate electricity usage information from the meter for the whole house. The second problem of occupancy prediction can be accomplished using non-invasive indoor activity tracking to predict the locations of people inside a building. We cast both problems as temporal mining problems. We exploit motif mining with constraints to distinguish devices with multiple states, which helps tackle the energy disaggregation problem. Our results reveal that motif mining is adept at distinguishing devices with multiple power levels and at disentangling the combinatorial operation of devices. For the second problem we propose time-gap constrained episode mining to detect activity patterns, followed by the use of a mixture of episode-generating HMM (EGH) models to predict home occupancy. Finally, we demonstrate that the mixture EGH model can also help predict the location of a person to address non-invasive indoor activity tracking.
Ph. D.
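As a loose illustration of the motif idea in a disaggregation setting (not the constrained motif-mining algorithms developed in the dissertation), the sketch below discretises an invented aggregate power trace into symbolic levels and counts recurring fixed-length subsequences.

```python
import numpy as np
from collections import Counter

# Invented aggregate power readings (watts), one sample per minute
power = np.array([100, 100, 650, 650, 650, 100, 100, 1500, 1500, 100,
                  100, 650, 650, 650, 100, 1500, 1500, 100, 100, 100])

# Discretise into power-level symbols; the bin edges are assumptions
bins = [0, 300, 1000, 2000]
symbols = np.digitize(power, bins)        # 1 = low, 2 = medium, 3 = high

# Count every length-3 subsequence ("motif") of the symbolic series
window = 3
motifs = Counter(tuple(int(s) for s in symbols[i:i + window])
                 for i in range(len(symbols) - window + 1))

# Frequently recurring motifs hint at device signatures in the aggregate signal
for motif, count in motifs.most_common(5):
    print(motif, count)
```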
7

Delpisheh, Elnaz, and University of Lethbridge Faculty of Arts and Science. "Two new approaches to evaluate association rules." Thesis, Lethbridge, Alta. : University of Lethbridge, Dept. of Mathematics and Computer Science, c2010, 2010. http://hdl.handle.net/10133/2530.

Full text
Abstract:
Data mining aims to discover interesting and unknown patterns in large-volume data. Association rule mining is one of the major data mining tasks, which attempts to find inherent relationships among data items in an application domain, such as supermarket basket analysis. An essential post-process in an association rule mining task is the evaluation of association rules by measures of their interestingness. Different interestingness measures have been proposed and studied. Given an association rule mining task, measures are assessed against a set of user-specified properties. However, in practice, given the subjectivity and inconsistencies in property specifications, it is a non-trivial task to make appropriate measure selections. In this work, we propose two novel approaches to assess interestingness measures. Our first approach utilizes the analytic hierarchy process to capture quantitatively domain-dependent requirements on properties, which are later used in assessing measures. This approach not only eliminates any inconsistencies in an end user’s property specifications through consistency checking but also is invariant to the number of association rules. Our second approach dynamically evaluates association rules according to a composite and collective effect of multiple measures. It interactively snapshots the end user’s domain-dependent requirements in evaluating association rules. In essence, our approach uses neural networks along with back-propagation learning to capture the relative importance of measures in evaluating association rules. Case studies and simulations have been conducted to show the effectiveness of our two approaches.
viii, 85 leaves : ill. ; 29 cm
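For context, the snippet below computes three classical interestingness measures (support, confidence, lift) for a single rule on invented baskets; approaches such as the two proposed here would then weigh many such measures against user-specified properties. The AHP and neural-network machinery itself is not reproduced.

```python
def rule_measures(baskets, antecedent, consequent):
    """Support, confidence and lift of the rule antecedent -> consequent."""
    n = len(baskets)
    p_a = sum(antecedent <= b for b in baskets) / n
    p_c = sum(consequent <= b for b in baskets) / n
    p_both = sum((antecedent | consequent) <= b for b in baskets) / n
    confidence = p_both / p_a if p_a else 0.0
    lift = confidence / p_c if p_c else 0.0
    return p_both, confidence, lift

baskets = [
    {"bread", "butter"}, {"bread", "butter", "milk"},
    {"milk"}, {"bread", "milk"}, {"bread", "butter"},
]
# Rule {bread} -> {butter}: support 0.6, confidence 0.75, lift 1.25
print(rule_measures(baskets, {"bread"}, {"butter"}))
```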
8

Shen, Shijun. "Approaches to creating anonymous patient database." Morgantown, W. Va. : [West Virginia University Libraries], 2000. http://etd.wvu.edu/templates/showETD.cfm?recnum=1693.

Full text
Abstract:
Thesis (M.S.)--West Virginia University, 2000.
Title from document title page. Document formatted into pages; contains v, 68 p. : ill. (some col.). Includes abstract. Includes bibliographical references (p. 67-68).
9

Mougel, Pierre-Nicolas. "Finding homogeneous collections of dense subgraphs using constraint-based data mining approaches." Thesis, Lyon, INSA, 2012. http://www.theses.fr/2012ISAL0073.

Full text
Abstract:
The work presented in this thesis deals with data mining approaches for the analysis of attributed graphs. An attributed graph is a graph where properties, encoded by means of attributes, are associated with each vertex. In such data, our objective is the discovery of subgraphs formed by several dense groups of vertices that are homogeneous with respect to the attributes. More precisely, we define the constraint-based extraction of collections of densely connected subgraphs whose vertices share enough attributes. To this aim, we propose two new classes of patterns along with sound and complete algorithms to compute them efficiently using constraint-based approaches. The first family of patterns, named Maximal Homogeneous Clique Sets (MHCS), contains patterns satisfying constraints on the number of dense subgraphs, on the size of these subgraphs, and on the number of shared attributes. The second class of patterns, named Collections of Homogeneous k-clique Percolated components (CoHoP), is based on a relaxed notion of density in order to handle missing values. Both approaches are used for the analysis of scientific collaboration networks and protein-protein interaction networks. The extracted patterns exhibit structures useful in a decision support process. Indeed, in a scientific collaboration network, the analysis of such structures might give hints for proposing new collaborations between researchers working on the same subjects. In a protein-protein interaction network, the analysis of the extracted patterns can be used to study the relationships between modules of proteins involved in similar biological situations. The analysis of performance on real and synthetic data, with respect to different attributed graph characteristics, shows that the proposed approaches scale well to large datasets.
10

Johansson Fernstad, Sara. "Algorithmically Guided Information Visualization : Explorative Approaches for High Dimensional, Mixed and Categorical Data." Doctoral thesis, Linköpings universitet, Medie- och Informationsteknik, 2011. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-70860.

Full text
Abstract:
Facilitated by the technological advances of the last decades, increasing amounts of complex data are being collected within fields such as biology, chemistry and social sciences. The major challenge today is not to gather data, but to extract useful information and gain insights from it. Information visualization provides methods for visual analysis of complex data but, as the amounts of gathered data increase, the challenges of visual analysis become more complex. This thesis presents work utilizing algorithmically extracted patterns as guidance during interactive data exploration processes, employing information visualization techniques. It provides efficient analysis by taking advantage of fast pattern identification techniques as well as making use of the domain expertise of the analyst. In particular, the presented research is concerned with the issues of analysing categorical data, where the values are names without any inherent order or distance; mixed data, including a combination of categorical and numerical data; and high dimensional data, including hundreds or even thousands of variables. The contributions of the thesis include a quantification method, assigning numerical values to categorical data, which utilizes an automated method to define category similarities based on underlying data structures, and integrates relationships within numerical variables into the quantification when dealing with mixed data sets. The quantification is incorporated in an interactive analysis pipeline where it provides suggestions for numerical representations, which may interactively be adjusted by the analyst. The interactive quantification enables exploration using commonly available visualization methods for numerical data. Within the context of categorical data analysis, this thesis also contributes the first user study evaluating the performance of what are currently the two main visualization approaches for categorical data analysis. Furthermore, this thesis contributes two dimensionality reduction approaches, which aim at preserving structure while reducing dimensionality, and provide flexible and user-controlled dimensionality reduction. Through algorithmic quality metric analysis, where each metric represents a structure of interest, potentially interesting variables are extracted from the high dimensional data. The automatically identified structures are visually displayed, using various visualization methods, and act as guidance in the selection of interesting variable subsets for further analysis. The visual representations furthermore provide overview of structures within the high dimensional data set and may, through this, aid in focusing subsequent analysis, as well as enabling interactive exploration of the full high dimensional data set and selected variable subsets. The thesis also contributes the application of algorithmically guided approaches for high dimensional data exploration in the rapidly growing field of microbiology, through the design and development of a quality-guided interactive system in collaboration with microbiologists.
11

Smith, Sydney. "Approaches to Natural Language Processing." Scholarship @ Claremont, 2018. http://scholarship.claremont.edu/cmc_theses/1817.

Full text
Abstract:
This paper explores topic modeling through the example text of Alice in Wonderland. It explores both singular value decomposition and non-negative matrix factorization as methods for feature extraction. The paper goes on to explore methods for partially supervised implementation of topic modeling through introducing themes. A large portion of the paper also focuses on the implementation of these techniques in Python, as well as visualizations of the results, which use a combination of Python, HTML and JavaScript along with the D3 framework. The paper concludes by presenting a mixture of SVD, NMF and partially supervised NMF as a possible way to improve topic modeling.
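As a hedged illustration of the NMF route the paper describes, the following sketch factorises a tiny TF-IDF matrix with scikit-learn and prints the strongest terms per topic; the documents are invented stand-ins for passages of the novel, not the paper's actual corpus or code.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import NMF

docs = [
    "alice fell down the rabbit hole",
    "the queen of hearts shouted at alice",
    "the cat grinned and slowly disappeared",
    "alice spoke to the grinning cat",
]

tfidf = TfidfVectorizer(stop_words="english")
X = tfidf.fit_transform(docs)                  # document-term matrix

nmf = NMF(n_components=2, random_state=0)      # two latent "topics"
doc_topics = nmf.fit_transform(X)              # document-topic weights

terms = tfidf.get_feature_names_out()
for k, component in enumerate(nmf.components_):
    top = component.argsort()[-4:][::-1]       # strongest terms per topic
    print(f"topic {k}:", [terms[i] for i in top])
```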
12

Curtarolo, Stefano. "Coarse-graining and data mining approaches to the prediction of structures and their dynamics." Thesis, Massachusetts Institute of Technology, 2003. http://hdl.handle.net/1721.1/17034.

Full text
Abstract:
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Materials Science and Engineering, 2003.
Includes bibliographical references (p. 245-263).
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
Predicting macroscopic properties of materials starting from an atomistic or electronic level description can be a formidable task due to the many orders of magnitude in length and time scales that need to be spanned. A characteristic of successful approaches to this problem is the systematic coarse-graining of less relevant degrees of freedom in order to obtain Hamiltonians that span larger length and time scales. Attempts to do this in the static regime (i.e. zero temperature) have already been developed, as well as thermodynamical models where all the internal degrees of freedom are removed. In this thesis, we present an approach that leads to a dynamics for thermodynamic-coarse-grained models. This allows us to obtain temperature-dependent and transport properties. The renormalization group theory is used to create new local potential models between nodes, within the approximation of local thermodynamical equilibrium. Assuming that these potentials give an averaged description of node dynamics, we calculate thermal and mechanical properties. If this method can be sufficiently generalized it may form the basis of a Multiscale Molecular Dynamics method with time and spatial coarse-graining. In the second part of the thesis, we analyze the problem of crystal structure prediction by using quantum calculations. This is a fundamental problem in materials research and development, and it is typically addressed with highly accurate quantum mechanical computations on a small set of candidate structures, or with empirical rules that have been extracted from a large amount of experimental information but have limited predictive power. In this thesis, we transfer the concept of heuristic rule extraction to a large library of ab-initio calculated information, and demonstrate that this can be developed into a tool for crystal structure prediction. In addition, we analyze the ab-initio results and predictions for a large number of transition-metal binary alloys.
by Stefano Curtarolo.
Ph.D.
13

Wang, Yunguan. "Data-driven Approaches to Understand Development, Diseases and Identify Therapeutics." University of Cincinnati / OhioLINK, 2018. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1535704902199176.

Full text
14

Mohan, Sujaa Rani. "Association rule based data mining approaches for Web Cache Maintenance and adaptive Intrusion Detection systems." Diss., UMK access, 2005.

Find full text
Abstract:
Thesis (M.S.)--School of Computing and Engineering. University of Missouri--Kansas City, 2005.
"A thesis in computer science." Typescript. Advisor: E.K. Park. Vita. Title from "catalog record" of the print edition Description based on contents viewed March 12, 2007. Includes bibliographical references (leaves 159-162). Online version of the print edition.
15

Otey, Matthew Eric. "Approaches to Abnormality Detection with Constraints." The Ohio State University, 2006. http://rave.ohiolink.edu/etdc/view?acc_num=osu1150484039.

Full text
16

Nallan, Sreedhar Acharya. "Geospatial and data mining approaches to assess the impact of watershed development in Indian rainfed areas." Thesis, Edith Cowan University, Research Online, Perth, Western Australia, 2017. https://ro.ecu.edu.au/theses/1980.

Full text
Abstract:
Watershed development programs in India have played a significant role in improving the livelihoods of the rural communities living in rainfed areas. Current assessments are limited in assessing interrelated impacts, as watershed development is influenced by multiple domains. Few studies have reported on novel ICT techniques being used for watershed assessment with actual watershed data or examined the spatial or temporal variations in the watershed. The objective of the research was to study current novel geospatial and data mining methods used in hydrological assessments of watershed development and to apply the identified novel methods to real-time watershed data. The following major research question has been addressed by the research study: “Can novel geospatial and data mining methods be effectively used to assess the hydrological impacts of watershed development?” In order to answer this question, the research was carried out in a number of phases to examine existing ICT techniques utilised for impact assessment of the watershed area. The research methodology used in this study was a mixed-method approach based on case study, diagnostic research and a quantitative approach. Two contiguous watersheds in the rainfed region of Andhra Pradesh, India, were chosen as the study area. Data sets were sourced from a number of government and NGO agencies and from field visits. Data representing sixteen parameters of hydrological, environmental and social factors which were known to influence watersheds were chosen for the study. The data consisted of both spatial and spatio-temporal data. A grid of 2880 cells covering the study area was developed. Data for the period 2006 to 2008 in two seasons (pre-monsoon and post-monsoon) were collected, compiled, classified and assigned to the grid network database. The study area was divided into three zones, viz. upstream, midstream and downstream. The data underwent preprocessing in order to make it suitable for further analysis. This included watershed delineation, creating the grid network, handling point, line and polygon data, and formatting the data into a unified format. The data was converted into nominal classes to be utilised for data mining. The watershed data set was analysed using descriptive statistics, geospatial analysis and data mining approaches. The first analysis used descriptive statistics based on univariate analysis using a pivot table wizard. This analysis used all sixteen watershed parameters. A series of different scenarios for soils, groundwater levels, land use and check dams was examined. The second approach was a geospatial analysis which used optimised hot spot analysis. This analysis used NDVI and groundwater level data as the input parameters. The data was examined in relation to land use and the location of check dams. The third approach employed spatial data mining techniques, using DBSCAN clustering and Apriori-based association rule mining on the watershed data. This analysis used fourteen spatio-temporal parameters. The output from the analysis was visualised in a GIS environment. A comparison of the results showed that all three approaches provided some insight into the factors influencing the watershed development. The descriptive statistics provided a simple analysis of trends in the parameters, but were limited in their ability to show the interrelationships between parameters.
The geospatial analysis of the watershed area was useful in understanding the spatial and temporal trends across the watershed area. This analysis can only be used for spatial data with numeric values. The data mining analysis of the watershed area was useful in understanding previously hidden relationships between the parameters influencing the watershed area. This analysis could be used for both spatial and spatio-temporal data. The results obtained through each approach require some expertise to interrogate the effects of changes in the watershed area. The relationships are complex, and interrelationships influence the effects of the parameters. Variation was found in the granularity of the outputs of each approach. It is evident that a combination of the approaches provides the capability to investigate these effects, from general data trends to complex data analysis. Validation of the approaches was made against a similar study carried out by an ACIAR-funded project. Some validation of the findings from this thesis could be made against the ACIAR-based studies. The importance of factors such as groundwater level, watershed zone and rainfall was noted in both studies. Although the ACIAR research was conducted in a similar study area, it was limited in its analysis of the effects of upstream/downstream interactions and did not study the integration of multiple parameters in a robust manner. The research was considered novel in its integration of three different approaches for watershed impact assessment, utilising hydrological, social and environmental parameters for contiguous watershed data with a spatial and temporal analysis. It was also novel in that it proposed a hybrid method of utilising geospatial analysis and data mining methods together and visualising the output of data mining in a GIS environment. This research proposed a novel integrated technology-based framework for impact assessment which comprises dataset, processing, analysis and results components. This framework could be developed into a decision support tool to assess the impacts of watershed development, assisting researchers and planners in providing an unbiased assessment of the impact of watershed development from a range of perspectives. The framework can be used at different spatial and temporal scales.
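To make the spatial data-mining step concrete, here is a small sketch of DBSCAN clustering on invented grid-cell records (location plus a groundwater attribute); the values, scaling and parameters are assumptions and do not reproduce the thesis's analysis of the Andhra Pradesh watersheds.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Invented grid-cell records: (x, y, post-monsoon groundwater depth in metres)
cells = np.array([
    [0, 0, 4.2], [0, 1, 4.0], [1, 0, 4.5], [1, 1, 4.1],    # shallow cluster
    [8, 8, 12.3], [8, 9, 12.8], [9, 8, 11.9],              # deep cluster
    [4, 5, 30.0],                                          # likely noise
])

# Rescale the attribute so location and groundwater depth contribute comparably
X = cells.copy()
X[:, 2] = X[:, 2] / 10.0

labels = DBSCAN(eps=1.6, min_samples=2).fit_predict(X)
print(labels)        # -1 marks noise cells, other labels are spatial clusters
```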
17

Delabrière, Alexis. "New approaches for processing and annotations of high-throughput metabolomic data obtained by mass spectrometry." Thesis, Université Paris-Saclay (ComUE), 2018. http://www.theses.fr/2018SACLS359/document.

Full text
Abstract:
Metabolomics is a phenotyping approach with promising prospects for the diagnosis and monitoring of several diseases. The most widely used observation technique in metabolomics is mass spectrometry (MS). Recent technological developments have significantly increased the size and complexity of the data. This thesis focused on two bottlenecks in the processing of these data: the extraction of peaks from raw data and the annotation of MS/MS spectra. The first part of the thesis focused on the development of a new peak detection algorithm for Flow Injection Analysis (FIA) data, a high-throughput metabolomics technique. A model derived from the physics of the mass spectrometer, taking into account the saturation of the instrument, has been proposed. This model includes a peak shape common to all metabolites and a specific saturation phenomenon for each ion. It has made it possible to create a workflow that estimates the common peak on well-behaved signals, then uses it to perform matched filtration on all signals. Its effectiveness on real data has been studied, and it has been shown that proFIA is superior to existing algorithms, has good reproducibility and is very close to manual measurements made by an expert on several types of instruments. The second part of this thesis focused on the development of a tool for detecting the structural similarities within a set of fragmentation spectra. To do this, a new graph-based representation has been proposed which does not require knowledge of the metabolite formula; these graphs are also a natural representation of MS/MS spectra. Some properties of these graphs then made it possible to create an efficient frequent subgraph mining (FSM) algorithm based on the generation of graph spanning trees. This tool has been tested on two different data sets and has proven its speed and interpretability compared to state-of-the-art algorithms. The two algorithms have been implemented in the R packages proFIA and mineMS2, which are available to the community.
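A minimal sketch of the matched-filtration idea described above: correlate a noisy trace with an estimated common peak shape and take the position of the strongest response. The simulated signal, peak width and noise level are assumptions; this is not the proFIA implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(200)

# Estimated "common peak" shape shared by all ions (a Gaussian is an assumption)
template = np.exp(-0.5 * (np.arange(-20, 21) / 6.0) ** 2)
template /= np.linalg.norm(template)

# Simulated noisy trace for one ion, with its peak centred at index 80
signal = 3.0 * np.exp(-0.5 * ((t - 80) / 6.0) ** 2) + rng.normal(0, 0.2, t.size)

# Matched filtration: correlate the trace with the common peak shape
score = np.correlate(signal, template, mode="same")
print("detected peak position:", int(np.argmax(score)))   # expected near 80
```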
18

Erdogan, Onur. "Predicting the Disease of Alzheimer (AD) with SNP Biomarkers and Clinical Data Based Decision Support System Using Data Mining Classification Approaches." Master's thesis, METU, 2012. http://etd.lib.metu.edu.tr/upload/12614832/index.pdf.

Full text
Abstract:
Single Nucleotide Polymorphisms (SNPs) are the most common DNA sequence variations, where only a single nucleotide (A, T, C, G) in the human genome differs between individuals. Besides being the main genetic reason behind individual phenotypic differences, SNP variations have the potential to exploit the molecular basis of many complex diseases. Association of SNP subsets with diseases and analysis of the genotyping data together with clinical findings will provide practical and affordable methodologies for the prediction of diseases in clinical settings. So there is a need to determine the SNP subsets and patients' clinical data that are informative for the prediction or the diagnosis of particular diseases. So far, there is no established approach for selecting the representative SNP subset and patients' clinical data. Data mining methodology, which is based on finding hidden and key patterns over huge databases, has the highest potential for extracting knowledge from genomic datasets and for selecting the SNPs and the most effective clinical features that are informative and relevant for clinical diagnosis. In this study we have applied one of the widely used data mining classification methodologies, the "decision tree", for associating SNP biomarkers and clinical data with Alzheimer's disease (AD), which is the most common form of dementia. Different tree construction parameters have been compared for optimization, and the most efficient and accurate tree for predicting AD is presented.
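As an illustration of the general methodology rather than the thesis's model or data, the sketch below trains scikit-learn decision trees on simulated SNP genotypes plus one clinical variable and compares a tree-construction parameter, echoing the optimisation step the abstract mentions.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
n = 300

# Simulated SNP genotypes coded 0/1/2 (minor-allele counts) plus one clinical feature
snps = rng.integers(0, 3, size=(n, 5))
age = rng.integers(60, 90, size=(n, 1))
X = np.hstack([snps, age])
# Purely illustrative label: "risk" driven by SNP 0 and age
y = (snps[:, 0] + (age[:, 0] > 75) + rng.normal(0, 0.5, n) > 2).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Compare tree-construction parameters, as the abstract describes
for depth in (2, 4, 8):
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_tr, y_tr)
    print(f"max_depth={depth}: test accuracy {tree.score(X_te, y_te):.2f}")
```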
19

de Oliveira Lima, Elen. "Domain knowledge integration in data mining for churn and customer lifetime value modelling : new approaches and applications." Thesis, University of Southampton, 2009. https://eprints.soton.ac.uk/65692/.

Full text
Abstract:
The evaluation of the relationship with the customer and related benefits has become a key point for a company’s competitive advantage. Consequently, interest in key concepts, such as customer lifetime value and churn has increased over the years. However, the complexity of building, interpreting and applying customer lifetime value and churn models, creates obstacles for their implementation by companies. A proposed qualitative study demonstrates how companies implement and evaluate the importance of these key concepts, including the use of data mining and domain knowledge, emphasising and justifying the need of more interpretable and acceptable models. Supporting the idea of generating acceptable models, one of the main contributions of this research is to show how domain knowledge can be integrated as part of the data mining process when predicting churn and customer lifetime value. This is done through, firstly, the evaluation of signs in regression models and secondly, the analysis of rules’ monotonicity in decision tables. Decision tables are used for contrasting extracted knowledge, in this case from a decision tree model. An algorithm is presented, which allows verification of whether the knowledge contained in a decision table is in accordance with domain knowledge. In the case of churn, both approaches are applied to two telecom data sets, in order to empirically demonstrate how domain knowledge can facilitate the interpretability of results. In the case of customer lifetime value, both approaches are applied to a catalogue company data set, also demonstrating the interpretability of results provided by the domain knowledge application. Finally, a backtesting framework is proposed for churn evaluation, enabling the validation and monitoring process for the generated churn models.
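The first integration route described above, checking regression coefficient signs against domain knowledge, can be sketched as follows; the churn features, expected signs and simulated data are assumptions rather than the telecom and catalogue data sets used in the thesis.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical churn features and the coefficient signs an expert would expect
feature_names = ["monthly_charges", "tenure_months", "support_calls"]
expected_signs = {"monthly_charges": +1, "tenure_months": -1, "support_calls": +1}

rng = np.random.default_rng(0)
n = 500
X = np.column_stack([
    rng.normal(60, 15, n),       # monthly charges
    rng.integers(1, 72, n),      # tenure in months
    rng.poisson(1.5, n),         # support calls
])
logits = 0.03 * X[:, 0] - 0.05 * X[:, 1] + 0.4 * X[:, 2] - 1.0
y = (rng.random(n) < 1 / (1 + np.exp(-logits))).astype(int)   # simulated churn

model = LogisticRegression(max_iter=1000).fit(X, y)

# Flag any coefficient whose sign contradicts the stated domain knowledge
for name, coef in zip(feature_names, model.coef_[0]):
    ok = np.sign(coef) == expected_signs[name]
    print(f"{name:16s} coef={coef:+.3f}", "consistent" if ok else "sign conflict")
```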
20

Dang, Vinh Q. "Evolutionary approaches for feature selection in biological data." Thesis, Edith Cowan University, Research Online, Perth, Western Australia, 2014. https://ro.ecu.edu.au/theses/1276.

Full text
Abstract:
Data mining techniques have been used widely in many areas such as business, science, engineering and medicine. The techniques allow a vast amount of data to be explored in order to extract useful information from the data. One of the foci in the health area is finding interesting biomarkers from biomedical data. Mass throughput data generated from microarrays and mass spectrometry of biological samples are high dimensional and small in sample size. Examples include DNA microarray datasets with up to 500,000 genes and mass spectrometry data with 300,000 m/z values. While the availability of such datasets can aid in the development of techniques/drugs to improve diagnosis and treatment of diseases, a major challenge involves their analysis to extract useful and meaningful information. The aims of this project are: 1) to investigate and develop feature selection algorithms that incorporate various evolutionary strategies, 2) to use the developed algorithms to find the "most relevant" biomarkers contained in biological datasets, and 3) to evaluate the goodness of the extracted feature subsets for relevance (examined in terms of existing biomedical domain knowledge and from classification accuracy obtained using different classifiers). The project aims to generate good predictive models for classifying diseased samples from controls.
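A compact sketch of one evolutionary strategy for feature selection, in the spirit of the first aim above: a simple genetic algorithm over feature bit-masks with cross-validated accuracy as fitness. The data set, operators and parameter values are illustrative assumptions, not the algorithms developed in the thesis.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Toy stand-in for a biomedical set: few samples, many candidate features
X, y = make_classification(n_samples=60, n_features=200, n_informative=8,
                           random_state=0)
rng = np.random.default_rng(0)

def fitness(mask):
    """Cross-validated accuracy of a simple classifier on the selected features."""
    if mask.sum() == 0:
        return 0.0
    clf = KNeighborsClassifier(n_neighbors=3)
    return cross_val_score(clf, X[:, mask], y, cv=3).mean()

# Minimal genetic algorithm over feature bit-masks
pop = rng.random((20, X.shape[1])) < 0.05             # sparse initial masks
for generation in range(15):
    scores = np.array([fitness(ind) for ind in pop])
    parents = pop[np.argsort(scores)[-10:]]            # keep the best half
    children = []
    for _ in range(10):
        a, b = parents[rng.integers(0, 10, 2)]
        cut = rng.integers(1, X.shape[1])
        child = np.concatenate([a[:cut], b[cut:]])      # one-point crossover
        child = np.logical_xor(child, rng.random(X.shape[1]) < 0.01)  # mutation
        children.append(child)
    pop = np.vstack([parents, children])

best = max(pop, key=fitness)
print("selected features:", np.flatnonzero(best), "fitness:", round(fitness(best), 3))
```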
21

Zhang, Zhenyou. "Data Mining Approaches for Intelligent Condition-based Maintenance : A Framework of Intelligent Fault Diagnosis and Prognosis System (IFDPS)." Doctoral thesis, Norges teknisk-naturvitenskapelige universitet, Institutt for produksjons- og kvalitetsteknikk, 2014. http://urn.kb.se/resolve?urn=urn:nbn:no:ntnu:diva-25148.

Full text
Abstract:
Condition-based Maintenance (CBM) is a maintenance policy that takes maintenance action only when the need arises, based on real-time condition monitoring. Intelligent CBM means a CBM system is capable of understanding and making maintenance decisions without human intervention. To achieve this objective, it is necessary to detect the current conditions of mechanical and electrical systems and to predict faults in the systems accurately. Moreover, the maintenance scheduling needs to be optimized to reduce the maintenance cost and improve reliability, availability and safety based on the results of fault detection and prediction. Data mining is a computational process of discovering patterns in large data sets involving methods at the intersection of artificial intelligence, machine learning, statistics, and database systems. The goal of data mining is to extract useful information from a data set and transform it into an understandable structure for further use. This thesis develops a framework for an Intelligent Fault Diagnosis and Prognosis System (IFDPS) for CBM based on data mining techniques. It mainly includes two tasks: the first is to detect and predict the condition of the equipment, and the second is to optimize maintenance scheduling accordingly. It contains several phases: sensor selection and placement optimization, signal processing and feature extraction, fault diagnosis, fault prognosis, and predictive maintenance scheduling optimization based on the results of fault diagnosis and prognosis. The thesis applies different data mining techniques in most of these phases, including artificial neural networks such as Supervised Back-Propagation (SBP) and Self-Organizing Maps (SOM), swarm intelligence such as Particle Swarm Optimization (PSO), the Bee Colony Algorithm (BCA) and Ant Colony Optimization (ACO), and association rules. The outcomes of the thesis can be applied to mechanical and electrical systems in manufacturing industries and in wind and hydro power plants.
22

Lan, Yang. "Computational Approaches for Time Series Analysis and Prediction. Data-Driven Methods for Pseudo-Periodical Sequences." Thesis, University of Bradford, 2009. http://hdl.handle.net/10454/4317.

Full text
Abstract:
Time series data mining is one branch of data mining. Time series analysis and prediction have always played an important role in human activities and natural sciences. A pseudo-periodical time series has a complex structure, with the fluctuations and frequencies of the time series changing over time. This pseudo-periodicity brings new properties and challenges to time series analysis and prediction. This thesis proposes two original computational approaches for time series analysis and prediction: Moving Average of nth-order Difference (MANoD) and Series Features Extraction (SFE). Based on data-driven methods, the two original approaches open new insights in time series analysis and prediction, contributing new feature detection techniques. The proposed algorithms can reveal hidden patterns based on the characteristics of time series, and they can be applied to predicting forthcoming events. The thesis also presents the evaluation results of the proposed algorithms on various pseudo-periodical time series, and compares the prediction results with classical time series prediction methods. The results of the original approaches applied to real-world and synthetic time series are very good and show that the contributions open promising research directions.
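The abstract does not spell out the MANoD formulation, so the following is only a guess at its general shape: take the n-th order difference of a pseudo-periodical series and smooth it with a moving average. The function name and parameters are assumptions.

```python
import numpy as np

def manod(series, n=1, window=5):
    """Moving average of the n-th order difference of a series (illustrative)."""
    diff = np.diff(series, n=n)                       # n-th order difference
    kernel = np.ones(window) / window
    return np.convolve(diff, kernel, mode="valid")    # smoothed difference

# Pseudo-periodical toy signal: a sine wave whose frequency drifts slowly
t = np.linspace(0, 20, 400)
signal = np.sin(2 * np.pi * t * (1 + 0.02 * t))

print(manod(signal, n=2, window=7)[:5])
```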
23

Bischler, Thorsten David. "Data mining and software development for RNA-seq-based approaches in bacteria." Reviewer: Cynthia M. Sharma. Würzburg: Universität Würzburg, 2018. http://d-nb.info/1163951714/34.

Full text
24

Sowan, Bilal I. "Enhancing Fuzzy Associative Rule Mining Approaches for Improving Prediction Accuracy. Integration of Fuzzy Clustering, Apriori and Multiple Support Approaches to Develop an Associative Classification Rule Base." Thesis, University of Bradford, 2011. http://hdl.handle.net/10454/5387.

Full text
Abstract:
Building an accurate and reliable model for prediction for different application domains is one of the most significant challenges in knowledge discovery and data mining. This thesis focuses on building and enhancing a generic predictive model for estimating a future value by extracting association rules (knowledge) from a quantitative database. This model is applied to several data sets obtained from different benchmark problems, and the results are evaluated through extensive experimental tests. The thesis presents an incremental development process for the prediction model with three stages. Firstly, a Knowledge Discovery (KD) model is proposed by integrating Fuzzy C-Means (FCM) with the Apriori approach to extract Fuzzy Association Rules (FARs) from a database for building a Knowledge Base (KB) to predict a future value. The KD model has been tested with two road-traffic data sets. Secondly, the initial model has been further developed by including a diversification method in order to improve the reliability of the FARs and find the best and most representative rules. The resulting Diverse Fuzzy Rule Base (DFRB) maintains high quality and diverse FARs, offering a more reliable and generic model. The model uses FCM to transform quantitative data into fuzzy ones, while a Multiple Support Apriori (MSapriori) algorithm is adapted to extract the FARs from fuzzy data. The correlation values for these FARs are calculated, and an efficient orientation for filtering FARs is performed as a post-processing method. The FARs diversity is maintained through the clustering of FARs, based on the concept of the sharing function technique used in multi-objectives optimization. The best and the most diverse FARs are obtained as the DFRB to utilise within the Fuzzy Inference System (FIS) for prediction. The third stage of development proposes a hybrid prediction model called the Fuzzy Associative Classification Rule Mining (FACRM) model. This model integrates the improved Gustafson-Kessel (G-K) algorithm, the proposed Fuzzy Associative Classification Rules (FACR) algorithm and the proposed diversification method. The improved G-K algorithm transforms quantitative data into fuzzy data, while the FACR algorithm generates significant rules (Fuzzy Classification Association Rules (FCARs)) by employing the improved multiple support threshold, associative classification and vertical scanning format approaches. These FCARs are then filtered by calculating the correlation value and the distance between them. The advantage of the proposed FACRM model is to build a generalized prediction model, able to deal with different application domains. The validation of the FACRM model is conducted using different benchmark data sets from the University of California, Irvine (UCI) machine learning and KEEL (Knowledge Extraction based on Evolutionary Learning) repositories, and the results of the proposed FACRM are also compared with other existing prediction models. The experimental results show that the error rate and generalization performance of the proposed model are better in the majority of data sets with respect to the commonly used models. A new method for feature selection entitled Weighting Feature Selection (WFS) is also proposed. The WFS method aims to improve the performance of the FACRM model. The prediction performance is improved by minimizing the prediction error and reducing the number of generated rules. The prediction results of FACRM employing WFS have been compared with those of the FACRM and Stepwise Regression (SR) models for different data sets. The performance analysis and comparative study show that the proposed prediction model provides an effective approach that can be used within a decision support system.
Applied Science University (ASU) of Jordan
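To give a feel for the fuzzy association-rule setting behind FARs, the snippet below fuzzifies a numeric attribute with a triangular membership function and computes the fuzzy support of a two-item itemset using the min t-norm; the attributes and membership parameters are invented, and the FCM/MSapriori pipeline itself is not reproduced.

```python
import numpy as np

# Invented quantitative attributes, e.g. road-traffic speeds and flows
speeds = np.array([20.0, 35.0, 50.0, 62.0, 80.0])
flows = np.array([0.9, 0.7, 0.55, 0.3, 0.1])        # already scaled to [0, 1]

def tri(x, a, b, c):
    """Triangular membership function peaking at b."""
    return np.clip(np.minimum((x - a) / (b - a), (c - x) / (c - b)), 0.0, 1.0)

speed_high = tri(speeds, 40, 80, 120)   # membership in "speed is high"
flow_low = 1.0 - flows                  # membership in "flow is low"

# Fuzzy support of the itemset {speed is high, flow is low} using the min t-norm
fuzzy_support = np.minimum(speed_high, flow_low).mean()
print(round(float(fuzzy_support), 3))   # 0.34 for these invented records
```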
25

Sowan, Bilal Ibrahim. "Enhancing fuzzy associative rule mining approaches for improving prediction accuracy : integration of fuzzy clustering, apriori and multiple support approaches to develop an associative classification rule base." Thesis, University of Bradford, 2011. http://hdl.handle.net/10454/5387.

Full text
Abstract:
Building an accurate and reliable model for prediction for different application domains is one of the most significant challenges in knowledge discovery and data mining. This thesis focuses on building and enhancing a generic predictive model for estimating a future value by extracting association rules (knowledge) from a quantitative database. This model is applied to several data sets obtained from different benchmark problems, and the results are evaluated through extensive experimental tests. The thesis presents an incremental development process for the prediction model with three stages. Firstly, a Knowledge Discovery (KD) model is proposed by integrating Fuzzy C-Means (FCM) with the Apriori approach to extract Fuzzy Association Rules (FARs) from a database for building a Knowledge Base (KB) to predict a future value. The KD model has been tested with two road-traffic data sets. Secondly, the initial model has been further developed by including a diversification method in order to improve the reliability of the FARs and find the best and most representative rules. The resulting Diverse Fuzzy Rule Base (DFRB) maintains high quality and diverse FARs, offering a more reliable and generic model. The model uses FCM to transform quantitative data into fuzzy ones, while a Multiple Support Apriori (MSapriori) algorithm is adapted to extract the FARs from fuzzy data. The correlation values for these FARs are calculated, and an efficient orientation for filtering FARs is performed as a post-processing method. The FARs diversity is maintained through the clustering of FARs, based on the concept of the sharing function technique used in multi-objectives optimization. The best and the most diverse FARs are obtained as the DFRB to utilise within the Fuzzy Inference System (FIS) for prediction. The third stage of development proposes a hybrid prediction model called the Fuzzy Associative Classification Rule Mining (FACRM) model. This model integrates the improved Gustafson-Kessel (G-K) algorithm, the proposed Fuzzy Associative Classification Rules (FACR) algorithm and the proposed diversification method. The improved G-K algorithm transforms quantitative data into fuzzy data, while the FACR algorithm generates significant rules (Fuzzy Classification Association Rules (FCARs)) by employing the improved multiple support threshold, associative classification and vertical scanning format approaches. These FCARs are then filtered by calculating the correlation value and the distance between them. The advantage of the proposed FACRM model is to build a generalized prediction model, able to deal with different application domains. The validation of the FACRM model is conducted using different benchmark data sets from the University of California, Irvine (UCI) machine learning and KEEL (Knowledge Extraction based on Evolutionary Learning) repositories, and the results of the proposed FACRM are also compared with other existing prediction models. The experimental results show that the error rate and generalization performance of the proposed model are better in the majority of data sets with respect to the commonly used models. A new method for feature selection entitled Weighting Feature Selection (WFS) is also proposed. The WFS method aims to improve the performance of the FACRM model. The prediction performance is improved by minimizing the prediction error and reducing the number of generated rules. The prediction results of FACRM employing WFS have been compared with those of the FACRM and Stepwise Regression (SR) models for different data sets. The performance analysis and comparative study show that the proposed prediction model provides an effective approach that can be used within a decision support system.
26

Yildiz, Meliha Yetisgen. "Using statistical and knowledge-based approaches for literature-based discovery /." Thesis, Connect to this title online; UW restricted, 2007. http://hdl.handle.net/1773/7178.

Full text
27

Li, Yang. "The time-series approaches in forecasting one-step-ahead cash-flow data of mining companies listed on the Johannesburg Stock Exchange." Thesis, University of the Western Cape, 2007. http://etd.uwc.ac.za/index.php?module=etd&action=viewtitle&id=gen8Srv25Nme4_1552_1254470577.

Full text
Abstract:

Previous research pertaining to the financial aspect of the mining industry has focused predominantly on mining products' values and the companies' sensitivity to exchange rates. There has been very little empirical research carried out on the statistical behaviour of mining companies' cash flow data. This paper aimed to study the time-series behaviour of the cash flow data series of JSE-listed mining companies.

28

Wu, Burton. "New variational Bayesian approaches for statistical data mining : with applications to profiling and differentiating habitual consumption behaviour of customers in the wireless telecommunication industry." Thesis, Queensland University of Technology, 2011. https://eprints.qut.edu.au/46084/1/Burton_Wu_Thesis.pdf.

Full text
Abstract:
This thesis investigates profiling and differentiating customers through the use of statistical data mining techniques. The business application of our work centres on examining individuals' seldom-studied yet critical consumption behaviour over an extensive time period within the context of the wireless telecommunication industry; consumption behaviour (as opposed to purchasing behaviour) is behaviour that has been performed so frequently that it becomes habitual and involves minimal intention or decision making. Key variables investigated are the activity initialisation timestamp and cell tower location, as well as the activity type and usage quantity (e.g., voice call with duration in seconds); the research focuses on customers' spatial and temporal usage behaviour. The main methodological emphasis is on the development of clustering models based on Gaussian mixture models (GMMs), which are fitted with the use of the recently developed variational Bayesian (VB) method. VB is an efficient deterministic alternative to the popular but computationally demanding Markov chain Monte Carlo (MCMC) methods. The standard VB-GMM algorithm is extended by allowing component splitting, such that it is robust to initial parameter choices and can automatically and efficiently determine the number of components. The new algorithm we propose allows more effective modelling of individuals' highly heterogeneous and spiky spatial usage behaviour, or more generally human mobility patterns; the term spiky describes data patterns with large areas of low probability mixed with small areas of high probability. Customers are then characterised and segmented based on the fitted GMM, which corresponds to how each of them uses the products/services spatially in their daily lives; this is essentially their likely lifestyle and occupational traits. Other significant research contributions include fitting GMMs using VB to circular data, i.e., the temporal usage behaviour, and developing clustering algorithms suitable for high dimensional data based on the use of VB-GMM.
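For orientation, scikit-learn's BayesianGaussianMixture fits a GMM by variational inference and, with a Dirichlet-process prior, shrinks the weights of unneeded components, which is broadly the behaviour the thesis extends with component splitting. The spiky synthetic points below are an assumption for illustration only.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

# Synthetic stand-in for "spiky" spatial usage: two tight hotspots plus scatter
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal([0, 0], 0.05, size=(200, 2)),
    rng.normal([5, 5], 0.05, size=(150, 2)),
    rng.uniform(-2, 7, size=(30, 2)),
])

# Variational Bayesian GMM: give it generous capacity and let the Dirichlet
# process prior shrink the weights of unneeded components towards zero.
vb_gmm = BayesianGaussianMixture(
    n_components=10,
    weight_concentration_prior_type="dirichlet_process",
    random_state=0,
).fit(X)

print(np.round(vb_gmm.weights_, 3))   # most components receive negligible weight
```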
APA, Harvard, Vancouver, ISO, and other styles
29

Gajvelly, Chakravarthy. "Approaches for estimating the Uniqueness of linked residential burglaries." Thesis, Blekinge Tekniska Högskola, Institutionen för datalogi och datorsystemteknik, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-11823.

Full text
Abstract:
Context: According to the Swedish National Council for Crime Prevention, residential burglary crimes increased by 2% in 2014 compared to 2013 and by 19% over the past decade. Law enforcement agencies could only solve three to five percent of the crimes reported in 2012. Multiple studies in the field of crime analysis report that most residential burglaries are committed by a relatively small number of offenders. Thus, law enforcement agencies need to investigate the possibility of linking crimes into crime series. Objectives: This study presents the computation of a median crime, the centre-most crime in a crime series, calculated using the statistical concept of the median. This approach is used to calculate the uniqueness of a crime series consisting of linked residential burglaries. The burglaries are characterised using temporal and spatial features and modus operandi. Methods: A quasi-experiment with repeated measures is chosen as the research method. The burglaries are linked based on their characteristics (features) by building a statistical model using the logistic regression algorithm to formulate estimated crime series. The study uses the median crime as an approach for computing the uniqueness of linked burglaries. The measure of uniqueness is compared between estimated series and legally verified known series. In addition, the study compares the uniqueness of estimated and known series to randomly selected crimes. The measure of uniqueness is used to assess the feasibility of using the formulated estimated series for investigation by law enforcement bodies. Results: The statistical model built for linking crimes achieved AUC = 0.964, R² = 0.770 and Dxy = 0.900 during internal evaluation, AUC = 0.916 for predictions on the test data set, and AUC = 0.85 for predictions on the known-series data set. The uniqueness measure of estimated series ranges from 0.526 to 0.715, and from 0.359 to 0.442 for known series, depending on the series. The uniqueness of randomly selected crimes ranges from 0.522 to 0.726 for estimated series and from 0.636 to 0.743 for known series. The values obtained are analysed and evaluated using the independent two-sample t-test, Cohen's d and the Kolmogorov-Smirnov test. From this analysis, it is evident that the uniqueness measure for estimated series is high compared to the known series and closely matches that of randomly selected crimes. The uniqueness of known series is clearly low compared to both the estimated series and randomly selected crimes. Conclusion: The present study concludes that estimated series formulated using the statistical model have high uniqueness measures and need to be further filtered before they can be used by law enforcement bodies.
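The following sketch illustrates, with invented pairwise features and coordinates, the two ingredients described above: a logistic-regression model that scores whether two burglaries belong to the same series, and a median crime taken here as the crime minimising the summed distance to the other crimes in a series. It is an interpretation for illustration only, not the study's actual procedure or data.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Hypothetical pairwise features for crime pairs: [spatial distance (km), time gap (days), MO dissimilarity].
    X_pairs = np.array([[0.4, 2, 0.10], [12.0, 40, 0.80], [1.1, 5, 0.20],
                        [9.5, 30, 0.70], [0.7, 1, 0.15], [15.0, 60, 0.90]])
    y_linked = np.array([1, 0, 1, 0, 1, 0])   # 1 = pair belongs to the same (known) series

    linker = LogisticRegression().fit(X_pairs, y_linked)
    print("P(linked) for a new pair:", round(linker.predict_proba([[0.9, 3, 0.12]])[0, 1], 3))

    # "Median crime" of a series: the crime whose summed distance to all other crimes is smallest.
    series_coords = np.array([[0.0, 0.0], [0.5, 0.2], [0.4, 0.6], [5.0, 5.0]])
    dists = np.linalg.norm(series_coords[:, None, :] - series_coords[None, :, :], axis=2)
    print("index of the median crime:", int(dists.sum(axis=1).argmin()))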
APA, Harvard, Vancouver, ISO, and other styles
30

Phanse, Shruti. "Study on the performance of ontology based approaches to link prediction in social networks as the number of users increases." Thesis, Kansas State University, 2010. http://hdl.handle.net/2097/6914.

Full text
Abstract:
Master of Science
Department of Computing and Information Sciences
Doina Caragea
Recent advances in social network applications have resulted in millions of users joining such networks in the last few years. User data collected from social networks can be used for various data mining problems such as interest recommendations, friendship recommendations and many more. Social networks, in general, can be seen as huge directed network graphs representing the users of the network (together with their information, e.g., user interests) and their interactions (also known as friendship links). Previous work [Hsu et al., 2007] on friendship link prediction has shown that graph features contain important predictive information. Furthermore, it has been shown that user interests can be used to improve link predictions if they are organised into an explicit or implicit ontology [Haridas, 2009; Parimi, 2010]. However, the above-mentioned previous studies were performed using a small set of users from the social network LiveJournal. The goal of this work is to study the performance of the ontology-based approach proposed in [Haridas, 2009] when the number of users in the dataset is increased. More precisely, we study the performance of the approach on data sets consisting of 1000, 2000, 3000 and 4000 users. Our results show that the performance generally increases with the number of users. However, the problem quickly becomes intractable from a computation-time point of view. As part of our study, we also compare the results obtained using the ontology-based approach [Haridas, 2009] with results obtained with the LDA-based approach in [Parimi, 2010], when such results are available.
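As a hedged illustration of link prediction from graph features plus an interest-similarity signal (a stand-in for the ontology-based similarity of [Haridas, 2009], not that method itself), the following sketch scores candidate friendship links with a logistic-regression model; the adjacency matrix and similarity values are fabricated.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Tiny fabricated friendship graph (users 0..4) as an adjacency matrix.
    A = np.array([[0, 1, 1, 0, 0],
                  [1, 0, 1, 1, 0],
                  [1, 1, 0, 0, 1],
                  [0, 1, 0, 0, 1],
                  [0, 0, 1, 1, 0]])

    # Fabricated interest-similarity scores (a stand-in for ontology-based similarity).
    interest_sim = np.random.default_rng(1).random((5, 5))

    def pair_features(u, v):
        common_neighbours = int((A[u] & A[v]).sum())   # a simple graph feature
        return [common_neighbours, interest_sim[u, v]]

    # Train on a few observed pairs, then score a candidate friendship link.
    pairs = [(0, 1), (0, 3), (1, 4), (2, 4), (3, 4), (0, 4)]
    X = np.array([pair_features(u, v) for u, v in pairs])
    y = np.array([A[u, v] for u, v in pairs])
    model = LogisticRegression().fit(X, y)
    print("link score for pair (1, 2):", round(model.predict_proba([pair_features(1, 2)])[0, 1], 3))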
APA, Harvard, Vancouver, ISO, and other styles
31

Sherzad, Abdul Rahman [Verfasser], Uwe [Akademischer Betreuer] Nestmann, Uwe [Gutachter] Nestmann, Niels [Gutachter] Pinkwart, Sebastian [Gutachter] Bab, and Nazir [Gutachter] Peroz. "Shaping the selection of fields of study in Afghanistan through educational data mining approaches / Abdul Rahman Sherzad ; Gutachter: Uwe Nestmann, Niels Pinkwart, Sebastian Bab, Nazir Peroz ; Betreuer: Uwe Nestmann." Berlin : Technische Universität Berlin, 2018. http://d-nb.info/1164076450/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
32

Wazaefi, Yanal. "Automatic diagnosis of melanoma from dermoscopic images of melanocytic tumors : Analytical and comparative approaches." Thesis, Aix-Marseille, 2013. http://www.theses.fr/2013AIXM4106.

Full text
Abstract:
Melanoma is the most serious type of skin cancer. This thesis contributed to the development of two different approaches for the computer-aided diagnosis of melanoma: an analytical approach and a comparative approach. The analytical approach mimics the dermatologist's behaviour by first detecting malignancy features based on popular analytical methods and then, in a second step, combining these features. We investigated to what extent melanoma diagnosis by dermatologists can be influenced by an automatic diagnosis system using dermoscopic images of pigmented skin lesions. The comparative approach, called the Ugly Duckling (UD) concept, assumes that nevi in the same patient tend to share some morphological features, so that dermatologists identify a few similarity clusters. The UD is the nevus that does not fit into any of those clusters and is therefore likely to be suspicious. The goal was to model the ability of dermatologists to build consistent clusters of pigmented skin lesions in patients.
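A minimal sketch of the Ugly Duckling idea, assuming invented per-lesion feature vectors: cluster a patient's lesions and flag the one farthest from its cluster centre as the most atypical. This only illustrates the concept, not the thesis's model.

    import numpy as np
    from sklearn.cluster import KMeans

    # Hypothetical per-lesion features for one patient (e.g. colour, asymmetry, diameter scores).
    lesions = np.array([[0.20, 0.10, 3.0], [0.25, 0.12, 3.2], [0.22, 0.09, 2.9],
                        [0.80, 0.15, 3.1], [0.78, 0.14, 3.0], [0.50, 0.90, 6.5]])

    # Build similarity clusters of the patient's lesions, then flag the least typical one.
    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(lesions)
    dist_to_centre = np.linalg.norm(lesions - km.cluster_centers_[km.labels_], axis=1)
    print("ugly-duckling candidate (index of most atypical lesion):", int(dist_to_centre.argmax()))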
APA, Harvard, Vancouver, ISO, and other styles
33

Muñoz, Mas Rafael. "Multivariate approaches in species distribution modelling: Application to native fish species in Mediterranean Rivers." Doctoral thesis, Universitat Politècnica de València, 2018. http://hdl.handle.net/10251/76168.

Full text
Abstract:
This dissertation focused on a comprehensive analysis of the capabilities of some previously untested types of Artificial Neural Networks, specifically the Probabilistic Neural Networks (PNN) and the Multi-Layer Perceptron (MLP) Ensembles. The analysis of the capabilities of these techniques was performed using the native brown trout (Salmo trutta; Linnaeus, 1758), the bermejuela (Achondrostoma arcasii; Robalo, Almada, Levy & Doadrio, 2006) and the redfin barbel (Barbus haasi; Mertens, 1925) as target species. The analyses focused on the predictive capabilities, the interpretability of the models and the effect of the excess of zeros in the training datasets, which for presence-absence models is directly related to the concept of data prevalence (i.e., the proportion of presence instances in the training dataset). Finally, the effect of the spatial scale (i.e., micro-scale or microhabitat scale versus meso-scale) on the habitat suitability models, and consequently on the e-flow assessment, was studied in the last chapter.
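A small illustrative sketch of an MLP ensemble for presence-absence data with low prevalence, using invented habitat variables; the thesis's PNN and ensemble designs are more elaborate than this.

    import numpy as np
    from sklearn.neural_network import MLPClassifier

    rng = np.random.default_rng(0)
    # Invented microhabitat variables (e.g. depth, velocity, substrate index) and sparse presences (~20%).
    X = rng.random((300, 3))
    y = (X[:, 0] + 0.3 * rng.standard_normal(300) > 0.8).astype(int)

    # A small MLP ensemble; members differ only in their random initialisation here.
    ensemble = [MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=s).fit(X, y)
                for s in range(5)]
    suitability = np.mean([m.predict_proba(X)[:, 1] for m in ensemble], axis=0)
    print("mean predicted suitability of the first 5 sites:", np.round(suitability[:5], 2))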
Muñoz Mas, R. (2016). Multivariate approaches in species distribution modelling: Application to native fish species in Mediterranean Rivers [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/76168
TESIS
APA, Harvard, Vancouver, ISO, and other styles
34

Mahamaneerat, Wannapa Kay Shyu Chi-Ren. "Domain-concept mining an efficient on-demand data mining approach /." Diss., Columbia, Mo. : University of Missouri--Columbia, 2008. http://hdl.handle.net/10355/7195.

Full text
Abstract:
Title from PDF of title page (University of Missouri--Columbia, viewed on February 24, 2010). The entire thesis text is included in the research.pdf file; the official abstract appears in the short.pdf file; a non-technical public abstract appears in the public.pdf file. Dissertation advisor: Dr. Chi-Ren Shyu. Vita. Includes bibliographical references.
APA, Harvard, Vancouver, ISO, and other styles
35

Wang, Guan. "Graph-Based Approach on Social Data Mining." Thesis, University of Illinois at Chicago, 2015. http://pqdtopen.proquest.com/#viewpdf?dispub=3668648.

Full text
Abstract:

Powered by big data infrastructures, social network platforms are gathering data on many aspects of our daily lives. The online social world reflects our physical world in increasingly fine detail by collecting people's individual biographies and their various relationships with other people. Although a massive amount of social data has been gathered, an urgent challenge remains unsolved: to discover meaningful knowledge that can empower social platforms to really understand their users from different perspectives.

Motivated by this trend, my research addresses the reasoning and mathematical modelling behind interesting phenomena on social networks. Proposing graph-based data mining frameworks for heterogeneous data sources is the major goal of my research. The algorithms, by design, utilise graph structures with heterogeneous link and node features to creatively represent social networks' basic structures and the phenomena built on top of them.

The graph-based heterogeneous mining methodology has proved effective on a series of knowledge discovery topics, including network structure and macro social pattern mining such as magnet community detection (87), social influence propagation and social similarity mining (85), and spam detection (86). Future work will consider dynamic relations in social data mining and how graph-based approaches adapt to these new situations.

APA, Harvard, Vancouver, ISO, and other styles
36

Zhang, Xiaofeng. "A model-based approach for distributed data mining." HKBU Institutional Repository, 2007. http://repository.hkbu.edu.hk/etd_ra/877.

Full text
APA, Harvard, Vancouver, ISO, and other styles
37

Alkharboush, Nawaf Abdullah H. "A data mining approach to improve the automated quality of data." Thesis, Queensland University of Technology, 2014. https://eprints.qut.edu.au/65641/1/Nawaf%20Abdullah%20H_Alkharboush_Thesis.pdf.

Full text
Abstract:
This thesis describes the development of a robust and novel prototype to address the data quality problems that relate to the dimension of outlier data. It thoroughly investigates the associated problems with regard to detecting, assessing and determining the severity of the problem of outlier data, and proposes granule-mining-based alternative techniques to significantly improve the effectiveness of mining and assessing outlier data.
APA, Harvard, Vancouver, ISO, and other styles
38

Koperski, Krzysztof. "A progressive refinement approach to spatial data mining." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 1999. http://www.collectionscanada.ca/obj/s4/f2/dsk1/tape7/PQDD_0024/NQ51882.pdf.

Full text
APA, Harvard, Vancouver, ISO, and other styles
39

Dehghani, M. (Mitra). "Descriptive data mining approach to visualize diabetes behaviour." Master's thesis, University of Oulu, 2014. http://urn.fi/URN:NBN:fi:oulu-201405261502.

Full text
Abstract:
Diabetes mellitus is a chronic disease that imposes unacceptably high human, social and economic costs on all countries. Minimizing its incidence and prevalence, as well as its costly and dangerous complications, requires effective management. Diabetes management hinges on close cooperation between the patient and health care professionals. However, owing to the increasing prevalence of diabetes, one emerging global trend is to replace traditional face-to-face health care with remote patient monitoring by taking advantage of new advances in electronics, such as wireless sensor networks and body sensors. This significantly reduces the cost and service pressures that health centres face, but produces a huge amount of heterogeneous data, confronting us with new challenges related to 'big data'. One established method of handling the big data challenge is data mining, which provides a variety of techniques to analyse big data in order to discover hidden knowledge. This study is an effort to design and implement a descriptive data mining approach and to devise association rules to visualize diabetes behaviour in combination with specific lifestyle parameters, including physical activity and emotional states, particularly in elderly diabetics. The main goal of this type of data mining is to discover critical time stamps and salient parameters that lead patients either to success or failure in diabetes self-care. The visualization method is aimed at creating sufficient motivation in patients to improve their self-care through lifestyle changes. At the same time, it provides a decision support system for health care professionals to improve diabetes treatment.
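The sketch below illustrates descriptive association-rule mining of the kind described above on a few invented daily 'transactions' (discretised activity, diet and glucose items); item names, thresholds and data are hypothetical.

    from itertools import combinations

    # Invented daily records: discretised activity, diet and glucose items for one patient.
    transactions = [
        {"low_activity", "evening_snack", "high_glucose"},
        {"low_activity", "stress", "high_glucose"},
        {"walk_30min", "normal_glucose"},
        {"walk_30min", "evening_snack", "normal_glucose"},
        {"low_activity", "evening_snack", "high_glucose"},
    ]

    def support(itemset):
        return sum(itemset <= t for t in transactions) / len(transactions)

    # Report one-item -> one-item rules above hypothetical support/confidence thresholds.
    items = sorted(set().union(*transactions))
    for x, y in combinations(items, 2):
        for lhs, rhs in ((x, y), (y, x)):
            supp, supp_lhs = support({lhs, rhs}), support({lhs})
            if supp >= 0.4 and supp_lhs > 0 and supp / supp_lhs >= 0.8:
                print(f"{lhs} -> {rhs}  support={supp:.2f}  confidence={supp / supp_lhs:.2f}")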
APA, Harvard, Vancouver, ISO, and other styles
40

Lawera, Martin Lukas. "Futures prices: Data mining and modeling approaches." Thesis, 2000. http://hdl.handle.net/1911/19526.

Full text
Abstract:
We present a series of models capturing the non-stationarities and dependencies in the variance of yields on natural gas futures. Both univariate and multivariate models are explored, based on the ARIMA and Hidden-Markov methodologies. The models capture the effects uncovered through various data mining techniques, including seasonality, age and transaction-time effects. Such effects have previously been described in the literature, but never comprehensively captured for the purpose of modeling. In addition, we have investigated the impact of temporal aggregation by modeling both the daily and the monthly data. The issue of aggregation has not been explored in the current literature, which has focused on daily data with uniformly underwhelming results. We have shown that modifying current models to allow aggregation leads to improvements in performance. This is demonstrated by comparing the proposed models to the models currently used in the financial markets.
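As a much simpler stand-in for the ARIMA and Hidden-Markov variance models referenced above, the sketch below checks for lag-1 dependence in squared returns on synthetic data; the series and coefficients are illustrative only.

    import numpy as np

    rng = np.random.default_rng(0)
    # Synthetic stand-in for daily returns of a natural gas futures contract, with seasonal volatility.
    returns = rng.standard_normal(500) * (1.0 + 0.5 * np.sin(np.arange(500) * 2 * np.pi / 250))

    # Crude check of dependence in the variance: regress squared returns on their first lag.
    r2 = returns ** 2
    X = np.column_stack([np.ones(len(r2) - 1), r2[:-1]])
    beta, *_ = np.linalg.lstsq(X, r2[1:], rcond=None)
    print("intercept and lag-1 coefficient for squared returns:", np.round(beta, 3))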
APA, Harvard, Vancouver, ISO, and other styles
41

Williams, James. "Unrealization approaches for privacy preserving data mining." Thesis, 2010. http://hdl.handle.net/1828/3156.

Full text
Abstract:
This thesis contains a critical evaluation of the unrealization approach to privacy preserving data mining. We cover a fair bit of ground, making numerous contributions to the existing literature. First, we present a comprehensive and accurate analysis of the challenges posed by data mining to privacy. Second, we put the unrealization approach on firmer ground by providing proofs of previously unproven claims, using the multi-relational algebra. Third, we extend the unrealization approach to the C4.5 algorithm. Fourth, we evaluate the algorithm's space requirements on three representative data sets. Lastly, we analyse the unrealization approach against various issues identified in the first contribution. Our conclusion is that the unrealization approach to privacy preserving data mining is novel, and capable of addressing some of the major challenges posed by data mining to privacy. Unfortunately, its space and time requirements vitiate its applicability on real-world data sets.
APA, Harvard, Vancouver, ISO, and other styles
42

Lee, P. C., and 李博智. "The Data Mining Approaches to Predict Chronic Diseases." Thesis, 2002. http://ndltd.ncl.edu.tw/handle/60943808979916347850.

Full text
Abstract:
Master's thesis
Yuan Ze University
Department of Information Management
Academic year 90 (ROC calendar)
The objective of this thesis is to construct a prediction model for chronic diseases such as Diabetes Mellitus, Hypertension and Hyperlipidemia through the application of data mining methods, using three-dimensional human body measurements as a new venture in this research field. According to records from the Department of Health, Diabetes Mellitus, Hypertension and Hyperlipidemia are major conditions among the Taiwanese population and contribute to several of the top ten causes of death in Taiwan. These three conditions share some characteristics, such as risk that increases with age and a common pool of risk factors in our living environment. Basically, they are diseases closely related to people's lifestyles, whose risk can be predicted from certain predisposing factors. The ultimate goal of a prediction model is to foresee risk not normally judged in clinicians' routine work. From the perspective of preventive medicine, some risk factors were collected through an active survey instead of biochemical tests or physical examinations. In particular, body measurements, lifestyle variables and family history of disease play important roles in predicting a person's health. From the clinician's point of view, a useful prediction model can greatly help in diagnosis, treatment and health education. The role of preventive medicine has become more important as the health insurance system in Taiwan transforms into a prospective payment system. Data mining uses artificial intelligence, database and statistical methods to extract meaningful information from large collections of variables and data. This particular study utilises both genetic algorithms and case-based reasoning in a hybrid data mining approach, which the research suggests is an easy and effective technique for acquiring knowledge from databases. This study collected 1370 subjects from the Department of Health Examination, Chang Gung Memorial Hospital, from July 2000 to July 2001. Results from predicting the selected chronic diseases by anthropometric and three-dimensional measurements are promising and innovative in the field of biomedical sciences. Specifically, significant predictors for Hyperlipidemia, Diabetes Mellitus and Hypertension include waist-hip ratio, waist profile area, waist circumference, trunk surface area and left-arm volume.
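A minimal sketch of a hybrid of a genetic algorithm for feature selection with a k-nearest-neighbour classifier standing in for case-based reasoning; the anthropometric features, labels and GA settings are invented, and the thesis's actual hybrid may differ.

    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(0)
    # Invented anthropometric features (e.g. waist-hip ratio, circumferences, volumes) and a disease label.
    X = rng.random((200, 8))
    y = (X[:, 0] + 0.5 * X[:, 2] + 0.2 * rng.standard_normal(200) > 0.8).astype(int)

    def fitness(mask):
        if mask.sum() == 0:
            return 0.0
        knn = KNeighborsClassifier(n_neighbors=5)   # k-NN as a simple stand-in for case-based reasoning
        return cross_val_score(knn, X[:, mask.astype(bool)], y, cv=3).mean()

    # A tiny genetic algorithm over binary feature masks: selection, uniform crossover, mutation.
    pop = rng.integers(0, 2, size=(20, X.shape[1]))
    for _ in range(15):
        scores = np.array([fitness(ind) for ind in pop])
        parents = pop[np.argsort(scores)[-10:]]                       # keep the better half
        cross = rng.integers(0, 2, size=parents.shape).astype(bool)
        children = np.where(cross, parents, parents[::-1])            # uniform crossover
        children ^= (rng.random(children.shape) < 0.05).astype(int)   # mutation
        pop = np.vstack([parents, children])
    best = pop[np.argmax([fitness(ind) for ind in pop])]
    print("selected features:", np.flatnonzero(best), " cv accuracy:", round(fitness(best), 3))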
APA, Harvard, Vancouver, ISO, and other styles
43

Lai-Chen, Chen, and 陳來成. "Credit Card Fraud Detection Using Data Mining Approaches." Thesis, 2002. http://ndltd.ncl.edu.tw/handle/51921731728470410399.

Full text
Abstract:
Master's thesis
Yuan Ze University
Department of Information Management
Academic year 90 (ROC calendar)
Credit card transactions and electronic commerce continue to grow in great numbers. The rate of fraudulent account use is also growing fast in the credit card industry, with consequent losses for banks. Improved fraud detection has thus become essential to maintain the viability of commercial banks and the country's payment system. The prevention of credit card fraud is an application for prediction techniques. This thesis shows how data mining techniques and artificial intelligence algorithms can be applied successfully to obtain a high fraud detection rate. We also describe an AI-based approach that constructs and compares prediction models built separately with case-based reasoning, decision tree and neural network methods for detecting fraud patterns. To ensure proper model construction, the concept was developed and tested on real credit card data from a local bank. The prediction of user behaviour and transaction operations can be integrated and implemented in the fraud detection models.
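The sketch below compares a decision tree and a neural network on an invented, imbalanced transaction set, in the spirit of the model comparison described above; features, labels and the detection-rate measure are hypothetical stand-ins.

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.neural_network import MLPClassifier
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    # Invented transaction features (amount, hour, merchant-risk score); roughly 5% fraudulent.
    X = rng.random((2000, 3))
    y = ((X[:, 0] > 0.9) & (X[:, 2] > 0.5)).astype(int)

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)
    models = {
        "decision tree": DecisionTreeClassifier(max_depth=5, random_state=0),
        "neural network": MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0),
    }
    for name, model in models.items():
        pred = model.fit(X_tr, y_tr).predict(X_te)
        detection_rate = (pred[y_te == 1] == 1).mean()   # share of fraudulent cases caught
        print(f"{name}: fraud detection rate = {detection_rate:.2f}")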
APA, Harvard, Vancouver, ISO, and other styles
44

Yang, Kuo-Tung, and 楊國棟. "Several Heuristic Approaches to Privacy-Preserving Data Mining." Thesis, 2010. http://ndltd.ncl.edu.tw/handle/51265621784508458978.

Full text
Abstract:
Master's thesis
National University of Kaohsiung
Master's Program, Department of Computer Science and Information Engineering
Academic year 98 (ROC calendar)
Data mining technology can help extract useful knowledge from large data sets. The processes of data collection and data dissemination may, however, carry an inherent risk of privacy threats. Some sensitive or private information about individuals, businesses and organizations needs to be suppressed before it is shared or published. Privacy-preserving data mining (PPDM) has thus become an important issue in recent years. In this thesis, we propose three approaches for modifying original databases in order to hide sensitive itemsets. The first is called SIF-IDF, a greedy approach based on a concept borrowed from Term Frequency and Inverse Document Frequency (TF-IDF) in text mining. It uses this concept to evaluate the similarity degrees between the items in transactions and the sensitive itemsets to be hidden, and then selects appropriate items in some transactions to modify. The second is a lattice-based approach, in which a lattice is built from the relations among the sensitive itemsets. A bottom-up deletion strategy is used to gradually reduce the frequency of sensitive itemsets during the hiding process. The third is an evolutionary privacy-preserving data mining method that finds appropriate transactions to be hidden from a database. The proposed approach designs a flexible evaluation function with three factors, and different weights may be assigned to them depending on users' preferences. In addition, the concept of pre-large itemsets is used to reduce the cost of rescanning databases, thus speeding up the evaluation of chromosomes. The three proposed approaches can easily make good trade-offs between privacy preservation and execution time. Experimental results also show the performance of the proposed approaches.
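A bare-bones sanitisation sketch in the spirit of the hiding problem described above: repeatedly delete a sensitive item from supporting transactions until the sensitive itemset's support falls below a threshold. The victim-selection heuristics of SIF-IDF, the lattice-based method and the evolutionary method are not reproduced here; transactions and threshold are invented.

    # Invented transaction database, sensitive itemset and mining threshold.
    transactions = [
        {"a", "b", "c"}, {"a", "b"}, {"a", "b", "d"}, {"b", "c"}, {"a", "c"}, {"a", "b", "c"},
    ]
    sensitive = {"a", "b"}
    min_support = 0.5

    def support(itemset, db):
        return sum(itemset <= t for t in db) / len(db)

    # Greedy hiding: while the sensitive itemset is still frequent, pick a supporting
    # transaction and delete one of the sensitive items from it.
    while support(sensitive, transactions) >= min_support:
        victim = next(t for t in transactions if sensitive <= t)
        victim.discard(next(iter(sensitive)))

    print("support of the sensitive itemset after hiding:", support(sensitive, transactions))
    print("sanitised database:", transactions)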
APA, Harvard, Vancouver, ISO, and other styles
45

Chang, Chieh-Hsiang, and 張傑翔. "Apply Data Mining Approaches in Financial Early Warning System." Thesis, 2007. http://ndltd.ncl.edu.tw/handle/99158219835011321311.

Full text
Abstract:
Master's thesis
Huafan University
Master's Program, Department of Information Management
Academic year 95 (ROC calendar)
A financial early warning system can help not only the management of financial institutions but also the diagnosis of their routine operations. Since the early 1970s, many related studies have been conducted; until recent years, however, most of them used traditional statistical methods to build early warning systems. With the vigorous development of data mining techniques, many studies have begun to apply those techniques to various fields, including early warning systems. Data mining does not need to satisfy many statistical assumptions and can transform enormous amounts of raw data into meaningful and useful information. To build an early warning system model, the related financial laws, data and operation management rules need to be taken into consideration. However, the number of features is large and not all of them are helpful for prediction. Data sets with unimportant, noisy or highly correlated features significantly decrease the classification accuracy rate; by removing these features, both efficiency and accuracy can be improved. The back-propagation neural network (BPN), support vector machine (SVM) and decision tree (DT) are well-known data mining techniques that can be applied to various fields and have high classification ability. These techniques are, however, sensitive to their parameter settings, and poor settings result in worse accuracy. Therefore, this thesis utilises a meta-heuristic, particle swarm optimization (PSO), to obtain suitable parameters and to select a subset of features without degrading the classification accuracy rate. Through the global search characteristic of the meta-heuristic, the parameters of BPN, SVM and DT can be optimized and feature selection performed at the same time, so as to obtain a minimal set of features that effectively yields higher accuracy. To evaluate the proposed approach, this research takes the reports of Taiwan Ratings as the authority, and uses the "Condition and Performance of Domestic Banks" from the Central Bank of the Republic of China (Taiwan) and the "Statistics of Financial Institutions" from the Financial Supervisory Commission, Executive Yuan as the source data. Banks are classified into one of three categories ("well", "average" and "risky"). In the experiments, although BPN and SVM achieve high forecast accuracy, their internal processes are black boxes, so professionals cannot incorporate the results into their future judgments. From the tree structure obtained with the proposed PSO+DT architecture, experts can obtain the best decision rules and thus further evaluate and correct the early warning system model. The experimental results show that the proposed approaches can remove unnecessary features and improve classification accuracy significantly.
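As an illustration of PSO-driven feature selection wrapped around a decision tree (only one facet of the PSO+DT architecture described above, and with invented bank indicators and labels), the following sketch evolves feature-selection vectors with a bare-bones particle swarm.

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(0)
    # Invented bank indicators (capital adequacy, NPL ratio, ROA, ...) and a three-class soundness label.
    X = rng.random((150, 10))
    y = np.digitize(X[:, 0] + 0.5 * X[:, 3], [0.6, 1.0])   # 0 = risky, 1 = average, 2 = well

    def fitness(position):
        mask = position > 0.5
        if not mask.any():
            return 0.0
        tree = DecisionTreeClassifier(max_depth=4, random_state=0)
        return cross_val_score(tree, X[:, mask], y, cv=3).mean()

    # A bare-bones particle swarm over feature-selection vectors.
    n_particles, dim = 12, X.shape[1]
    pos = rng.random((n_particles, dim))
    vel = np.zeros_like(pos)
    pbest, pbest_val = pos.copy(), np.array([fitness(p) for p in pos])
    gbest = pbest[pbest_val.argmax()].copy()

    for _ in range(20):
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, 0.0, 1.0)
        vals = np.array([fitness(p) for p in pos])
        improved = vals > pbest_val
        pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
        gbest = pbest[pbest_val.argmax()].copy()

    print("selected features:", np.flatnonzero(gbest > 0.5), " cv accuracy:", round(fitness(gbest), 3))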
APA, Harvard, Vancouver, ISO, and other styles
46

Ni, Sheng-Fu, and 倪聖富. "Applying Data Mining Approaches to Churn Prediction in Retailing." Thesis, 2007. http://ndltd.ncl.edu.tw/handle/73165743827586089960.

Full text
Abstract:
Master's thesis
Yuan Ze University
Department of Business Administration
Academic year 95 (ROC calendar)
Recently, churn has attracted increasing attention in customer relationship management (CRM). Moreover, retailers suffer because customers can switch suppliers without informing them. The issue of churn prediction has been extensively researched, yet few studies focus on non-contractual environments such as retailing. In this study, we not only apply several classification techniques, such as logistic regression, discriminant analysis, random forests and artificial neural networks, but also propose a combination model of discriminant analysis and a back-propagation neural network for churn prediction. The percentage correctly classified (PCC) and the area under the receiver operating characteristic curve (AUROC) are used for model evaluation. Moreover, we refine the definition of partial defection proposed in the previous literature to solve the problem of determining churn in non-contractual settings. Our findings suggest that: (1) the combination of two techniques outperforms either single technique; (2) variables such as promotion, use of loyalty points, customer interaction and demographics are useful for churn prediction.
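A minimal sketch of the combination idea, assuming invented customer features: average the churn probabilities of a linear discriminant model and a back-propagation-style neural network and evaluate with PCC and AUROC. It is not the study's actual model or data.

    import numpy as np
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
    from sklearn.neural_network import MLPClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import roc_auc_score, accuracy_score

    rng = np.random.default_rng(0)
    # Invented retail features: recency, frequency, monetary value, loyalty-point usage.
    X = rng.random((1000, 4))
    y = (0.8 * X[:, 0] - 0.5 * X[:, 1] + 0.2 * rng.standard_normal(1000) > 0.3).astype(int)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

    lda = LinearDiscriminantAnalysis().fit(X_tr, y_tr)
    mlp = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0).fit(X_tr, y_tr)

    # Combination model: average the two models' churn probabilities.
    p_comb = (lda.predict_proba(X_te)[:, 1] + mlp.predict_proba(X_te)[:, 1]) / 2
    print("AUROC:", round(roc_auc_score(y_te, p_comb), 3))
    print("PCC  :", round(accuracy_score(y_te, (p_comb > 0.5).astype(int)), 3))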
APA, Harvard, Vancouver, ISO, and other styles
47

Lee, Hong-yu, and 李弘裕. "A Study on Efficient Approaches for Weighted Data Mining." Thesis, 2012. http://ndltd.ncl.edu.tw/handle/95839345197873262085.

Full text
Abstract:
Master's thesis
National University of Kaohsiung
Master's Program, Department of Computer Science and Information Engineering
Academic year 100 (ROC calendar)
Weighted data mining has been widely discussed in recent years due to its many practical applications. Unlike traditional association-rule mining, in weighted data mining each item is given a suitable weight value to represent its importance in a database, and weighted frequent itemsets are then found from the database. However, the downward-closure property of association-rule mining does not hold in weighted data mining. Although the traditional upper-bound model can be applied to restore it, that model still generates many unpromising candidate itemsets. To address this, we develop several efficient methods for mining weighted frequent itemsets and weighted sequential patterns. For weighted itemset mining, a new upper-bound model, which adopts the maximum weight in a transaction as the upper bound for that transaction, is first proposed to obtain tighter upper bounds for itemsets. In addition, two effective strategies, pruning and filtering, are designed to further improve the model. To utilise the model and strategies effectively, two efficient projection-based weighted mining algorithms are proposed for finding weighted frequent itemsets in databases: one based on the improved upper-bound approach with the pruning strategy, and one based on the improved upper-bound approach with both strategies. The proposed concepts for weighted itemset mining are then extended to the problem of weighted sequential pattern mining. Finally, experimental results on synthetic and real datasets show that the proposed algorithms outperform traditional weighted mining algorithms under various parameter settings.
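The sketch below illustrates, on an invented weighted database, one common formulation of weighted support and the transaction-maximum-weight upper bound described above; the exact definitions used in the thesis may differ.

    from itertools import combinations

    # Invented item weights and transactions.
    weights = {"a": 0.9, "b": 0.4, "c": 0.6, "d": 0.2}
    transactions = [{"a", "b"}, {"a", "c"}, {"b", "c", "d"}, {"a", "b", "c"}]

    def weighted_support(itemset):
        # One common formulation: average item weight times relative support.
        avg_w = sum(weights[i] for i in itemset) / len(itemset)
        return avg_w * sum(itemset <= t for t in transactions) / len(transactions)

    def transaction_max_weight_upper_bound(itemset):
        # Upper-bound idea sketched above: in each supporting transaction, use that
        # transaction's maximum item weight instead of the itemset's own weights.
        ub = sum(max(weights[i] for i in t) for t in transactions if itemset <= t)
        return ub / len(transactions)

    for r in (1, 2):
        for itemset in map(set, combinations(weights, r)):
            print(sorted(itemset), round(weighted_support(itemset), 3),
                  "<=", round(transaction_max_weight_upper_bound(itemset), 3))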
APA, Harvard, Vancouver, ISO, and other styles
48

Lan, Guo-Cheng, and 藍國誠. "Efficient Approaches for the Filtration Mechanisms of Data Mining." Thesis, 2006. http://ndltd.ncl.edu.tw/handle/57250897808963244132.

Full text
Abstract:
Master's thesis
Southern Taiwan University of Science and Technology
Department of Information Management
Academic year 94 (ROC calendar)
In recent years, data mining technology has been applied widely across commercial domains. Many association-rule mining algorithms have been proposed to improve mining efficiency or reduce memory usage. In this thesis, we address four research subjects, namely association rules, sequential patterns, traversal patterns, and the correlation between traversal paths and purchased merchandise, and propose several efficient approaches for each. First, we propose new association-rule mining approaches called EFI and GRA. One characteristic of EFI (An Efficient Approach for Filtering Infrequent Itemsets) is its two-phase filtration mechanism: EFI generates only those itemsets that are most likely to be frequent, does not generate candidate sets, scans the database four times, and thus finishes the mining task quickly. GRA (Gradation Reduction Approach) is a level-wise technique whose main characteristic is a gradation filtration mechanism; it uses a simple mask method to generate itemsets. When mining association rules, GRA avoids generating a huge number of unnecessary itemsets through its gradation filtration mechanism and, without generating candidate sets, finishes the mining task quickly. When dealing with very large databases, EFI can perform the mining task without modifying any mining process. We also propose GRA-M (Gradation Reduction Approach - Modified Version), a variant of GRA. EFI and GRA-M first divide a large database into several sub-databases that can be loaded into memory; each sub-database requires only four I/O passes to complete the mining task. Next, we propose the algorithms SFA (Mining Sequential Patterns Using Filtering Approaches) and GRS (Gradation Reduction Approaches for Mining Sequential Patterns) to discover sequential patterns. Since SFA extends the EFI algorithm, it likewise generates only those subsequences most likely to be frequent through the two-phase filtration mechanism, scans the database four times without generating candidate sets, and finishes the mining task quickly. In the same way, GRS, which is modified from GRA, effectively reduces the number of unnecessary subsequences through the gradation filtration mechanism without generating candidate sets. When dealing with very large databases, SFA can perform the mining task without modifying any mining process, and we additionally propose GRS-M (Gradation Reduction Approaches for Mining Sequential Patterns - Modified Version), a variant of GRS. SFA and GRS-M first divide the large database into several sub-databases that are loaded into memory, and each sub-database requires only four I/O passes. E-commerce websites are now growing at a surprising speed; if we can understand users' traversal behaviour within a website, we can carry out better target marketing. Therefore, we propose a new algorithm, TFA (Mining Traversal Patterns Using Filtering Approaches), to discover traversal patterns.
A key characteristic of TFA is its adjacency filtration mechanism, which effectively reduces the number of unnecessary contiguous subsequences; the process of generating contiguous subsequences is very simple, no candidate sets are generated, and the mining task finishes quickly. However, considering the traversal path alone does not yield sufficient accuracy, so later research considers the correlation between traversal paths and purchased merchandise to increase the accuracy of the discovered patterns. We therefore propose a new algorithm, CFA (Mining the Correlation Using Filtering Approaches), which combines the TFA and EFI algorithms. CFA reduces a huge number of unnecessary contiguous subsequences and itemsets through its filtration mechanisms and quickly obtains all frequent combination patterns. When dealing with very large databases, TFA and CFA can perform the mining task without modification; they also use the database-division method, and each sub-database requires only three I/O passes. In the experiments, we compare the performance of the algorithms with respect to each parameter. The results clearly show that our proposed algorithms achieve better performance and memory utilisation.
APA, Harvard, Vancouver, ISO, and other styles
49

Sukchotrat, Thuntee. "Data mining-driven approaches for process monitoring and diagnosis." 2008. http://hdl.handle.net/10106/1827.

Full text
APA, Harvard, Vancouver, ISO, and other styles
50

Bradley, Paul S. "Mathematical programming approaches to machine learning and data mining." 1998. http://catalog.hathitrust.org/api/volumes/oclc/42583739.html.

Full text
Abstract:
Thesis (Ph. D.)--University of Wisconsin--Madison, 1998.
Typescript. eContent provider-neutral record in process. Description based on print version record. Includes bibliographical references (leaves 145-165).
APA, Harvard, Vancouver, ISO, and other styles
