To see the other types of publications on this topic, follow the link: Hybrid data mining.

Dissertations / Theses on the topic 'Hybrid data mining'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the top 50 dissertations / theses for your research on the topic 'Hybrid data mining.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Browse dissertations / theses on a wide variety of disciplines and organise your bibliography correctly.

1

Daglar, Toprak Seda. "A New Hybrid Multi-relational Data Mining Technique." Master's thesis, METU, 2005. http://etd.lib.metu.edu.tr/upload/12606150/index.pdf.

Full text
Abstract:
Multi-relational learning has become popular due to the limitations of propositional problem definition in structured domains and the tendency of storing data in relational databases. As patterns involve multiple relations, the search space of possible hypotheses becomes intractably complex. Many relational knowledge discovery systems have been developed employing various search strategies, search heuristics and pattern language limitations in order to cope with the complexity of hypothesis space. In this work, we propose a relational concept learning technique, which adopts concept descriptions as associations between the concept and the preconditions to this concept and employs a relational upgrade of association rule mining search heuristic, APRIORI rule, to effectively prune the search space. The proposed system is a hybrid predictive inductive logic system, which utilizes inverse resolution for generalization of concept instances in the presence of background knowledge and refines these general patterns into frequent and strong concept definitions with a modified APRIORI-based specialization operator. Two versions of the system are tested for three real-world learning problems: learning a linearly recursive relation, predicting carcinogenicity of molecules within Predictive Toxicology Evaluation (PTE) challenge and mesh design. Results of the experiments show that the proposed hybrid method is competitive with state-of-the-art systems.
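For readers unfamiliar with the APRIORI heuristic this abstract builds on, the following minimal Python sketch shows the classic level-wise join-and-prune step over toy transactions. The data are invented for illustration, and the thesis's relational upgrade of the heuristic is considerably more involved than this propositional version.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Level-wise frequent itemset mining with APRIORI pruning: a size-(k+1)
    candidate survives only if all of its size-k subsets were frequent."""
    n = len(transactions)
    support = lambda s: sum(s <= t for t in transactions) / n
    level = {frozenset([i]) for t in transactions for i in t}
    level = {s for s in level if support(s) >= min_support}
    frequent, k = {}, 1
    while level:
        frequent.update({s: support(s) for s in level})
        # join step: union pairs of frequent k-itemsets into (k+1)-candidates
        candidates = {a | b for a in level for b in level if len(a | b) == k + 1}
        # prune step: drop candidates having any infrequent k-subset
        candidates = {c for c in candidates
                      if all(frozenset(sub) in level for sub in combinations(c, k))}
        level = {c for c in candidates if support(c) >= min_support}
        k += 1
    return frequent

transactions = [frozenset(t) for t in
                [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]]
print(apriori(transactions, min_support=0.6))
```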
APA, Harvard, Vancouver, ISO, and other styles
2

Seetan, Raed. "A Data Mining Approach to Radiation Hybrid Mapping." Diss., North Dakota State University, 2014. https://hdl.handle.net/10365/27315.

Full text
Abstract:
The task of mapping markers from Radiation Hybrid (RH) mapping experiments is typically viewed as equivalent to the traveling-salesman problem, which has combinatorial complexity. As an additional problem, experiments commonly result in some unreliable markers that reduce the overall map quality. Due to the large number of markers in current radiation hybrid populations, the use of data mining techniques becomes increasingly important for reducing both the computational complexity and the impact of noise in the original data. In this dissertation, a clustering-based approach is proposed for addressing both the problem of filtering unreliable markers (framework maps) and the problem of mapping large numbers of markers (comprehensive maps) efficiently. Traditional approaches for eliminating unreliable markers use resampling of the full data set, which has an even higher computational complexity than the original mapping problem. In contrast, the proposed algorithms use a divide-and-conquer strategy to construct framework maps based on clusters that exclude unreliable markers. The clusters of markers are ordered using parallel processing and are then combined to form the complete map. Three algorithms are presented that explore the trade-off between the number of markers included in the framework map and placement accuracy. Since the mapping problem is susceptible to noise, it is often beneficial to remove markers that are not trustworthy. Traditional mapping techniques for building comprehensive maps process all markers together, including unreliable ones, in a single-iteration approach, which may reduce the accuracy of the constructed maps. In this research work, two-stage algorithms are proposed that map most markers by first creating a framework map of the reliable markers and then incrementally adding the remaining markers to construct high-quality comprehensive maps. All proposed algorithms have been evaluated on several human chromosomes using radiation hybrid datasets of varying sizes, and their performance is compared with state-of-the-art RH mapping software. Overall, the proposed algorithms are not only much faster than the comparative approaches, but the quality of the resulting maps is also much higher.
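As a rough illustration of the divide-and-conquer idea described above, the sketch below clusters markers from a pairwise distance matrix and orders each cluster with a greedy nearest-neighbour heuristic, a cheap stand-in for the dissertation's actual ordering algorithms. The data are synthetic, and stitching the clusters together via the hidden true positions is a shortcut taken only for brevity.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

rng = np.random.default_rng(0)
true_pos = np.sort(rng.uniform(0, 100, 40))            # hidden marker positions
dist = np.abs(true_pos[:, None] - true_pos[None, :])   # pairwise marker distances

# Divide: group markers into clusters directly from the distance matrix.
labels = AgglomerativeClustering(n_clusters=4, metric="precomputed",
                                 linkage="average").fit_predict(dist)

def order_cluster(idx):
    """Greedy nearest-neighbour ordering of one cluster (TSP-style heuristic)."""
    idx = list(idx)
    tour = [idx.pop(0)]
    while idx:
        nxt = min(idx, key=lambda j: dist[tour[-1], j])
        idx.remove(nxt)
        tour.append(nxt)
    return tour

# Conquer: order each cluster independently (parallelizable in principle),
# then concatenate the clusters along the chromosome.
clusters = sorted((np.where(labels == c)[0] for c in range(4)),
                  key=lambda idx: true_pos[idx].mean())  # shortcut: uses hidden truth
framework_map = [m for c in clusters for m in order_cluster(c)]
print(framework_map)
```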
APA, Harvard, Vancouver, ISO, and other styles
3

Zall, Davood. "Visual Data Mining : An Approach to Hybrid 3D Visualization." Thesis, Högskolan i Borås, Institutionen Handels- och IT-högskolan, 2012. http://urn.kb.se/resolve?urn=urn:nbn:se:hb:diva-16601.

Full text
Abstract:
With the increasing volume and complexity of datasets, Visual Data Mining (VDM) has evolved and new visualization techniques have been released. While some of these techniques perform well and meet expectations, others have failed to hold their position. The main issue with such techniques is problem dependency. In this study, after a short description of the necessity of Visual Data Mining techniques, I provide a classified review of previous research. This results in a deep understanding of, as well as simple access to, previous research in a concise manner. It also facilitates the extraction of the specifications of the 3D visualization technique and provides comprehensive, classified knowledge of this technique. After that, all possible combinations of the 3D visualization technique are reviewed. The 3D visualization technique, as a popular technique, is a solid foundation for visualizing multi-dimensional datasets, but it has some limitations. To overcome these limitations, previous studies in the literature as well as the experiences of professionals are gathered. The results confirm the theoretical findings and suggest new hybrid techniques (combinations of 3D visualization with other visual data mining techniques). The contribution of professionals empowers and complements the results of this study, as they can propose solutions for the weaknesses of the 3D visualization technique in their business in the form of new combinations of techniques. These combinations create the basis for future research to discover new limitations and provide solutions that overcome them through hybrid techniques.
Program: Master's programme in informatics (Magisterutbildning i informatik)
APA, Harvard, Vancouver, ISO, and other styles
4

Yang, Pengyi. "Ensemble methods and hybrid algorithms for computational and systems biology." Thesis, The University of Sydney, 2012. https://hdl.handle.net/2123/28979.

Full text
Abstract:
Modern molecular biology increasingly relies on the application of high-throughput technologies for studying the function, interaction, and integration of genes, proteins, and a variety of other molecules on a large scale. The application of those high-throughput technologies has led to the exponential growth of biological data, making modern molecular biology a data-intensive science. Huge effort has been directed to the development of robust and efficient computational algorithms in order to make sense of these extremely large and complex biological data, giving rise to several interdisciplinary fields, such as computational and systems biology. Machine learning and data mining are disciplines dealing with knowledge discovery from large data, and their application to computational and systems biology has been extremely fruitful. However, the ever-increasing size and complexity of the biological data require novel computational solutions to be developed. This thesis attempts to contribute to these interdisciplinary fields by developing and applying different ensemble learning methods and hybrid algorithms for solving a variety of problems in computational and systems biology. Through the study of different types of data generated from a variety of biological systems using different high-throughput approaches, we demonstrate that ensemble learning methods and hybrid algorithms are general, flexible, and highly effective tools for computational and systems biology.
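As a generic illustration of the ensemble idea (not the thesis's specific methods), the sketch below combines heterogeneous base learners by soft voting on synthetic data; all names and parameters are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Heterogeneous base learners vote on each sample; on noisy, high-dimensional
# biological data the combination is often more robust than any single member.
X, y = make_classification(n_samples=300, n_features=50, n_informative=8,
                           random_state=0)
ensemble = VotingClassifier(
    estimators=[("lr", LogisticRegression(max_iter=1000)),
                ("rf", RandomForestClassifier(random_state=0)),
                ("svm", SVC(probability=True, random_state=0))],
    voting="soft")  # average the members' predicted class probabilities
print("ensemble CV accuracy:", cross_val_score(ensemble, X, y, cv=5).mean())
```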
APA, Harvard, Vancouver, ISO, and other styles
5

Theobald, Claire. "Bayesian Deep Learning for Mining and Analyzing Astronomical Data." Electronic Thesis or Diss., Université de Lorraine, 2023. http://www.theses.fr/2023LORR0081.

Full text
Abstract:
In this thesis, we address the issue of trust in deep learning predictive systems in two complementary research directions. The first line of research focuses on the ability of an AI to estimate its level of uncertainty in its decision-making as accurately as possible. The second line focuses on the explainability of these systems, that is, their ability to convince human users of the soundness of their predictions. The problem of estimating the uncertainties is addressed from the perspective of Bayesian deep learning. Bayesian neural networks assume a probability distribution over their parameters, which allows them to estimate different types of uncertainties: first, aleatoric uncertainty, which is related to the data, but also epistemic uncertainty, which quantifies the lack of knowledge the model has about the data distribution. More specifically, this thesis proposes a Bayesian neural network that can estimate these uncertainties in the context of a multivariate regression task. This model is applied to the regression of complex ellipticities on galaxy images as part of the ANR project "AstroDeep". These images can be corrupted by different sources of perturbation and noise which can be reliably estimated by the different uncertainties. The exploitation of these uncertainties is then extended to galaxy mapping and then to "coaching" the Bayesian neural network. This last technique consists of generating increasingly complex data during the model's training process to improve its performance. The problem of explainability, on the other hand, is approached from the perspective of counterfactual explanations. These explanations consist of identifying what changes to the input parameters would have led to a different prediction. Our contribution in this field is based on the generation of counterfactual explanations relying on a variational autoencoder (VAE) and an ensemble of predictors trained on the latent space generated by the VAE. This method is particularly suited to high-dimensional data, such as images; in this case, they are referred to as visual counterfactual explanations. By exploiting both the latent space and the ensemble of predictors, we can efficiently produce visual counterfactual explanations that reach a higher degree of realism than several state-of-the-art methods.
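A crude numpy sketch of the aleatoric/epistemic decomposition follows, using a bootstrap ensemble of polynomial regressors as a stand-in for a Bayesian neural network. Note the simplification: each member's aleatoric estimate is a single residual variance rather than an input-dependent one, whereas the thesis estimates both quantities with a proper Bayesian model.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-3, 3, 200)
y = np.sin(x) + rng.normal(0, 0.2 + 0.2 * (x > 0), 200)  # input-dependent noise

# Bootstrap "ensemble": each member is a polynomial fit plus a noise estimate.
members = []
for _ in range(20):
    idx = rng.integers(0, len(x), len(x))
    coef = np.polyfit(x[idx], y[idx], 5)
    resid = y[idx] - np.polyval(coef, x[idx])
    members.append((coef, resid.var()))

x_test = np.linspace(-4, 4, 9)
means = np.array([np.polyval(c, x_test) for c, _ in members])
aleatoric = np.mean([v for _, v in members])  # average estimated data noise
epistemic = means.var(axis=0)                 # disagreement between members
for xt, m, e in zip(x_test, means.mean(axis=0), epistemic):
    print(f"x={xt:+.1f}  mean={m:+.2f}  epistemic={e:.3f}  aleatoric={aleatoric:.3f}")
```

Epistemic uncertainty blows up outside the training range [-3, 3], which is exactly the behaviour one wants from a model that knows what it does not know.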
APA, Harvard, Vancouver, ISO, and other styles
6

Cheng, Xueqi. "Exploring Hybrid Dynamic and Static Techniques for Software Verification." Diss., Virginia Tech, 2010. http://hdl.handle.net/10919/26216.

Full text
Abstract:
With the growing importance of software on which human lives increasingly depend, the correctness requirement of the underlying software becomes especially critical. However, the increasing complexity and size of modern software systems pose special challenges to the effectiveness as well as the efficiency of software verification. Two major obstacles include the quality of test generation in terms of error detection in software testing, and the state space explosion problem in software formal verification (model checking). In this dissertation, we investigate several hybrid techniques that explore dynamic (with program execution) and static (without program execution) approaches, as well as the synergies of multiple approaches, in software verification from the perspectives of testing and model checking. For software testing, a new simulation-based internal variable range coverage metric is proposed with the goal of enhancing the error detection capability of the generated test data when applied as the target metric. For software model checking, we utilize various dynamic analysis methods, such as data mining and swarm intelligence (ant colony optimization), to extract useful high-level information from program execution data. Despite being incomplete, dynamic program execution can still help to uncover important program structure features and variable correlations. The extracted knowledge, such as invariants in different forms, promising control flows, etc., is then used to facilitate code-level program abstraction (under-approximation/over-approximation) and/or state space partition, which in turn improves the performance of property verification. In order to validate the effectiveness of the proposed hybrid approaches, a wide range of experiments on academic and real-world programs were designed and conducted, with results compared against the original as well as the relevant verification methods. Experimental results demonstrated the effectiveness of our methods in improving the quality as well as the performance of software verification. For software testing, the newly proposed coverage metric constructed from dynamic program execution data is able to improve the quality of the generated test cases in terms of mutation killing, a widely applied measurement for error detection. For software model checking, the proposed hybrid techniques take full advantage of the complementary benefits of both dynamic and static approaches: the lightweight dynamic techniques provide flexibility in extracting valuable high-level information that can be used to guide the scope and the direction of the static reasoning process. This results in significant performance improvement in software model checking. On the other hand, the static techniques guarantee the completeness of the verification results, compensating for the weakness of dynamic methods.
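To make the 'extracting invariants from execution data' idea concrete, here is a minimal Daikon-style sketch: propose candidate invariants over observed variables and keep those that hold in every recorded state. The traces and invariant templates are invented for illustration.

```python
# Daikon-style dynamic invariant mining over hypothetical program states
# (e.g., variable values sampled at a loop head during test runs).
traces = [
    {"i": 0, "n": 10, "total": 0},
    {"i": 3, "n": 10, "total": 6},
    {"i": 10, "n": 10, "total": 20},
]

names = sorted(traces[0])
candidates = {f"{v} >= 0": (lambda s, v=v: s[v] >= 0) for v in names}
for a in names:
    for b in names:
        if a != b:
            candidates[f"{a} <= {b}"] = lambda s, a=a, b=b: s[a] <= s[b]
            candidates[f"{a} == 2*{b}"] = lambda s, a=a, b=b: s[a] == 2 * s[b]

# Keep only the candidates that hold on every observed state.
invariants = [desc for desc, check in candidates.items()
              if all(check(s) for s in traces)]
print(invariants)  # includes 'i <= n' and 'total == 2*i' on these traces
```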
Ph. D.
APA, Harvard, Vancouver, ISO, and other styles
7

Viademonte, da Rosa Sérgio I. (Sérgio Ivan) 1964. "A hybrid model for intelligent decision support : combining data mining and artificial neural networks." Monash University, School of Information Management and Systems, 2004. http://arrow.monash.edu.au/hdl/1959.1/5159.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Pande, Anurag. "Estimation of Hybrid Models for Real-Time Crash Risk Assessment on Freeways." Doctoral diss., University of Central Florida, 2005. http://digital.library.ucf.edu/cdm/ref/collection/ETD/id/3016.

Full text
Abstract:
The relevance of reactive traffic management strategies such as freeway incident detection has been diminishing with advancements in mobile phone usage and video surveillance technology. On the other hand, the capacity to collect, store, and analyze traffic data from underground loop detectors has witnessed enormous growth in the recent past. These two facts together provide both the motivation and the means to shift the focus of freeway traffic management toward proactive strategies that would involve anticipating incidents such as crashes. The primary element of a proactive traffic management strategy would be model(s) that can separate 'crash prone' conditions from 'normal' traffic conditions in real time. The aim of this research is to establish relationship(s) between historical crashes of specific types and corresponding loop detector data, which may be used as the basis for classifying real-time traffic conditions into 'normal' or 'crash prone' in the future. In this regard, traffic data were also collected for cases which did not lead to crashes (non-crash cases) so that the problem could be set up as binary classification. A thorough review of the literature suggested that existing real-time crash 'prediction' models (classification or otherwise) are generic in nature, i.e., a single model has been used to identify all crashes (such as rear-end, sideswipe, or angle), even though traffic conditions preceding crashes are known to differ by type of crash. Moreover, a generic model would yield no information about the collision most likely to occur. To be able to analyze different groups of crashes independently, a large database of crashes reported during the 5-year period from 1999 through 2003 on the Interstate-4 corridor in Orlando was collected. The 36.25-mile instrumented corridor is equipped with 69 dual loop detector stations in each direction (eastbound and westbound), located approximately every ½ mile. These stations report speed, volume, and occupancy data every 30 seconds from the three through lanes of the corridor. Geometric design parameters for the freeway were also collected and collated with historical crash and corresponding loop detector data. The first group of crashes to be analyzed were the rear-end crashes, which account for about 51% of the total crashes. Based on preliminary explorations of average traffic speeds, rear-end crashes were grouped into two mutually exclusive groups: first, those occurring under extended congestion (referred to as regime 1 traffic conditions), and second, those occurring with relatively free-flow conditions (referred to as regime 2 traffic conditions) prevailing 5-10 minutes before the crash. Simple rules to separate these two groups of rear-end crashes were formulated based on the classification tree methodology. It was found that the first group of rear-end crashes can be attributed to parameters measurable through loop detectors, such as the coefficient of variation in speed and the average occupancy at stations in the vicinity of the crash location. For the second group of rear-end crashes (regime 2), traffic parameters such as average speed and occupancy at stations downstream of the crash location were significant, along with off-line factors such as the time of day and the presence of an on-ramp in the downstream direction. It was found that regime 1 traffic conditions make up only about 6% of the traffic conditions on the freeway.

Almost half of the rear-end crashes occurred under regime 1 traffic even with such little exposure. This observation led to the conclusion that freeway locations operating under regime 1 traffic may be flagged for (rear-end) crashes without any further investigation. MLP (multilayer perceptron) and NRBF (normalized radial basis function) neural network architectures were explored to identify regime 2 rear-end crashes. The performance of the individual neural network models was improved by hybridizing their outputs. Individual and hybrid PNN (probabilistic neural network) models were also explored, along with matched case-control logistic regression. The stepwise selection procedure yielded a matched logistic regression model indicating the difference between average speeds upstream and downstream as significant. Even though the model provided good interpretation, its classification accuracy over the validation dataset was far inferior to the hybrid MLP/NRBF and PNN models. The hybrid neural network models, along with the classification tree model (developed to identify the traffic regimes), were able to identify about 60% of the regime 2 rear-end crashes in addition to all regime 1 rear-end crashes with a reasonable number of positive decisions (warnings). This translates into identification of more than ¾ (77%) of all rear-end crashes. Classification models were then developed for the next most frequent type, i.e., lane-change related crashes. Based on preliminary analysis, it was concluded that location-specific characteristics, such as the presence of ramps, mile-post location, etc., were not significantly associated with these crashes. The average difference between occupancies of adjacent lanes and the average speeds upstream and downstream of the crash location were found significant. The significant variables were then used as inputs to MLP and NRBF based classifiers. The best models in each category were hybridized by averaging their respective outputs. The hybrid model significantly improved on the crash identification achieved through the individual models, and 57% of the crashes in the validation dataset could be identified with 30% warnings. Although the hybrid models in this research were developed with corresponding data for rear-end and lane-change related crashes only, it was observed that about 60% of the historical single-vehicle crashes (other than rollovers) could also be identified using these models. The majority of the identified single-vehicle crashes, according to the crash reports, were caused by evasive actions taken by drivers in order to avoid another vehicle in front or in the other lane. Vehicle rollover crashes were found to be associated with speeding and the curvature of the freeway section; the established relationship, however, was not sufficient to identify the occurrence of these crashes in real time. Based on the results of the modeling procedure, a framework for parallel real-time application of these two sets of models (rear-end and lane-change) in the form of a system was proposed. To identify rear-end crashes, the data are first subjected to classification-tree-based rules to identify the traffic regime. If traffic patterns belong to regime 1, a rear-end crash warning is issued for the location. If the patterns are identified as regime 2, they are then subjected to the hybrid MLP/NRBF model employing traffic data from five surrounding traffic stations.

If the model identifies the patterns as crash prone, the location may be flagged for a rear-end crash; otherwise a final check for a regime 2 rear-end crash is applied to the data through the hybrid PNN model. If data from five stations are not available due to intermittent loop failures, the system has the flexibility to switch to models with more tolerant data requirements (i.e., models using traffic data from only one station or three stations). To assess the risk of a lane-change related crash, if all three lanes at the immediate upstream station are functioning, the hybrid of the two best individual neural network models (NRBF with three hidden neurons and MLP with four hidden neurons) is applied to the input data. A warning for a lane-change related crash may be issued based on its output. The proposed strategy is demonstrated over a complete day of loop data in a virtual real-time application. It was shown that the system of models may be used to continuously assess and update the risk for rear-end and lane-change related crashes. The system developed in this research should be perceived as the primary component of a proactive traffic management strategy. The output of the system, along with the knowledge of variables critically associated with specific types of crashes identified in this research, can be used to formulate ways of avoiding impending crashes. However, specific crash prevention strategies, e.g., variable speed limits and warnings to commuters, demand separate attention and should be addressed through thorough future research.
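The hybridization step itself is simple: average the probability outputs of two trained classifiers and apply a warning threshold. The sketch below reproduces that pattern on synthetic imbalanced data, with an RBF-kernel SVM standing in for the NRBF network and a purely illustrative threshold.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

# Imbalanced stand-in for crash (1) vs non-crash (0) traffic patterns.
X, y = make_classification(n_samples=2000, n_features=10, weights=[0.9, 0.1],
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

mlp = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000,
                    random_state=0).fit(X_tr, y_tr)
rbf = SVC(kernel="rbf", probability=True, random_state=0).fit(X_tr, y_tr)

# Hybridize by averaging the two models' crash probabilities; a location is
# flagged when the averaged output crosses the warning threshold.
p_hybrid = (mlp.predict_proba(X_te)[:, 1] + rbf.predict_proba(X_te)[:, 1]) / 2
warn = p_hybrid > 0.2  # lower threshold: more warnings, more crashes caught
detected = (warn & (y_te == 1)).sum() / (y_te == 1).sum()
print(f"crashes identified: {detected:.0%}, warnings issued: {warn.mean():.0%}")
```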
Ph.D.
Department of Civil and Environmental Engineering
Engineering and Computer Science
Civil Engineering
APA, Harvard, Vancouver, ISO, and other styles
9

Sainani, Varsha. "Hybrid Layered Intrusion Detection System." Scholarly Repository, 2009. http://scholarlyrepository.miami.edu/oa_theses/44.

Full text
Abstract:
The increasing number of network security related incidents has made it necessary for organizations to actively protect their sensitive data with network intrusion detection systems (IDSs). Detecting intrusions in a distributed network, from outside the network segment as well as from inside, is a difficult problem. IDSs are expected to analyze a large volume of data while not placing a significant added load on the monitoring systems and networks. This requires good data mining strategies that take less time and give accurate results. In this study, a novel hybrid layered multiagent-based intrusion detection system is created, particularly with the support of a multi-class supervised classification technique. In an agent-based IDS, there is no central control and therefore no central point of failure. Agents can detect and take predefined actions against malicious activities, which can be detected with the help of data mining techniques. The proposed IDS shows superior performance compared to central sniffing IDS techniques, and saves network resources compared to other distributed IDSs with mobile agents that activate too many sniffers, causing bottlenecks in the network. This is one of the major motivations to use a distributed model based on a multiagent platform along with a supervised classification technique. Applying multiagent technology to the management of network security is a challenging task since it requires management at different time instances and has many interactions. To facilitate information exchange between different agents in the proposed hybrid layered multiagent architecture, a low-cost and low-response-time agent communication protocol is developed to tackle the issues typically associated with a distributed multiagent system, such as poor system performance, excessive processing power requirements, and long delays. The bandwidth and response time performance of the proposed end-to-end system is investigated through simulation of the proposed agent communication protocol on our private LAN testbed, called Hierarchical Agent Network for Intrusion Detection Systems (HAN-IDS). The simulation results show that this system is efficient and extensible since it consumes negligible bandwidth with low cost and low response time on the network.
APA, Harvard, Vancouver, ISO, and other styles
10

Zhang, Jiapu. "Derivative-free hybrid methods in global optimization and their applications." Thesis, University of Ballarat, 2005. http://researchonline.federation.edu.au/vital/access/HandleResolver/1959.17/34054.

Full text
Abstract:
In recent years large-scale global optimization (GO) problems have drawn considerable attention. These problems have many applications, in particular in data mining and biochemistry. Numerical methods for GO are often very time consuming and cannot be applied to high-dimensional non-convex and/or non-smooth optimization problems. The thesis explores reasons why we need to develop and study new algorithms for solving large-scale GO problems .... The thesis presents several derivative-free hybrid methods for large-scale GO problems. These methods do not guarantee the calculation of a global solution; however, results of numerical experiments presented in this thesis demonstrate that they, as a rule, calculate a solution which is a global one or close to it. Their applications to data mining problems and the protein folding problem are demonstrated.
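One common derivative-free hybrid pattern, in the spirit of (but not identical to) the methods in this thesis, pairs a global random-sampling phase with a derivative-free local refinement. The sketch below does this with Nelder-Mead on the Rastrigin test function; consistent with the abstract's caveat, nothing guarantees the result is the global minimum.

```python
import numpy as np
from scipy.optimize import minimize

def rastrigin(x):
    """Classic non-convex test function with many local minima (min 0 at 0)."""
    return 10 * len(x) + np.sum(x**2 - 10 * np.cos(2 * np.pi * x))

rng = np.random.default_rng(0)
starts = rng.uniform(-5.12, 5.12, size=(200, 5))   # global random sampling
best_starts = sorted(starts, key=rastrigin)[:5]    # keep the most promising

# Local phase: derivative-free Nelder-Mead refinement of each kept start.
best = min((minimize(rastrigin, s, method="Nelder-Mead",
                     options={"xatol": 1e-8, "fatol": 1e-8, "maxiter": 5000})
            for s in best_starts), key=lambda r: r.fun)
print(best.x.round(3), round(best.fun, 6))
```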
Doctor of Philosophy
APA, Harvard, Vancouver, ISO, and other styles
11

Hussain, Mukhtar. "Data-driven discovery of mode switching conditions to create hybrid models of cyber-physical systems." Thesis, Queensland University of Technology, 2022. https://eprints.qut.edu.au/235043/1/Mukhtar_Hussain_Thesis.pdf.

Full text
Abstract:
Models are essential tools for evaluating a system’s behaviour under different scenarios. However, in industrial practice pre-existing models of cyber-physical systems (CPSs) are not always available because CPSs can be legacy systems which are subject to changes and upgrades over time that may not be well documented. System identification addresses the problem by creating models from the external observation of a system. This research is concerned with hybrid system identification of CPSs, i.e., building models of dynamic systems switching between different operating modes. This thesis presents methods for discovering data-driven mode switching conditions essential for building such models.
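One simple data-driven way to recover human-readable switching conditions is to fit a shallow decision tree to observed (state, mode) pairs; whether this resembles the thesis's actual method is not claimed here, and the tank system below is entirely hypothetical.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical tank controller with two modes: FILL while level < 80, else DRAIN.
rng = np.random.default_rng(0)
level = rng.uniform(0, 100, 500)
valve = rng.uniform(0, 1, 500)              # irrelevant distractor signal
mode = np.where(level < 80, 0, 1)           # 0 = FILL, 1 = DRAIN

# Learn readable switching conditions from the observed state/mode pairs.
X = np.column_stack([level, valve])
tree = DecisionTreeClassifier(max_depth=2).fit(X, mode)
print(export_text(tree, feature_names=["level", "valve"]))
# Expect a single split near "level <= 80" separating the two modes.
```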
APA, Harvard, Vancouver, ISO, and other styles
12

Barak, Sasan. "Technical and Fundamental Features’ analysis for Stock Market Prediction with Data Mining Methods." Doctoral thesis, Università degli studi di Bergamo, 2019. http://hdl.handle.net/10446/128764.

Full text
Abstract:
One of the most important concerns of market practitioners is future information about the companies offering stocks. A reliable prediction of a company's financial status allows the investor to make more confident investments and gain more profit (Huang, 2012b). Accurate prediction of stock prices also has a positive effect on an organization's financial stability (Asadi et al., 2012). Since the financial market is a complex, non-linear dynamic system, its prediction is genuinely challenging (Huang and Tsai, 2009). The steady and remarkable progress of computer hardware technology in the past decades has led to large supplies of powerful and affordable computers, data collection equipment, and storage media. This technology provides a great boost to the database and information industry and makes a huge number of databases and information repositories available for transaction management, information retrieval, and data analysis. Data mining is defined as a group of algorithms and methods designed to analyze data or to extract patterns in specific categories from data, contributing greatly to business strategies, engineering, medical research, and financial areas (Klosgen and Zytkow, 1996). Prediction of stock prices, credit scores, and even bankruptcy potential are examples of the significant applicability of data mining in the field of finance. In this research we use the tools of the data mining area to forecast stock prices and future trends. There are different approaches to financial forecasting in general, and stock market price forecasting in particular, including fundamental analysis, technical analysis, and news, via econometric or machine learning algorithms (Atsalakis et al., 2011; Kar et al., 2014); this thesis goes through all of these methodologies. The thesis consists of three papers by the author, published in ISI journals, on using technical and fundamental features for stock market prediction with different data mining algorithms (Chapters 3 to 5). It exploits different types of financial datasets and establishes three aspects of stock market forecasting via different combinations of feature engineering on the finance dataset and machine learning models.
APA, Harvard, Vancouver, ISO, and other styles
13

Paramasivam, Vijayajothi. "Conceptual framework of a novel hybrid methodology between computational fluid dynamics and data mining techniques for medical dataset application." Thesis, Curtin University, 2017. http://hdl.handle.net/20.500.11937/54143.

Full text
Abstract:
This thesis proposes a novel hybrid methodology that couples computational fluid dynamics (CFD) and data mining (DM) techniques and applies it to a multi-dimensional medical dataset in order to study potential disease development statistically. This approach offers an alternative to the tedious and rigorous CFD methodology currently adopted to study the influence of geometric parameters on hemodynamics in the human abdominal aortic aneurysm. It can be seen as a “marriage” between the medical and computing domains.
APA, Harvard, Vancouver, ISO, and other styles
14

Cheng, Iunniang. "Hybrid Methods for Feature Selection." TopSCHOLAR®, 2013. http://digitalcommons.wku.edu/theses/1244.

Full text
Abstract:
Feature selection is one of the important data preprocessing steps in data mining. The feature selection problem involves finding a feature subset such that a classification model built with only this subset would have better predictive accuracy than a model built with the complete set of features. In this study, we propose two hybrid methods for feature selection. The best features are selected through either the hybrid methods or existing feature selection methods. Next, the reduced dataset is used to build classification models using five classifiers. Classification accuracy was evaluated in terms of the area under the Receiver Operating Characteristic (ROC) curve (AUC) performance metric. The proposed methods have been shown empirically to improve the performance of existing feature selection methods.
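The abstract does not spell out the two hybrids, so the sketch below shows only the general filter-plus-wrapper pattern under the same AUC metric: rank features with a cheap filter, then greedily grow a subset whenever cross-validated AUC improves. Everything here is illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=400, n_features=30, n_informative=5,
                           random_state=0)
# Filter stage: rank all features by mutual information with the class.
ranking = np.argsort(mutual_info_classif(X, y, random_state=0))[::-1]

# Wrapper stage: greedy forward selection over the top-ranked features,
# keeping a feature only if it improves cross-validated AUC.
selected, best_auc = [], 0.0
for f in ranking[:15]:
    auc = cross_val_score(LogisticRegression(max_iter=1000),
                          X[:, selected + [f]], y, cv=5,
                          scoring="roc_auc").mean()
    if auc > best_auc:
        selected.append(f)
        best_auc = auc
print("selected features:", selected, "AUC:", round(best_auc, 3))
```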
APA, Harvard, Vancouver, ISO, and other styles
15

Lin, Pengpeng. "A Framework for Consistency Based Feature Selection." TopSCHOLAR®, 2009. http://digitalcommons.wku.edu/theses/62.

Full text
Abstract:
Feature selection is an effective technique for reducing the dimensionality of features in many applications whose datasets involve hundreds or thousands of features. The objective of feature selection is to find an optimal subset of relevant features such that the feature size is reduced and the understandability of the learning process is improved without significantly decreasing overall accuracy and applicability. This thesis focuses on the consistency measure, under which a feature subset is inconsistent if there exist two or more instances with the same feature values but different class labels. This thesis introduces a new consistency-based algorithm, Automatic Hybrid Search (AHS), and reviews several existing feature selection algorithms (ES, PS and HS) which are based on the consistency rate. After that, we conclude this work by conducting an empirical study providing a comparative analysis of the different search algorithms.
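The inconsistency rate behind such consistency-based algorithms is easy to state in code: group instances by their values on the candidate subset and count how many disagree with their group's majority class. A minimal implementation on toy data:

```python
from collections import Counter, defaultdict

def inconsistency_rate(instances, labels, subset):
    """Fraction of instances disagreeing with the majority class of the
    group of instances that share their values on `subset`."""
    groups = defaultdict(Counter)
    for inst, lab in zip(instances, labels):
        groups[tuple(inst[i] for i in subset)][lab] += 1
    inconsistent = sum(sum(c.values()) - max(c.values()) for c in groups.values())
    return inconsistent / len(instances)

data = [(0, 1, 0), (0, 1, 1), (1, 0, 0), (1, 0, 0)]
labels = ["a", "b", "a", "a"]
print(inconsistency_rate(data, labels, [0]))     # 0.25: group (0,) mixes a and b
print(inconsistency_rate(data, labels, [0, 2]))  # 0.0: subset is fully consistent
```

A search algorithm (exhaustive, probabilistic, or hybrid) then looks for the smallest subset whose rate stays below a tolerance.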
APA, Harvard, Vancouver, ISO, and other styles
16

Alsalama, Ahmed. "A Hybrid Recommendation System Based on Association Rules." TopSCHOLAR®, 2013. http://digitalcommons.wku.edu/theses/1250.

Full text
Abstract:
Recommendation systems are widely used in e-commerce applications. The engine of a current recommendation system recommends items to a particular user based on user preferences and previous high ratings. Various recommendation schemes, such as collaborative filtering and content-based approaches, are used to build a recommendation system. Most current recommendation systems were developed to fit a certain domain such as books, articles, or movies. We propose a hybrid framework recommendation system to be applied on two-dimensional spaces (User × Item) with a large number of users and a small number of items. Moreover, our proposed framework makes use of both favorite and non-favorite items of a particular user. The proposed framework is built upon the integration of association rule mining and the content-based approach. The results of experiments show that our proposed framework can provide accurate recommendations to users.
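To illustrate the association-rule half of such a hybrid, the toy sketch below mines pairwise 'liked X → likes Y' rules from user histories and ranks unseen items by rule confidence; restricting to pairwise rules and the tiny dataset are simplifications, not the thesis's design.

```python
from collections import Counter
from itertools import combinations

# Hypothetical user histories (sets of liked items).
histories = [{"news", "sport"}, {"news", "sport", "tech"},
             {"news", "tech"}, {"sport", "tech"}, {"news", "sport"}]

pair_counts, item_counts = Counter(), Counter()
for h in histories:
    item_counts.update(h)
    pair_counts.update(combinations(sorted(h), 2))

def confidence(x, y):
    """Confidence of the rule x -> y: P(y in history | x in history)."""
    return pair_counts[tuple(sorted((x, y)))] / item_counts[x]

user = {"news"}  # the target user's favourite items
scores = {y: max(confidence(x, y) for x in user)
          for y in set(item_counts) - user}
print(sorted(scores.items(), key=lambda kv: -kv[1]))  # sport (0.75) before tech
```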
APA, Harvard, Vancouver, ISO, and other styles
17

Pagnossim, José Luiz Maturana. "Uma abordagem híbrida para sistemas de recomendação de notícias." Universidade de São Paulo, 2018. http://www.teses.usp.br/teses/disponiveis/100/100131/tde-07062018-101232/.

Full text
Abstract:
Recommendation Systems (RS) are software capable of suggesting items to users based on the history of user interactions or on similarity metrics that can be compared by item, user, or both. There are different types of RS, and those of most interest in this work are content-based, knowledge-based and collaborative filtering. Achieving results adequate to users' expectations is a hard goal due to the inherent subjectivity of human behavior; thus, an RS needs efficient and effective solutions for: modeling the data that will support the recommendation; retrieving the information that describes the data; combining this information within similarity, popularity or suitability metrics; creating descriptive models of the items under recommendation; and evolving the system's intelligence to learn from the user's interaction. Decision-making by an RS is a complex task that can be implemented according to the view of fields such as artificial intelligence and data mining. In the artificial intelligence field there are studies concerning the method of case-based reasoning, which works on the principle that if something worked in the past, it may work again in a new situation similar to the past one. The case-based recommendation works with structured items, represented by a set of attributes and their respective values (within a 'case' model), providing known and adapted solutions. The data mining area can build descriptive models for RS and also handle, manipulate and analyze textual data, constituting one option for creating elements to compose a recommendation. One way to minimize the weaknesses of a single approach is to adopt aspects based on a hybrid solution, which in this work means: taking advantage of the different types of RS; using problem-solving techniques; and combining resources from different sources to compose a unified metric to be used to rank the recommendation by relevance. Among the RS application areas, news recommendation stands out, being used by a heterogeneous public, broad and demanding of relevance. In this context, this work presents a hybrid approach to news recommendation built through an architecture implemented to prove the concepts of a recommendation system. This architecture has been validated by using a news corpus and by performing an online experiment. Through the experiment it was possible to observe the architecture's capacity to meet the requirements of a news recommendation system and to privilege recommendations based on similarity, popularity, diversity, novelty and serendipity. An evolution in the indicators of reading, likes, acceptance and serendipity was also observed as the system accumulated a history of preferences and solutions. Through the analysis of the unified ranking metric, it was possible to confirm its efficacy by verifying that the best-placed news items in the ranking were the most accepted by users.
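A unified relevance metric of the kind described can be as simple as a weighted sum of normalized signals. The snippet below is a minimal sketch; the weights and signal values are invented, and the actual thesis metric may be quite different.

```python
# Combine per-item signals (each normalized to [0, 1]) into one ranking score.
def unified_score(similarity, popularity, novelty, serendipity,
                  w=(0.4, 0.25, 0.2, 0.15)):  # illustrative weights
    return sum(wi * si for wi, si in
               zip(w, (similarity, popularity, novelty, serendipity)))

candidates = {"story_a": (0.9, 0.3, 0.2, 0.1), "story_b": (0.5, 0.8, 0.6, 0.4)}
ranked = sorted(candidates, key=lambda k: -unified_score(*candidates[k]))
print(ranked)  # story_b outranks story_a under these weights
```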
APA, Harvard, Vancouver, ISO, and other styles
18

Jiang, Xinxin. "Mining heterogeneous enterprise data." Thesis, 2018. http://hdl.handle.net/10453/129377.

Full text
Abstract:
University of Technology Sydney. Faculty of Engineering and Information Technology.
Heterogeneity is becoming one of the key characteristics of enterprise data, because the current nature of globalization and competition stresses the importance of leveraging the huge amounts of data enterprises accumulate across various organizational processes, resources and standards. Effectively deriving meaningful insights from complex, large-scale heterogeneous enterprise data poses an interesting but critical challenge. The aim of this thesis is to investigate the theoretical foundations of mining heterogeneous enterprise data in light of the above challenges and to develop new algorithms and frameworks that are able to effectively and efficiently consider heterogeneity in four elements of the data: objects, events, context, and domains. Objects describe a variety of business roles and instruments involved in business systems. Object heterogeneity means that object information at both the data and structural level is heterogeneous. The proposed cost-sensitive hybrid neural network (Cs-HNN) leverages parallel network architectures and an algorithm specifically designed for minority classification to generate a robust model for learning heterogeneous objects. Events trace an object's behaviours or activities. Event heterogeneity reflects the level of variety in business events and is normally expressed in the type and format of features. The approach proposed in this thesis focuses on fleet tracking as a practical example of an application with a high degree of event heterogeneity. Context describes the environment and circumstances surrounding objects and events. Context heterogeneity reflects the degree of diversity in contextual features. The coupled collaborative filtering (CCF) approach proposed in this thesis is able to provide context-aware recommendations by measuring the non-independent and identically distributed (non-IID) relationships across diverse contexts. Domains are the sources of information and reflect the nature of the business or function that has generated the data. The cross-domain deep learning (Cd-DLA) approach proposed in this thesis provides a potential avenue to overcome the complexity and nonlinearity of heterogeneous domains. Each of the approaches, algorithms, and frameworks for heterogeneous enterprise data mining presented in this thesis outperforms the state-of-the-art methods in a range of backgrounds and scenarios, as evidenced by a theoretical analysis, an empirical study, or both. All outcomes derived from this research have been published or accepted for publication, and the follow-up work has also been recognised, which demonstrates scholarly interest in mining heterogeneous enterprise data as a research topic. Despite this interest, heterogeneous data mining still holds increasingly attractive opportunities for further exploration and development in both academia and industry.
APA, Harvard, Vancouver, ISO, and other styles
19

Babu, T. Ravindra. "Large Data Clustering And Classification Schemes For Data Mining." Thesis, 2006. https://etd.iisc.ac.in/handle/2005/440.

Full text
Abstract:
Data Mining deals with extracting valid, novel, easily understood by humans, potentially useful and general abstractions from large data. Data is large when the number of patterns, the number of features per pattern, or both are large. Largeness of data is characterized by a size that is beyond the capacity of a computer's main memory. Data Mining is an interdisciplinary field involving database systems, statistics, machine learning, visualization and computational aspects. The focus of data mining algorithms is scalability and efficiency. Large data clustering and classification is an important activity in Data Mining. Clustering algorithms are predominantly iterative, requiring multiple scans of the dataset, which is very expensive when data is stored on disk. In the current work we propose different schemes that have both theoretical validity and practical utility in dealing with such large data. The schemes broadly encompass data compaction, classification, prototype selection, use of domain knowledge and hybrid intelligent systems. The proposed approaches can be broadly classified as (a) compressing the data by some means in a non-lossy manner, and clustering as well as classifying the patterns in their compressed form directly through a novel algorithm; (b) compressing the data in a lossy fashion such that a very high degree of compression and abstraction is obtained in terms of 'distinct subsequences', and classifying the data in such compressed form to improve the prediction accuracy; (c) with the help of incremental clustering, a lossy compression scheme and a rough set approach, obtaining simultaneous prototype and feature selection; (d) demonstrating that prototype selection and data-dependent techniques can reduce the number of comparisons in the multiclass classification scenario using SVMs; and (e) by making use of domain knowledge of the problem and the data under consideration, showing that we obtain a very high classification accuracy with fewer iterations of AdaBoost. The schemes have pragmatic utility. The prototype selection algorithm is incremental, requiring a single dataset scan, and has linear time and space requirements. We provide results obtained with a large, high-dimensional handwritten (hw) digit dataset. The compression algorithm is based on simple concepts; we demonstrate that classification of the compressed data improves the computation time required by a factor of 5, with the prediction accuracy for both compressed and original data being exactly the same, 92.47%. With the proposed lossy compression scheme and pruning methods, we demonstrate that even with a reduction of distinct subsequences by a factor of 6 (690 to 106), the prediction accuracy improves. Specifically, with original data containing 690 distinct subsequences, the classification accuracy is 92.47%, and with an appropriate choice of parameters for pruning, the number of distinct subsequences reduces to 106 with a corresponding classification accuracy of 92.92%. The best classification accuracy of 93.3% is obtained with 452 distinct subsequences. With the scheme of simultaneous feature and prototype selection, we improved classification accuracy beyond that obtained with kNNC, viz., 93.58%, while significantly reducing the number of features and prototypes, achieving a compaction of 45.1%.

In the case of hybrid schemes based on SVM, prototypes and a domain-knowledge-based tree (KB-Tree), we demonstrated a reduction in SVM training time by 50% and testing time by about 30% as compared to the complete data, and an improvement of classification accuracy to 94.75%. In the case of AdaBoost the classification accuracy is 94.48%, which is better than that obtained with NNC and kNNC on the entire data; the training time is reduced because prototypes are used instead of the complete data. Another important aspect of the work is the design of a KB-Tree (with a maximum depth of 4) that classifies 10-category data in just 4 comparisons. In addition to the hw data, we applied the schemes to Network Intrusion Detection Data (the 10% dataset of KDDCUP99) and demonstrated that the proposed schemes provide a lower overall cost than the reported values.
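The single-scan, linear-time flavour of the prototype selection described above can be conveyed with a leader-style sketch: a pattern becomes a new prototype only if it lies farther than a threshold from all prototypes seen so far. This is in the spirit of, not identical to, the thesis's algorithm.

```python
import numpy as np

def leader_prototypes(X, threshold):
    """Single-scan incremental prototype selection (leader algorithm):
    linear time and space, one pass over the dataset."""
    prototypes = [X[0]]
    for x in X[1:]:
        if min(np.linalg.norm(x - p) for p in prototypes) > threshold:
            prototypes.append(x)
    return np.array(prototypes)

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 0.3, (100, 16)) for c in (0.0, 2.0, 4.0)])
rng.shuffle(X)
protos = leader_prototypes(X, threshold=3.0)
print(f"{len(X)} patterns reduced to {len(protos)} prototypes")
```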
APA, Harvard, Vancouver, ISO, and other styles
20

Babu, T. Ravindra. "Large Data Clustering And Classification Schemes For Data Mining." Thesis, 2006. http://hdl.handle.net/2005/440.

Full text
Abstract:
Data Mining deals with extracting valid, novel, easily understood by humans, potentially useful and general abstractions from large data. Data is large when the number of patterns, the number of features per pattern, or both are large. Largeness of data is characterized by a size that is beyond the capacity of a computer's main memory. Data Mining is an interdisciplinary field involving database systems, statistics, machine learning, visualization and computational aspects. The focus of data mining algorithms is scalability and efficiency. Large data clustering and classification is an important activity in Data Mining. Clustering algorithms are predominantly iterative, requiring multiple scans of the dataset, which is very expensive when data is stored on disk. In the current work we propose different schemes that have both theoretical validity and practical utility in dealing with such large data. The schemes broadly encompass data compaction, classification, prototype selection, use of domain knowledge and hybrid intelligent systems. The proposed approaches can be broadly classified as (a) compressing the data by some means in a non-lossy manner, and clustering as well as classifying the patterns in their compressed form directly through a novel algorithm; (b) compressing the data in a lossy fashion such that a very high degree of compression and abstraction is obtained in terms of 'distinct subsequences', and classifying the data in such compressed form to improve the prediction accuracy; (c) with the help of incremental clustering, a lossy compression scheme and a rough set approach, obtaining simultaneous prototype and feature selection; (d) demonstrating that prototype selection and data-dependent techniques can reduce the number of comparisons in the multiclass classification scenario using SVMs; and (e) by making use of domain knowledge of the problem and the data under consideration, showing that we obtain a very high classification accuracy with fewer iterations of AdaBoost. The schemes have pragmatic utility. The prototype selection algorithm is incremental, requiring a single dataset scan, and has linear time and space requirements. We provide results obtained with a large, high-dimensional handwritten (hw) digit dataset. The compression algorithm is based on simple concepts; we demonstrate that classification of the compressed data improves the computation time required by a factor of 5, with the prediction accuracy for both compressed and original data being exactly the same, 92.47%. With the proposed lossy compression scheme and pruning methods, we demonstrate that even with a reduction of distinct subsequences by a factor of 6 (690 to 106), the prediction accuracy improves. Specifically, with original data containing 690 distinct subsequences, the classification accuracy is 92.47%, and with an appropriate choice of parameters for pruning, the number of distinct subsequences reduces to 106 with a corresponding classification accuracy of 92.92%. The best classification accuracy of 93.3% is obtained with 452 distinct subsequences. With the scheme of simultaneous feature and prototype selection, we improved classification accuracy beyond that obtained with kNNC, viz., 93.58%, while significantly reducing the number of features and prototypes, achieving a compaction of 45.1%.

In the case of hybrid schemes based on SVM, prototypes and a domain-knowledge-based tree (KB-Tree), we demonstrated a reduction in SVM training time by 50% and testing time by about 30% as compared to the complete data, and an improvement of classification accuracy to 94.75%. In the case of AdaBoost the classification accuracy is 94.48%, which is better than that obtained with NNC and kNNC on the entire data; the training time is reduced because prototypes are used instead of the complete data. Another important aspect of the work is the design of a KB-Tree (with a maximum depth of 4) that classifies 10-category data in just 4 comparisons. In addition to the hw data, we applied the schemes to Network Intrusion Detection Data (the 10% dataset of KDDCUP99) and demonstrated that the proposed schemes provide a lower overall cost than the reported values.
APA, Harvard, Vancouver, ISO, and other styles
21

蔡明憲. "A Hybrid Data Mining Model for Customer Retention." Thesis, 2002. http://ndltd.ncl.edu.tw/handle/25689304585306477235.

Full text
Abstract:
Master's thesis
National Taiwan University of Science and Technology
Department of Electronic Engineering
90
Competition in the wireless telecommunications industry is fierce. To maintain profitability, wireless carriers must control churn, the loss of subscribers who switch from one carrier to another. This thesis proposes a hybrid architecture that tackles the complete customer retention problem, in the sense that it not only predicts churn probability but also proposes retention policies. The architecture works in two modes, namely the learning and usage modes. In the learning mode, the churn model learner learns potential associations inside the historical subscriber database to form a churn model. The policy model constructor then uses the attributes that appear in the churn model to segment all churners into distinct groups. It is also responsible for developing a specific policy model for each churner group. In the usage mode, the churner predictor uses the churn model to predict the churn probability of a given subscriber. A high churn probability causes the churner predictor to invoke the policy maker to suggest specific retention policies according to the policy model. Our experiments show that the learned churn model achieves around 85% correctness in evaluation. Currently, we have no proper data to evaluate the constructed policy model. The construction process, however, signifies an interesting and important approach toward better support in retaining possible churners. This work is significant since state-of-the-art technology focuses only on how to increase the accuracy of churn prediction: it either never touches the issue of retention policies, or only proposes policies according to the path conditions of the decision tree, i.e., the churn model. Our policy model construction process goes one step further to investigate the concept of churner groups, which equivalently digs out the associations between the paths of the decision tree. We believe that with this in-depth knowledge about how churns are related, we can propose better retention policy models for possible churners.
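A minimal sketch of the decision-tree core of such an architecture is shown below: the tree both predicts churn probability and, through its root-to-leaf paths, suggests churner groups for which retention policies could be developed. The features and the synthetic churn rule are invented for illustration.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical subscriber features: monthly spend, dropped calls, tenure (months).
rng = np.random.default_rng(0)
X = np.column_stack([rng.uniform(10, 100, 1000),
                     rng.poisson(3, 1000),
                     rng.uniform(1, 60, 1000)])
churn = ((X[:, 1] > 4) & (X[:, 2] < 24)).astype(int)  # synthetic churn signal

model = DecisionTreeClassifier(max_depth=3).fit(X, churn)
# Each root-to-leaf path below is a candidate churner group for a policy.
print(export_text(model, feature_names=["spend", "dropped_calls", "tenure"]))
print("churn probability:", model.predict_proba([[40.0, 6, 12]])[0, 1])
```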
APA, Harvard, Vancouver, ISO, and other styles
22

Tang, Tzu-Fan, and 湯子範. "A hybrid data mining approach for customer relationship management." Thesis, 2008. http://ndltd.ncl.edu.tw/handle/20933792886165601712.

Full text
Abstract:
Master's thesis
National Chung Cheng University
Graduate Institute of Accounting and Information Technology
96
Domestic and foreign enterprises are facing unprecedented competition, and the 'product-oriented' model has shifted to a 'customer-oriented' one. This has raised the importance of customer relationship management (CRM), in which customer retention is one major problem. Data mining techniques have been applied to predict the loss of customers (customer churn), and the literature has proven their applicability to churn prediction. In this thesis, hybrid data mining methods are developed to improve on current single prediction models. In particular, two different techniques are combined in sequence, which leads to two stages of training or learning. This research considers Self-Organizing Maps (SOM) and Artificial Neural Networks (ANN) as the first component of the hybrid models, respectively; the second component, the prediction model that produces the final output, is based on ANN. The baseline against which the two hybrid models are compared is a single ANN without the first component. The experimental results show that the hybrid models outperform the baseline model in terms of prediction accuracy. In particular, ANN combined with ANN performs best, providing 93% prediction accuracy; it also yields the lowest Type I and Type II error rates.
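A minimal sketch of the two-stage SOM-then-ANN wiring described above, assuming the third-party minisom package and synthetic placeholder data; feeding the winning SOM node to the ANN as an extra feature is one plausible reading of the combination, not necessarily the thesis's exact design:

```python
import numpy as np
from minisom import MiniSom                      # third-party SOM package
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(1)
X = rng.random((500, 10))                        # placeholder customer features
y = rng.integers(0, 2, 500)                      # placeholder churn labels

# Stage 1: SOM groups customers; the winning node becomes a new feature.
som = MiniSom(4, 4, input_len=X.shape[1], sigma=1.0, learning_rate=0.5,
              random_seed=1)
som.train_random(X, num_iteration=1000)
node_id = np.array([som.winner(x)[0] * 4 + som.winner(x)[1] for x in X])

# Stage 2: ANN predicts churn from the raw features plus the SOM node.
X2 = np.column_stack([X, node_id])
clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=1)
clf.fit(X2, y)
print("training accuracy:", clf.score(X2, y))
```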
APA, Harvard, Vancouver, ISO, and other styles
23

Yang, Ren-fu, and 楊仁富. "Hybrid Data Mining and MSVM for Short Term Load Forecasting." Thesis, 2010. http://ndltd.ncl.edu.tw/handle/28661440234858506249.

Full text
Abstract:
Master's thesis
National Sun Yat-sen University
Department of Electrical Engineering
98
The accuracy of load forecasting has a significant impact on power companies' ability to execute power development plans, reduce operating costs, and provide reliable power to clients. Short-term load forecasting predicts load demand over horizons of one hour or less. This study presents a new approach to load forecasting. A Support Vector Machine (SVM) is used for the initial load estimation, and Particle Swarm Optimization (PSO) is then adopted to search for optimal parameters for the SVM. In load forecasting, the amount of training data is the most important factor affecting calculation time: using more data for model training should provide better forecast results, but it needs more computing time and is less efficient. Applying data mining provides a means to reduce both the data requirement and the computing time. The proposed Modified Support Vector Machine approach is shown to provide more accurate load forecasting.
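The PSO-tuned SVM idea can be illustrated in a few lines. The following sketch searches (C, gamma) for a support vector regressor on synthetic data; the swarm size, inertia and acceleration coefficients are assumptions, not values from the thesis:

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
X = rng.random((200, 6))                                  # placeholder load features
y = X @ rng.random(6) + 0.1 * rng.standard_normal(200)   # synthetic demand

def fitness(params):
    C, gamma = params
    model = SVR(C=C, gamma=gamma)
    return cross_val_score(model, X, y, cv=3,
                           scoring="neg_mean_squared_error").mean()

# Minimal PSO over (C, gamma).
n, dims, iters = 10, 2, 20
lo, hi = np.array([0.1, 1e-3]), np.array([100.0, 1.0])
pos = rng.uniform(lo, hi, (n, dims))
vel = np.zeros((n, dims))
pbest, pbest_fit = pos.copy(), np.array([fitness(p) for p in pos])
gbest = pbest[pbest_fit.argmax()]
for _ in range(iters):
    r1, r2 = rng.random((n, dims)), rng.random((n, dims))
    vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
    pos = np.clip(pos + vel, lo, hi)
    fit = np.array([fitness(p) for p in pos])
    improved = fit > pbest_fit
    pbest[improved], pbest_fit[improved] = pos[improved], fit[improved]
    gbest = pbest[pbest_fit.argmax()]
print("best (C, gamma):", gbest)
```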
APA, Harvard, Vancouver, ISO, and other styles
24

Chen, Lei Chun, and 陳蕾淳. "A Hybrid Data Mining Model in Analyzing Corporate Social Responsibility." Thesis, 2013. http://ndltd.ncl.edu.tw/handle/07858790424620614969.

Full text
Abstract:
Master's thesis
National Chi Nan University
Department of Information Management
101
Over the past two decades, Corporate Social Responsibility (CSR) has received worldwide attention, and publication of CSR reports has become the trend for domestic and foreign enterprises. In a constantly changing competitive environment, how enterprises play the role of corporate citizens and balance profit, environmental protection, and charitable activities is a focus of public attention. However, most previous quantitative studies of CSR concentrate on traditional statistical approaches; data mining techniques have not been widely explored in this area. This investigation therefore proposes a hybrid data mining model, CSFSC, integrating data preprocessing approaches, a classification method, and a rule generation mechanism. The data preprocessing approaches include Correlation-based Feature Selection (CFS), the Synthetic Minority Over-sampling Technique (SMOTE) and the Fuzzy C-Means (FCM) clustering algorithm. The One-Against-One Support Vector Machine (OAOSVM) method is employed as the classifier for the multi-classification task, and the rule-based learning algorithm C5.0 is utilized to generate rules from the results of the OAOSVM model. CSR data collected from Chinese listed firms in 2010 were employed to examine the performance of the proposed model. The empirical results show that the designed CSFSC model yields satisfactory classification accuracy as well as rules for decision makers. The presented CSFSC model is therefore a feasible and effective alternative for analyzing CSR.
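A rough sketch of the pipeline's core steps on placeholder data: SMOTE balancing, a one-against-one SVM, and rule generation. Since C5.0 is proprietary, scikit-learn's CART tree stands in for the rule-generation step, and the feature names are hypothetical:

```python
import numpy as np
from imblearn.over_sampling import SMOTE
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(3)
X = rng.random((300, 5))                           # placeholder CSR indicators
y = rng.choice([0, 1, 2], 300, p=[0.7, 0.2, 0.1])  # imbalanced CSR classes

# Balance the classes, then train a one-against-one multi-class SVM.
Xb, yb = SMOTE(random_state=3).fit_resample(X, y)
svm = SVC(kernel="rbf", decision_function_shape="ovo").fit(Xb, yb)

# Turn the SVM's predictions into readable rules with a shallow tree,
# mirroring the rule-generation step (CART in place of C5.0).
tree = DecisionTreeClassifier(max_depth=3, random_state=3)
tree.fit(Xb, svm.predict(Xb))
print(export_text(tree, feature_names=[f"f{i}" for i in range(5)]))
```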
APA, Harvard, Vancouver, ISO, and other styles
25

Lee, Chia-Hsun, and 李嘉訓. "A Hybrid Data Mining Approach to Quality Control of Machining Process." Thesis, 2006. http://ndltd.ncl.edu.tw/handle/31833725094107767456.

Full text
Abstract:
Master's thesis
National Chi Nan University
Department of Information Management
94
Nowadays quality is one of the best sources of competitive advantage, and high quality performance is becoming critically important. Quality control is a process employed to ensure a certain level of quality in a product or service; one of its techniques is to predict product quality based on product features. However, traditional quality control techniques have weaknesses such as fixed control limits, heavy reliance on the collection and analysis of data, and poor handling of uncertainty. In order to improve the effectiveness of quality control, an agent-based hybrid approach incorporating rough set theory (RST), fuzzy logic and a genetic algorithm is proposed in this thesis. In this agent-based system, each agent performs one or more functions across three stages: the feature and rule extraction stage is an RST procedure used to extract significant features and decision rules; the quality prediction stage develops a fuzzy logic system (FLS) to predict machined part quality; and the optimization stage searches for the optimal configuration of the FLS.
APA, Harvard, Vancouver, ISO, and other styles
26

Lu, Chi-Jie, and 呂奇傑. "Hybrid Neural Network Classification Techniques in the Application of Data Mining." Thesis, 2001. http://ndltd.ncl.edu.tw/handle/37742016816981561250.

Full text
Abstract:
Master's thesis
Fu Jen Catholic University
Graduate Institute of Applied Statistics
89
Data mining is the art of finding patterns in data; it is a new approach based on the general recognition that there is untapped value in large databases, and it utilizes data-driven extraction of information. However, it is still not easy to identify the complicated relationships in huge datasets. Moreover, in most cases the estimated parameters or classification results cannot really describe the realities of business modeling. The artificial neural network is becoming a very popular alternative for prediction and classification tasks due to its associative memory characteristic and generalization capability. However, neural networks have been criticized for their long training process in classification applications. In order to overcome this drawback, this study explores the performance of data classification by integrating the artificial neural network technique with linear discriminant analysis and fuzzy discriminant analysis, respectively. To demonstrate that including the classification results from linear discriminant and fuzzy discriminant analysis improves the classification accuracy of the designed neural networks, classification tasks are performed on two datasets: the often-used Iris data and a practical bank credit card dataset. As the results reveal, the two proposed integrated approaches provide a better initial solution and hence converge much faster than conventional neural networks. Besides, in comparison with the traditional neural network approach, the classification accuracies of the two proposed methodologies increase in both cases. Moreover, the superiority of the proposed technique can be observed by comparing these results with those obtained using only the linear discriminant or fuzzy discriminant analysis approaches.
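One plausible way to realize the LDA-ANN integration, feeding the discriminant scores to the network as extra inputs so that it starts from a better solution, is sketched below on the Iris data named in the abstract (the exact wiring in the thesis may differ):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

# Iris is one of the two benchmark datasets named in the abstract.
X, y = load_iris(return_X_y=True)
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3, random_state=4)

# Append the LDA discriminant scores to the raw features -- one simple
# way to hand the ANN the discriminant analysis results.
lda = LinearDiscriminantAnalysis().fit(Xtr, ytr)
Xtr2 = np.column_stack([Xtr, lda.decision_function(Xtr)])
Xte2 = np.column_stack([Xte, lda.decision_function(Xte)])

net = MLPClassifier(hidden_layer_sizes=(8,), max_iter=1000, random_state=4)
net.fit(Xtr2, ytr)
print("hybrid LDA+ANN accuracy:", net.score(Xte2, yte))
```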
APA, Harvard, Vancouver, ISO, and other styles
27

Chen, Hsiao-ming, and 陳小明. "Prevention of Drug Dispensing Errors by Using Hybrid Data Mining Approaches." Thesis, 2008. http://ndltd.ncl.edu.tw/handle/61860258367373868836.

Full text
Abstract:
Master's thesis
National Cheng Kung University
Department of Computer Science and Information Engineering
96
One important issue in medical care is the prevention of drug dispensing errors, since they cause numerous injuries and deaths at great cost. In this thesis, we propose a hybrid data mining approach, with an implemented system, to solve this problem. Our approach consists of two main modules, HDMmodel and HDMclustering. In HDMmodel, J48 and logistic regression are used to derive a decision tree and a regression function from the given dispensing error cases and the drug database. In HDMclustering, similar drugs, which are easily confused with each other, are gathered into clusters by the clustering technique PoCluster together with the extracted logistic regression function. Risky drug pairs that may cause dispensing errors are then flagged in our implemented system with interpretable prevention rules. Finally, through experimental evaluation on real datasets from a medical center, our approach is shown to be capable of diagnosing potential dispensing errors effectively.
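The look-alike drug grouping can be illustrated with plain string similarity, although the thesis uses PoCluster and a learned logistic regression function rather than edit-distance alone. The drug names and the alert threshold below are hypothetical:

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical drug names standing in for a real drug database.
drugs = ["hydroxyzine", "hydralazine", "metformin", "metronidazole",
         "prednisone", "prednisolone"]

def similarity(a, b):
    """Ratio in [0, 1] of matching characters between two names."""
    return SequenceMatcher(None, a, b).ratio()

risky_pairs = [(a, b, round(similarity(a, b), 2))
               for a, b in combinations(drugs, 2)
               if similarity(a, b) >= 0.7]        # alert threshold (assumed)
for a, b, s in risky_pairs:
    print(f"ALERT: {a} / {b} (similarity {s})")
```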
APA, Harvard, Vancouver, ISO, and other styles
28

Fan, Ching-Yi, and 范景怡. "Applying Data Mining Techniques to Combine Predictions in Hybrid Recommender Systems." Thesis, 2013. http://ndltd.ncl.edu.tw/handle/92064178136185259961.

Full text
Abstract:
Master's thesis
Chinese Culture University
In-service Master's Program, Department of Information Management
101
Recommender systems have been developed along several different lines. The main techniques used to build them are content-based filtering (CB), collaborative filtering (CF) and demographic filtering (DF). However, each technique has its advantages and limitations. For this reason, many scholars have proposed combining several techniques, intending to reduce the disadvantages of any single method and achieve more precise recommendations. Currently, hybrid recommender systems are mostly built according to past research experience or heuristic methods, and lack a rigorous theoretical foundation. This study therefore uses the concepts of CB and CF, plus DF techniques, and combines their predictions with data mining techniques (i.e., linear regression and neural networks). The combined approach is expected to provide more accurate predictions than any single technique and to overcome the potential limitations of each.
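A minimal sketch of the proposed combination step: learn regression weights over the CB, CF and DF component predictions instead of fixing them heuristically. The component outputs are simulated here, as stand-ins for real recommender scores:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(5)
n = 400
true_rating = rng.uniform(1, 5, n)
# Hypothetical component predictions from content-based (CB),
# collaborative-filtering (CF) and demographic-filtering (DF) models,
# each simulated as the truth plus its own noise level.
cb = true_rating + rng.normal(0, 0.8, n)
cf = true_rating + rng.normal(0, 0.5, n)
df = true_rating + rng.normal(0, 1.0, n)

# Learn how to weight the three predictors -- the linear-regression
# combination the abstract proposes in place of ad-hoc weighting.
P = np.column_stack([cb, cf, df])
combiner = LinearRegression().fit(P, true_rating)
print("learned weights (CB, CF, DF):", np.round(combiner.coef_, 2))
print("combined R^2:", round(combiner.score(P, true_rating), 3))
```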
APA, Harvard, Vancouver, ISO, and other styles
29

Shu, I.-Ping, and 徐一平. "Study of Hybrid Data Mining Techniques Applied for Filtering Spam Mail." Thesis, 2009. http://ndltd.ncl.edu.tw/handle/00992363360301001047.

Full text
Abstract:
Master's thesis
Huafan University
Department of Information Management
97
Computer networks have been developing since the 1970s and are now in general use. Mail that was once delivered physically has largely been replaced by e-mail, which has reduced the time and distance of communication and gradually changed the way we live and work. At the same time, some profit-seeking people use malicious programs or collect e-mail addresses in various ways and then send mail indiscriminately, which is a nuisance to receivers. This study adopts a hybrid of the genetic algorithm (GA) and decision tree (DT) data mining techniques (GA/DT) to select the Minimum Case and Pruning CF parameters. Experimental results indicate that the accuracy of the hybrid GA/DT algorithm is 95.0%, exceeding the Logistic algorithm by 5.0%, ANN by 2.337% and SVM by 4.0%. This shows that the GA/DT algorithm can accurately select the Minimum Case and Pruning CF parameters of the DT algorithm and effectively enhance the performance of identifying spam mail.
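The C4.5 "Minimum Case" and "Pruning CF" parameters have no exact scikit-learn equivalents; min_samples_leaf and ccp_alpha are rough analogues, so the sketch below shows the GA/DT tuning idea rather than the study's exact setup. Population size, generation count and mutation scales are assumptions:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=20, random_state=6)
rng = np.random.default_rng(6)

def fitness(ind):
    """Cross-validated accuracy of a tree with the encoded parameters."""
    min_leaf, alpha = int(ind[0]), ind[1]
    tree = DecisionTreeClassifier(min_samples_leaf=min_leaf, ccp_alpha=alpha)
    return cross_val_score(tree, X, y, cv=5).mean()

# Each individual encodes (min_samples_leaf, ccp_alpha).
pop = np.column_stack([rng.integers(1, 30, 20), rng.uniform(0, 0.05, 20)])
for _ in range(15):                                   # generations (assumed)
    scores = np.array([fitness(ind) for ind in pop])
    parents = pop[np.argsort(scores)[-10:]]           # truncation selection
    children = parents[rng.integers(0, 10, 10)].copy()
    children[:, 0] = np.clip(children[:, 0] + rng.integers(-2, 3, 10), 1, 30)
    children[:, 1] = np.clip(children[:, 1] + rng.normal(0, 0.005, 10), 0, 0.05)
    pop = np.vstack([parents, children])
best = pop[np.argmax([fitness(ind) for ind in pop])]
print(f"best min_samples_leaf={int(best[0])}, ccp_alpha={best[1]:.4f}")
```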
APA, Harvard, Vancouver, ISO, and other styles
30

呂奇傑. "Hybrid Neural Network Classification Techniques in the Application of Data Mining." Thesis, 2001. http://ndltd.ncl.edu.tw/handle/76303042740984553449.

Full text
Abstract:
Master's thesis
Fu Jen Catholic University
Graduate Institute of Applied Statistics
89
Data mining is the art of finding patterns in data; it is a new approach based on the general recognition that there is untapped value in large databases, and it utilizes data-driven extraction of information. However, it is still not easy to identify the complicated relationships in huge datasets. Moreover, in most cases the estimated parameters or classification results cannot really describe the realities of business modeling. The artificial neural network is becoming a very popular alternative for prediction and classification tasks due to its associative memory characteristic and generalization capability. However, neural networks have been criticized for their long training process in classification applications. In order to overcome this drawback, this study explores the performance of data classification by integrating the artificial neural network technique with linear discriminant analysis and fuzzy discriminant analysis, respectively. To demonstrate that including the classification results from linear discriminant and fuzzy discriminant analysis improves the classification accuracy of the designed neural networks, classification tasks are performed on two datasets: the often-used Iris data and a practical bank credit card dataset. As the results reveal, the two proposed integrated approaches provide a better initial solution and hence converge much faster than conventional neural networks. Besides, in comparison with the traditional neural network approach, the classification accuracies of the two proposed methodologies increase in both cases. Moreover, the superiority of the proposed technique can be observed by comparing these results with those obtained using only the linear discriminant or fuzzy discriminant analysis approaches.
APA, Harvard, Vancouver, ISO, and other styles
31

Kumar, Nishant. "Sentiment Analysis Using Hybrid Machine Learning Technique." Thesis, 2016. http://ethesis.nitrkl.ac.in/8616/1/2016_MT_214CS3513_Nishant_Kumar.pdf.

Full text
Abstract:
It is observed that consumers often share their opinions, views or feelings about products on social networks in the form of reviews, comments or feedback. This feedback from end users has a great impact on the evolution of new versions of any product. Due to this trend in social media in recent years, sentiment analysis has become an important concern for theoreticians and practitioners. Moreover, reviews are often written in natural language and are mostly unstructured; to obtain any meaningful information from them, they need to be processed. Due to the large size of the data, it is impossible to process this information manually, so machine learning algorithms are considered for the analysis. Since the data are unstructured in nature, unsupervised machine learning algorithms can be helpful in solving this problem, but unsupervised methods have lower accuracy and hence are not acceptable on their own. In this study, a hybrid machine learning approach is adopted to automatically find the requirements for the next version of software. Some reviews belong to neither the positive cluster nor the negative one, expressing mixed reactions or feelings about some topics; this problem is solved using a hybrid technique combining fuzzy c-means and ANN. Moreover, in this study different unsupervised machine learning methods are implemented and their results are compared with each other; the best outcome is used to train the neural network. By using this hybridization technique, accuracy is increased. In a later stage, the technique is applied to find the new requirements for the product.
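A small sketch of the fuzzy c-means/ANN hybrid on synthetic two-dimensional stand-ins for review vectors: confident cluster members train the network, which then labels the mixed-feeling reviews. The 0.6 confidence cut-off is an assumption:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

def fuzzy_cmeans(X, c=2, m=2.0, iters=100, seed=7):
    """Plain fuzzy c-means: alternate membership and centroid updates."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(iters):
        Um = U ** m
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-9
        U = 1.0 / d ** (2.0 / (m - 1.0))
        U /= U.sum(axis=1, keepdims=True)
    return centers, U

# Synthetic 2-D stand-ins for positive and negative review embeddings.
rng = np.random.default_rng(7)
X = np.vstack([rng.normal(0, 1, (150, 2)), rng.normal(4, 1, (150, 2))])
_, U = fuzzy_cmeans(X)

labels = U.argmax(axis=1)
confident = U.max(axis=1) > 0.6        # confidence cut-off (assumed)
net = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=7)
net.fit(X[confident], labels[confident])
mixed = ~confident
print(f"{confident.sum()} confident reviews train the ANN; "
      f"{mixed.sum()} mixed-feeling reviews get re-labelled")
if mixed.any():
    print(net.predict(X[mixed]))
```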
APA, Harvard, Vancouver, ISO, and other styles
32

Kang, Shu-Tyng, and 康舒婷. "Applying Hybrid Data Mining Approach to Develop a Cerebrovascular Disease Prediction Model." Thesis, 2013. http://ndltd.ncl.edu.tw/handle/70886576446134037009.

Full text
Abstract:
Master's thesis
National Taiwan University of Science and Technology
Department of Industrial Management
101
With Taiwan's economic take-off, Taiwanese people have gradually placed more importance on health and medical issues. According to data reported by the WHO, stroke has been a major threat to health in developed countries since 1999; in Taiwan, stroke is third among the top ten causes of death. How to prevent and detect stroke is therefore a very important issue. The best way to examine and diagnose stroke is through brain imaging and carotid ultrasound; however, the price of these examinations is considerably higher than that of other tests, and without a doctor's referral people must pay all the expenses themselves. This is the main reason some people are unwilling to undergo these brain examinations. This study uses brain examination data provided by a hospital located in Taipei. We perform feature selection to find which features are important to the cause of stroke, using a hybrid method that combines data mining technology with meta-heuristic algorithms (including the genetic algorithm, particle swarm optimization and the back-propagation network). Finally, we use these features to develop a cerebrovascular disease prediction model. The model can support doctors in advising people on whether to undergo brain examinations, so that people can know the state of their brain health and seek prevention and treatment as early as possible.
APA, Harvard, Vancouver, ISO, and other styles
33

Herani, Inggi Rengganing. "Development of Carotid Artery Diagnostic Prediction Model using Hybrid Data Mining Approach." Thesis, 2013. http://ndltd.ncl.edu.tw/handle/67320389316269101304.

Full text
Abstract:
Master's thesis
National Taiwan University of Science and Technology
Department of Industrial Management
101
Carotid artery disease is the main cause of disability and death related to stroke or cerebrovascular disease, and worldwide, stroke is responsible for a high number of deaths. Because carotid artery disease has no symptoms, it is important to perform medical tests using ultrasound or imaging methods to visualize the carotid arteries. This kind of test is uncomfortable, expensive, and carries some risks. Therefore, to reduce the risks and the economic burden, this research presents a method that generates important information to help doctors diagnose carotid artery disease. A hybrid data mining approach is applied to produce several combined models. Real-world datasets are often imbalanced, dominated by normal data with only a small percentage of abnormal or sick cases. To overcome the imbalanced dataset, we use the Synthetic Minority Over-sampling Technique (SMOTE) and simple K-Means clustering: SMOTE over-samples the minority data, while clustering under-samples the majority data. A genetic algorithm and gain ratio are also used to select important features, emphasizing salient feature subsets and reducing the number of features. Finally, the new dataset is processed using a Back-Propagation Network (BPN), Naive Bayes, and a decision tree to predict the disease. Experimental results show that these hybrid methods achieve high accuracy, so they can assist doctors in analyzing and predicting the presence of carotid artery disease in patients.
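The two resampling moves, clustering to under-sample the majority class and SMOTE to over-sample the minority, can be sketched as follows with synthetic data; keeping k-means centroids as majority representatives is one simple realization of the under-sampling step:

```python
import numpy as np
from sklearn.cluster import KMeans
from imblearn.over_sampling import SMOTE

rng = np.random.default_rng(8)
X_norm = rng.normal(0, 1, (1000, 8))     # majority: normal examinations
X_sick = rng.normal(2, 1, (50, 8))       # minority: diseased cases

# Under-sample the majority class by keeping cluster centroids as
# representatives (a simple k-means stand-in for the thesis's scheme).
km = KMeans(n_clusters=200, n_init=10, random_state=8).fit(X_norm)
X_norm_small = km.cluster_centers_

X = np.vstack([X_norm_small, X_sick])
y = np.array([0] * len(X_norm_small) + [1] * len(X_sick))

# Over-sample the minority class with SMOTE to finish balancing.
Xb, yb = SMOTE(random_state=8).fit_resample(X, y)
print("class counts after resampling:", np.bincount(yb))
```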
APA, Harvard, Vancouver, ISO, and other styles
34

Feng, Hsin-lan, and 馮欣嵐. "Applying a Hybrid Data Mining Approach to Develop a Stroke Prediction Model." Thesis, 2013. http://ndltd.ncl.edu.tw/handle/75650209307133005293.

Full text
Abstract:
Master's thesis
National Taiwan University of Science and Technology
Department of Industrial Management
101
Stroke has become a major threat to health for people worldwide; both its death rate and its disability rate are high. How to prevent and detect stroke is therefore an important issue. The best ways to examine for and detect stroke are brain imaging and ultrasound; however, the price of these examinations is relatively high, and people will not take them without a doctor's advice or obvious symptoms. Consequently, we use regular health examinations, which are cheaper and easier to take, as the basis of our research, applying hybrid data mining techniques to find associations between regular health examination results and stroke. By adding suggestions to the regular health examination report, we hope to provide more information to the public. We use brain examination data from 2004 to 2011 to develop a stroke-risk-predicting assistance model based on a BPN. First, we perform clustering-based under-sampling, then find relevant features using rough set theory, information gain and gain ratio. Finally, we use the Taguchi method to set the best parameters for the BPN. The stroke-risk-predicting assistance model can support doctors in advising people on whether to undergo a brain examination, maximizing the value of regular health examinations. People can learn their brain health status and prevent or treat stroke as early as possible.
APA, Harvard, Vancouver, ISO, and other styles
35

Wang, Yu-Chung, and 王鈺中. "Evaluating Renewable Energy Policies Using Hybrid Data Mining and Analytic Hierarchy Process Modeling." Thesis, 2014. http://ndltd.ncl.edu.tw/handle/68531396159632779020.

Full text
Abstract:
Master's thesis
National Tsing Hua University
Department of Industrial Engineering and Engineering Management
102
When a large percentage of energy (>90%) is generated from fossil fuels, carbon dioxide emissions increase the greenhouse effect. Therefore, renewable, sustainable, and economically viable energy sources are needed as alternatives to fossil fuels. The facility and installation costs for generating renewable energy are much higher than those of fossil fuel facilities, so governments need effective policies, regulations, and incentive programs to promote the usage of renewable energy. Renewable energy can be classified into different categories, including offshore and onshore wind power, photovoltaic solar, and geothermal, and the policies used for promoting specific categories vary significantly, depending on policy goals, regulations, taxation, incentives and promotional schemes. The purpose of this study is to apply clustering techniques and the Analytic Hierarchy Process (AHP) to analyze types of renewable energies and their attributes with respect to economic factors, energy resource and supply, and environmental effects. The AHP method is used to evaluate actions that can resolve challenges found in the development of renewable energy. The study provides scientific results to help the government plan renewable energy policies. The data for the case study are collected from Taiwan's renewable energy statistics related to PV cells, wind farms, ocean thermal energy, geothermal energy, hydro power, and solid waste fuels. The research has four major results and findings: (1) constructing models for analyzing renewable energy policies using data mining techniques; (2) using seven categories of renewable energy sources in Taiwan, such as wind power, photovoltaic, geothermal and solid waste power, as specific renewable energy types to find the best promotional policy; (3) providing reliable advice to the government (and the means to effectively analyze given scenarios) for policy planning and execution; and (4) giving suggestions on renewable policy from benchmark countries and providing strategies from other countries.
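The AHP step reduces to computing a priority vector from a pairwise comparison matrix and checking its consistency. A minimal sketch with a hypothetical 3x3 matrix over the three criteria named above:

```python
import numpy as np

# Hypothetical pairwise comparison matrix over the criteria named in
# the abstract: economic factors, energy resource/supply, environment.
A = np.array([[1.0, 3.0, 5.0],
              [1/3, 1.0, 2.0],
              [1/5, 1/2, 1.0]])

# Priorities = principal eigenvector of A, normalised to sum to 1.
eigvals, eigvecs = np.linalg.eig(A)
k = eigvals.real.argmax()
w = eigvecs[:, k].real
w /= w.sum()

# Consistency ratio: CI = (lambda_max - n) / (n - 1); RI(3) = 0.58.
n = len(A)
CI = (eigvals.real.max() - n) / (n - 1)
CR = CI / 0.58
print("priorities:", np.round(w, 3), " CR:", round(CR, 3))  # CR < 0.1 is acceptable
```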
APA, Harvard, Vancouver, ISO, and other styles
36

Tseng, Jui-Chih, and 曾瑞智. "A Hybrid Data Mining Approach to Construct the Target Customers Choice Reference Model." Thesis, 2013. http://ndltd.ncl.edu.tw/handle/26022623832684281818.

Full text
Abstract:
Master's thesis
Tatung University
Department of Information Management
101
Marketing, the prevailing commercial activity of enterprises, is an important strategy to increase customer loyalty and attract potential customers for more profit. To maximize profit with limited resources, it is more profitable for enterprises to choose the right target customers; it is therefore necessary to build an efficient, objective and accurate target customer choice model. Using data mining techniques to find target customers is the traditional way, but past research mainly focused on finding a single high-accuracy classifier, even though different classifiers perform differently in different situations. This study therefore proposes an integrated target customer choice model, combining support vector machines, neural networks and the K-Means algorithm into a two-phase analysis model. The model is expected to enhance classification accuracy and reduce Type I and Type II errors at the same time. The research results indicate that the integrated model is effective in simultaneously enhancing classification accuracy and reducing Type I and Type II errors.
APA, Harvard, Vancouver, ISO, and other styles
37

蔡永順. "An RFID-based Data Mining Using Hybrid and Heuristic Methods for Quality Management." Thesis, 2013. http://ndltd.ncl.edu.tw/handle/82687441413491623671.

Full text
Abstract:
Doctoral dissertation
National Chiao Tung University
Institute of Information Management
101
Many enterprises now confront global competition and shortened life cycles for new products. If they cannot master product quality, they will delay product development and time to market, and cannot promptly provide product variety. Data mining can find hidden knowledge patterns in data and enable complex business processes to be understood and reengineered. In addition, RFID can effortlessly turn every object into a mobile network node that can be tracked, traced, monitored, trigger actions, or respond to action requests. This study therefore focuses on the integration and application of data mining and RFID systems for quality management. Its purpose is to propose an RFID-based hybrid heuristic data mining (RHHDM) framework to discover hidden and meaningful knowledge rules for product quality and to reengineer quality management processes, in order to help enterprises enhance product quality. The RHHDM framework primarily utilizes RFID, a manufacturing execution system (MES), a genetic algorithm (GA), an artificial neural network (ANN) based on the back-propagation network (BPN) algorithm, a decision tree algorithm, and a Bayesian classification algorithm. In the RHHDM framework, the MES storing product quality data serves as the data source of the data mining system. This study proposes three types of algorithms to act as the nucleus of the data mining engines for data classification and prediction, divided into an experiment group and a contrast group. In the experiment group, artificial intelligence approaches are used to propose hybrid heuristic methods integrating the GA and BPN (GABPN) algorithms; in the contrast group, statistical approaches are used in the form of the decision tree algorithm and the Bayesian classification algorithm. After testing and verifying these algorithms, the best one is selected and utilized to mine hidden product quality knowledge. This study then incorporates the discovered product quality knowledge into the RFID system within the RHHDM framework, enabling the quality management processes to be reengineered. The proposed RHHDM framework was applied to an enterprise in practice in order to test and verify its applicability and effectiveness. The results show that the proposed RHHDM framework can tightly integrate data mining, RFID systems, and quality management processes; it was applied to improve the traceability and visibility of product quality and to support better decisions by the enterprise. According to the experimental results and analyses, the RHHDM framework can help the enterprise enhance customer satisfaction, save costs, enhance the efficiency of internal processes, and enable organizational learning and growth. The proposed RHHDM framework can also be applied to production management, warehouse management, and product recommendation for sales, in order to help enterprises enhance competitiveness.
APA, Harvard, Vancouver, ISO, and other styles
38

Yu, Ting-Yi, and 尤婷藝. "Using A Hybrid Meta-evolutionary Algorithm for Mining Classification Rules Through Microarray Data." Thesis, 2010. http://ndltd.ncl.edu.tw/handle/5ns46j.

Full text
Abstract:
Master's thesis
National Formosa University
Institute of Information Management
98
With the rapid development of information technology, microarray data has become an important field of study for cancer research. However, microarray data has high-dimensional attributes and small sample sizes, resulting in lengthy computation times and low classification accuracy. Given these gene microarray classification issues, how to obtain more accurate, higher-quality prediction results has become an important area of research. This thesis proposes a hybrid evolutionary algorithm that combines a genetic algorithm and binary particle swarm optimization with a fuzzy discriminant function. The proposed method simultaneously estimates the fitness value for classification, extracts significant variables, and tunes the parameters of the fuzzy membership functions. Through adjustment of the dimensionality of the microarray data and the choice of membership function, a few significantly characteristic attributes can reach high classification accuracy. Fuzzy rules can also be observed through the data attributes and the relationships between categories. To reduce the vast computation time of the classification process, this study integrates grid computing technology into the proposed approach. The experimental results show that the proposed method achieves higher classification accuracy and effectively reduces computation time.
APA, Harvard, Vancouver, ISO, and other styles
39

Hassan, Syed Zahid. "A novel hybrid data mining approach for knowledge extraction and classification in medical databases." Thesis, 2008. https://figshare.com/articles/thesis/A_novel_hybrid_data_mining_approach_for_knowledge_extraction_and_classification_in_medical_databases/21443082.

Full text
Abstract:

Over the past several years, there has been an explosion in the amount of medical data generated and collected in the medical domain. Data mining techniques have been used extensively to mine these data, but obtaining high-quality data mining results is very challenging because of the inconsistency of the results of different data mining algorithms and the noise in medical data.

This thesis presents a novel hybrid data mining approach for knowledge extraction and classification in medical databases. The proposed approach clusters features extracted from medical databases into soft clusters using unsupervised learning strategies, then fuses the decisions using serial and parallel data fusion techniques. The idea is to observe associations in the features and fuse the decisions made by the learning algorithms to find the strong clusters that affect overall classification accuracy. Novel techniques such as serial cascaded data fusion, parallel majority-voting-based neural data fusion, and parallel neural-network-based data fusion are proposed to allow the integration of various clustering algorithms in the hybrid data mining approach.

The proposed approach has been implemented and evaluated on benchmark databases such as the Digital Database for Screening Mammography, Wisconsin Breast Cancer, Pima Indians Diabetes and ECG Heart Arrhythmia.

A comparative performance analysis of the proposed hybrid data mining approach against other existing approaches for knowledge extraction and classification is presented. The experimental results demonstrate the effectiveness of the proposed approach in terms of improved classification accuracy on benchmark medical databases.
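A toy illustration of parallel majority-voting fusion on the Wisconsin Breast Cancer benchmark named above: three clusterers each cast a class vote (mapping clusters to majority training classes), and the votes are fused. This stands in for, and greatly simplifies, the thesis's serial and parallel neural fusion techniques:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.cluster import KMeans, Birch
from sklearn.mixture import GaussianMixture

# Wisconsin Breast Cancer is one of the benchmark databases named above.
X, y = load_breast_cancer(return_X_y=True)
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3, random_state=9)

def class_votes(model):
    """Cluster the training data, map each cluster to the majority
    class it captures, and cast class votes for the test data."""
    labels_tr = model.fit_predict(Xtr)
    mapping = {c: np.bincount(ytr[labels_tr == c]).argmax()
               for c in np.unique(labels_tr)}
    return np.array([mapping[c] for c in model.predict(Xte)])

models = [KMeans(n_clusters=2, n_init=10, random_state=9),
          GaussianMixture(n_components=2, random_state=9),
          Birch(n_clusters=2)]
votes = np.stack([class_votes(m) for m in models])
fused = (votes.sum(axis=0) >= 2).astype(int)   # majority of three votes
print("fused accuracy:", round(float((fused == yte).mean()), 3))
```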

APA, Harvard, Vancouver, ISO, and other styles
40

Guo, Mu-Liang, and 郭木良. "A Hybrid System Integrating Data Mining and Artificial Intelligence Approaches for Stock Price Prediction." Thesis, 2013. http://ndltd.ncl.edu.tw/handle/30048291361050573695.

Full text
Abstract:
Master's thesis
National Chung Cheng University
Graduate Institute of Finance
102
In this study, we develop a new hybrid stock prediction system by integrating data mining and artificial intelligence techniques. Unlike other studies, the proposed system does not use these techniques to predict stock prices directly. We posit that technical indicators are not always effective: each indicator is affected by other indicators and by fundamental factors. Consequently, the proposed system integrates the two techniques to exploit their advantages based on both technical and fundamental indicators. We conduct two experiments to examine the prediction ability of the proposed system across different industries. The results reveal that the proposed system is capable of determining the right timing for an investor to avoid extra losses, increase profitability, and decrease trading costs.
APA, Harvard, Vancouver, ISO, and other styles
41

HUANG, TING-XUAN, and 黃婷萱. "A Hybrid Data Mining Model for Analyzing the Association between Diabetes and Breast Cancer." Thesis, 2016. http://ndltd.ncl.edu.tw/handle/59205764816125214266.

Full text
Abstract:
Master's thesis
Fu Jen Catholic University
Master's Program in Management, Department of Business Administration
104
Diabetes is a chronic disease that current medical technology cannot cure, and the number of deaths caused by its complications is increasing year by year; breast cancer, meanwhile, brings huge medical expenses and has become a burden on the National Health Insurance. The association between diabetes and cancer has been a prominent issue in recent years, and among all cancers, breast cancer has the highest incidence among Taiwanese women. The purpose of this study is therefore to apply data mining techniques in a retrospective cohort study of the association between diabetes and breast cancer. The proposed disease risk factor analysis model combines under-sampling based on clustering (SBC) with classification and regression trees (CART) to construct a disease prediction model, analyzing the National Health Insurance databases to explore the disease risk factors affecting whether diabetic patients without breast cancer develop the disease within the following two years. Experimental results show that for female patients with diabetic neuropathy or diabetes mellitus with peripheral circulatory disorder, both the prevalence rate and the incidence rate are significantly higher. The model reduces the effect of the class imbalance problem in big data and uncovers potential disease risks; it can also be applied to different diseases and help alleviate the burden on the National Health Insurance.
APA, Harvard, Vancouver, ISO, and other styles
42

Chen, Chien-Wei, and 陳建維. "Development of Real Time Production Control System in FAB By Hybrid Data Mining Approach." Thesis, 2007. http://ndltd.ncl.edu.tw/handle/60626635834370788788.

Full text
Abstract:
Master's thesis
Huafan University
Department of Information Management
96
Using a machine learning-based real-time dispatching rule selection mechanism to develop knowledge bases (KBs) for a production control system (PCS) has shown encouraging results in recent research. However, little research has focused on employing real-time dispatching rule selection mechanisms to improve production performance in the PCS of semiconductor wafer fabrication factories (FABs). Moreover, due to short product life cycles, most actual FABs produce multiple products and the product mix changes from time to time. All earlier machine learning-based real-time PCS work must add new training samples and regenerate KBs periodically; hence the machine learning-based PCS is confronted with a training data overflow problem and increased KB building time for the dispatching rule selection mechanism, and is not suited to on-line production control. To resolve the problems discussed above, the PCS KBs are developed in two phases: an SVM-based KB category selection mechanism and an SVM-based real-time dispatching rule classifier. This investigation therefore develops a hybrid data mining-based approach covering the overall knowledge discovery in databases (KDD) process, comprising six key components: a simulation-based training example generation mechanism, a data normalization mechanism, GA-based feature selection through an SVM classifier, KB category building by a two-level self-organizing map (SOM) approach, an SVM-based KB category selection mechanism, and an SVM-based real-time dispatching rule classifier. At the KB category selection phase, the two-level SOM approach clusters the unclassified training data so that data with similar characteristics, defined by system attributes, fall into the same class. The SVM learning algorithm then learns the whole set of training examples with KB class labels to construct the KB category selection mechanism. The proposed SVM classifier built with the hybrid data mining-based approach yields better system performance than a classical machine learning-based dispatching rule selection mechanism and heuristic individual dispatching rules under various performance criteria over a long period in FABs.
APA, Harvard, Vancouver, ISO, and other styles
43

Li, Jie-Ruei, and 李睿傑. "An Intelligent Vehicular Maintenance and Replacement System in Distribution Services: A Hybrid Data Mining Technique." Thesis, 2009. http://ndltd.ncl.edu.tw/handle/16632688345502903558.

Full text
Abstract:
Master's thesis
Fu Jen Catholic University
Department of Information Management
97
As e-commerce has grown exponentially, the distribution service business has also been growing and expanding quickly in recent years. E-commerce not only changes customers' shopping behaviors but also brings new opportunities to the B2C marketspace. Nevertheless, from the merchant's perspective, the maintenance, repair and operations (MRO) fees for vehicles in distribution services are also increasing dramatically. In this project, we develop an intelligent maintenance and replacement system to help managers and technicians conduct preventive maintenance and replacement for vehicles. In practice, we employ association rule mining and sequential pattern mining to analyze the relationships and priorities among vehicle components in order to plan preventive maintenance. Furthermore, we employ the C4.5 decision tree algorithm to identify vehicles at risk based on historical maintenance records, which helps managers and technicians decide whether to repair vehicles or sell them off. To take advantage of a Web-based platform, we adopt .NET technology to develop the intelligent system based on the proposed methods, so employees can access the services anytime, anywhere via the Internet. Finally, we deploy the system in the distribution service to evaluate the accuracy and feasibility of the proposed model and system.
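The association rule mining step can be sketched with pairwise support/confidence rules over hypothetical one-hot maintenance records; a full Apriori pass over larger itemsets follows the same counting idea:

```python
import pandas as pd
from itertools import combinations

# Hypothetical one-hot maintenance records: each row is one service
# visit, each column marks whether that component was replaced.
records = pd.DataFrame({
    "brake_pad":  [1, 1, 0, 1, 1, 0, 1, 0],
    "brake_disc": [1, 1, 0, 1, 0, 0, 1, 0],
    "oil_filter": [0, 1, 1, 0, 1, 1, 0, 1],
    "air_filter": [0, 1, 1, 0, 1, 1, 0, 0],
})
n = len(records)
min_support, min_conf = 0.3, 0.8   # thresholds (assumed)

# Pairwise association rules A -> B with support and confidence.
for a, b in combinations(records.columns, 2):
    for x, y in ((a, b), (b, a)):
        both = (records[x] & records[y]).sum()
        support = both / n
        conf = both / records[x].sum()
        if support >= min_support and conf >= min_conf:
            print(f"{x} -> {y} (support {support:.2f}, confidence {conf:.2f})")
```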
APA, Harvard, Vancouver, ISO, and other styles
44

Chauhan, Ajay Singh. "Financial statement fraud detection Model based on Hybrid data mining methods: Proposing an optimized Detection model." Thesis, 2019. http://dspace.dtu.ac.in:8080/jspui/handle/repository/17200.

Full text
APA, Harvard, Vancouver, ISO, and other styles
45

Rodic, Daniel. "A Hybrid heuristic-exhaustive search approach for rule extraction." Diss., 2001. http://hdl.handle.net/2263/25095.

Full text
Abstract:
The topic of this thesis is knowledge discovery and artificial intelligence-based knowledge discovery algorithms. The knowledge discovery process and its associated problems are discussed, followed by an overview of three classes of artificial intelligence-based knowledge discovery algorithms. Typical representatives of each of these classes are presented and discussed in greater detail. A new knowledge discovery algorithm, called the Hybrid Classifier System (HCS), is then presented. The guiding concept behind the new algorithm was simplicity, and it is loosely based on schemata theory. It is evaluated against one of the discussed algorithms from each class, namely CN2, C4.5, BRAINNE and BGP, using a benchmark of classification problems. These results show that the new knowledge discovery algorithm performs satisfactorily, yielding accurate, crisp rule sets. Probably the main strength of the HCS algorithm is its simplicity, so it can be the foundation for many possible future extensions, some of which are suggested in the final part of this thesis.
Dissertation (MSc)--University of Pretoria, 2007.
Computer Science
APA, Harvard, Vancouver, ISO, and other styles
46

Liu, Minhui. "Multivariate nonnormal regression models, information complexity, and genetic algorithms: a three-way hybrid for intelligent data mining." 2006. http://etd.utk.edu/2006/LiuMinhui.pdf.

Full text
APA, Harvard, Vancouver, ISO, and other styles
47

Yang, Chun-Yi, and 楊竣壹. "A Hybrid of Data Mining and Statistical Analysis Approach on Association between Pulmonary Tuberculosis and Lung Cancer." Thesis, 2014. http://ndltd.ncl.edu.tw/handle/45849810654231649880.

Full text
Abstract:
Master's thesis
National Taiwan University of Science and Technology
Department of Industrial Management
102
Background and objective: With tuberculosis a global infectious disease and lung cancer among the ten most fatal cancers in Taiwan, it is important to understand the clinical pathology of tuberculosis (TB) and lung cancer. This study explored the association of tuberculosis and lung cancer with other comorbidities and investigated whether any featured attribute could be a critical factor influencing the risk of lung cancer among TB patients, using a hybrid data mining and statistical approach. Methods: Study subjects were identified from the NHIRD with a diagnosis of tuberculosis between 2000 and 2002 and tracked to 2011. In a cohort of 6,137 tuberculosis patients aged over 20, 1,459 patients were assigned to the middle-age group and 3,527 to the elder-age group based on the result of a decision tree. Association rules, Cox regression and survival analysis were used for comparisons between groups. Results: The incidence rate of lung cancer is approximately 4-fold higher in the elder-age group than in the middle-age group (8.45 versus 39.03 per 10,000 person-years for the middle-age and elder-age groups, respectively). COPD increases the risk of lung cancer in both the middle-age group (6.64; 95% CI, 2.17-20.33) and the elder-age group (2.22; 95% CI, 1.52-3.23). In the survival analysis, middle-aged patients generally have a higher chance of remaining free from lung cancer than elder patients (98.9% versus 95.8%, log-rank p < 0.0001). Conclusions: This study provides a comprehensive analysis of the impact of age with comorbidities on the risk of lung cancer among tuberculosis patients. When such comorbidities are present, the risk may increase further for patients in the middle-age group than for those in the elder-age group.
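The Cox regression step can be reproduced in outline with the lifelines package on synthetic survival data; the follow-up window, covariates and effect sizes below are placeholders, not NHIRD values:

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

# Synthetic stand-in for the cohort: time to lung cancer (years),
# event flag, age group and COPD comorbidity.
rng = np.random.default_rng(10)
n = 2000
elder = rng.integers(0, 2, n)
copd = rng.integers(0, 2, n)
hazard = 0.002 * np.exp(1.2 * elder + 0.8 * copd)   # assumed effect sizes
time = rng.exponential(1 / hazard)
event = time < 10                                    # 10-year follow-up
df = pd.DataFrame({"time": np.minimum(time, 10),
                   "event": event.astype(int),
                   "elder": elder, "copd": copd})

cph = CoxPHFitter()
cph.fit(df, duration_col="time", event_col="event")
print(cph.summary[["exp(coef)", "p"]])   # hazard ratios for age group, COPD
```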
APA, Harvard, Vancouver, ISO, and other styles
48

Xiong, Chenxi. "Hybrid Feature Selection in Network Intrusion Detection Using Decision Tree." Thesis, 2020.

Find full text
Abstract:
Intrusion detection systems have been widely studied and deployed by researchers to provide better security for computer networks. The increasing attack volume and the dramatic advancement of machine learning have made the cooperation between intrusion detection systems and machine learning a hot topic and a promising solution for cybersecurity. Machine learning usually involves training on a huge amount of sample data, and huge input data may negatively affect the training and detection performance of the model; feature selection therefore becomes a crucial technique for ruling out irrelevant and redundant features from the dataset. This study applied a feature selection approach that combines advanced feature selection algorithms with attack-characteristic features to produce the optimal feature subset for the machine learning model in network intrusion detection. The optimal feature subset was created using the CSE-CIC-IDS2018 dataset, the most up-to-date benchmark dataset with comprehensive attack diversity and features. The experimental results were produced using machine learning models with a decision tree classifier and analyzed with respect to accuracy, precision, recall, and F1 score.
APA, Harvard, Vancouver, ISO, and other styles
49

Wang, Shu-Chao, and 王淑昭. "The Factors Affecting Academic Achievement for the 5th and 6th Grade Elementary School Students by Hybrid Data Mining Approach." Thesis, 2008. http://ndltd.ncl.edu.tw/handle/75460218666045377079.

Full text
Abstract:
Master's thesis
Huafan University
Department of Information Management
96
A total of 485 effective samples of 5th and 6th grade students were collected from an elementary school in Taipei County between 2004 and 2006. To model student academic achievement, this study develops a hybrid genetic algorithm/decision tree (GA/DT) approach, which is then compared with DT alone, factor analysis combined with DT, and correlation analysis combined with DT. The results indicate that the key attributes of 5th and 6th grade elementary school students' academic achievement include the mother's age, the father's educational background, parenting methods, gender, family atmosphere, the parents' relationship, living environment, birth order, number of family members, cohabitants, preschool education, housing situation, stature and so on. Among them, the mother's age is the most important attribute. A student's academic achievement is excellent if the mother's age is over 36 and the father's education is at university level or above; it is poor if the mother's age is over 41, the father's education is high school or vocational school, and the father's parenting style falls into the 'other' category. The hybrid GA/DT prediction system has higher accuracy, averaging 69.4%, better than DT (61.2%), factor analysis combined with DT (59.1%) and correlation analysis combined with DT (54.5%).
APA, Harvard, Vancouver, ISO, and other styles
50

Chen, Li-Fei, and 陳麗妃. "A Hybrid Data Mining Framework with Rough Set Theory, Support Vector Machine, and Decision Tree and its Case Studies." Thesis, 2007. http://ndltd.ncl.edu.tw/handle/30869955008789719497.

Full text
Abstract:
Doctoral dissertation
National Tsing Hua University
Department of Industrial Engineering and Engineering Management
95
Support vector machines (SVM), rough set theory (RST) and decision trees (DT) are methodologies applied to various data mining problems, especially classification prediction tasks. Studies have shown the ability of RST for feature selection, while SVM and DT are notable for their predictive power. This research aims to integrate the advantages of the SVM, RST and DT approaches into a hybrid framework that enhances the quality of class prediction as well as rule generation. Beyond building a classification model with acceptable accuracy, the capability to explain and explore how decisions are made, with simple, understandable and useful rules, is a critical issue for human resource management; DT and RST can generate such rules, but SVM offers no such function. The major concept consists of four main stages. The first stage selects the most important attributes: RST is applied to eliminate redundant and irrelevant attributes without losing any information relevant to classification. The second stage reduces noisy objects, which is accomplished by cross-validation using SVM. If the new dataset would suffer from a class imbalance problem, the rules generated by RST are used to adjust the class distribution (stage 3). Through the stages described above, a dataset with fewer dimensions and a higher degree of purity is screened out with a similar class distribution; it is then used to generate rules with DT, which completes the last stage. In addition, decisions concerning personnel selection prediction always involve data with high dimensionality, uncertainty and complexity, which cause traditional statistical methods to suffer from low test power. For validation, real personnel selection cases from two high-tech companies in Hsinchu, Taiwan, covering direct and indirect labor, are studied using the proposed hybrid data mining framework. Implementation results show that the proposed approach is effective and performs better than traditional SVM, RST and DT.
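A compressed sketch of the staged framework on synthetic noisy data. Mutual information stands in for the RST reduct computation (which needs a dedicated rough-set library), and the class-rebalancing stage is omitted:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import cross_val_predict
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = make_classification(n_samples=500, n_features=15, n_informative=5,
                           flip_y=0.1, random_state=11)  # 10% label noise

# Stage 1: attribute reduction. Mutual information stands in for the
# RST reduct computation here.
sel = SelectKBest(mutual_info_classif, k=5).fit(X, y)
Xr = sel.transform(X)

# Stage 2: noise filtering. Objects misclassified under SVM
# cross-validation are treated as noisy and dropped.
pred = cross_val_predict(SVC(), Xr, y, cv=5)
keep = pred == y
print(f"kept {keep.sum()} of {len(y)} objects after noise filtering")

# Last stage (stage 3, class rebalancing, is skipped): generate
# understandable rules from the cleaned data with a decision tree.
tree = DecisionTreeClassifier(max_depth=3, random_state=11).fit(Xr[keep], y[keep])
print(export_text(tree))
```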
APA, Harvard, Vancouver, ISO, and other styles