Dissertations on the topic "Data Mining Approaches"
Browse the top 50 dissertations for research on the topic "Data Mining Approaches".
Liu, Xiao, and Xiao Liu. "Health Data Analytics: Data and Text Mining Approaches for Pharmacovigilance." Diss., The University of Arizona, 2016. http://hdl.handle.net/10150/620913.
Ma, Yao. "Financial market predictions using Web mining approaches /." View abstract or full-text, 2009. http://library.ust.hk/cgi/db/thesis.pl?CSED%202009%20MAY.
Otaki, Keisuke. "Algorithmic Approaches to Pattern Mining from Structured Data." 京都大学 (Kyoto University), 2016. http://hdl.handle.net/2433/215673.
Kyoto University (京都大学)
0048
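Doctorate by coursework, new system (新制・課程博士)
Doctor of Informatics (博士(情報学))
Degree no. 甲第19846号
Informatics doctorate no. 情博第597号
Call number 新制||情||104 (University Library)
32882
Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University
Examiners: Prof. Akihiro Yamamoto (chief examiner), Prof. Hisashi Kashima, Prof. Tatsuya Akutsu
Conferred under Article 4, Paragraph 1 of the Degree Regulations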
Yang, L. "Optimisation approaches for data mining in biological systems." Thesis, University College London (University of London), 2016. http://discovery.ucl.ac.uk/1473809/.
Yun, Unil. "New approaches to weighted frequent pattern mining." Texas A&M University, 2005. http://hdl.handle.net/1969.1/5003.
Shao, Huijuan. "Temporal Mining Approaches for Smart Buildings Research." Diss., Virginia Tech, 2017. http://hdl.handle.net/10919/84349.
Ph. D.
Delpisheh, Elnaz, and University of Lethbridge Faculty of Arts and Science. "Two new approaches to evaluate association rules." Thesis, Lethbridge, Alta. : University of Lethbridge, Dept. of Mathematics and Computer Science, c2010, 2010. http://hdl.handle.net/10133/2530.
viii, 85 leaves : ill. ; 29 cm
Shen, Shijun. "Approaches to creating anonymous patient database." Morgantown, W. Va. : [West Virginia University Libraries], 2000. http://etd.wvu.edu/templates/showETD.cfm?recnum=1693.
Title from document title page. Document formatted into pages; contains v, 68 p. : ill. (some col.). Includes abstract. Includes bibliographical references (p. 67-68).
Mougel, Pierre-Nicolas. "Finding homogeneous collections of dense subgraphs using constraint-based data mining approaches." Thesis, Lyon, INSA, 2012. http://www.theses.fr/2012ISAL0073.
The work presented in this thesis deals with data mining approaches for the analysis of attributed graphs. An attributed graph is a graph where properties, encoded by means of attributes, are associated with each vertex. In such data, our objective is the discovery of subgraphs formed by several dense groups of vertices that are homogeneous with respect to the attributes. More precisely, we define the constraint-based extraction of collections of subgraphs that are densely connected and whose vertices share enough attributes. To this end, we propose two new classes of patterns along with sound and complete algorithms to compute them efficiently using constraint-based approaches. The first class of patterns, named Maximal Homogeneous Clique Set (MHCS), contains patterns satisfying constraints on the number of dense subgraphs, on the size of these subgraphs, and on the number of shared attributes. The second class of patterns, named Collection of Homogeneous k-clique Percolated components (CoHoP), is based on a relaxed notion of density in order to handle missing values. Both approaches are used for the analysis of scientific collaboration networks and protein-protein interaction networks. The extracted patterns exhibit structures useful in a decision support process. Indeed, in a scientific collaboration network, the analysis of such structures might give hints for proposing new collaborations between researchers working on the same subjects. In a protein-protein interaction network, the analysis of the extracted patterns can be used to study the relationships between modules of proteins involved in similar biological situations. The analysis of performance, on real and synthetic data, with respect to different attributed graph characteristics, shows that the proposed approaches scale well for large datasets.
Johansson, Fernstad Sara. "Algorithmically Guided Information Visualization : Explorative Approaches for High Dimensional, Mixed and Categorical Data." Doctoral thesis, Linköpings universitet, Medie- och Informationsteknik, 2011. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-70860.
Smith, Sydney. "Approaches to Natural Language Processing." Scholarship @ Claremont, 2018. http://scholarship.claremont.edu/cmc_theses/1817.
Curtarolo, Stefano 1969. "Coarse-graining and data mining approaches to the prediction of structures and their dynamics." Thesis, Massachusetts Institute of Technology, 2003. http://hdl.handle.net/1721.1/17034.
Includes bibliographical references (p. 245-263).
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
Predicting macroscopic properties of materials starting from an atomistic or electronic level description can be a formidable task due to the many orders of magnitude in length and time scales that need to be spanned. A characteristic of successful approaches to this problem is the systematic coarse-graining of less relevant degrees of freedom in order to obtain Hamiltonians that span larger length and time scale. Attempts to do this in the static regime (i.e. zero temperature) have already been developed, as well as thermodynamical models where all the internal degrees of freedom are removed. In this thesis, we present an approach that leads to a dynamics for thermodynamic-coarse-grained models. This allows us to obtain temperature-dependent and transport properties. The renormalization group theory is used to create new local potential models between nodes, within the approximation of local thermodynamical equilibrium. Assuming that these potentials give an averaged description of node dynamics, we calculate thermal and mechanical properties. If this method can be sufficiently generalized it may form the basis of a Multiscale Molecular Dynamics method with time and spatial coarse-graining. In the second part of the thesis, we analyze the problem of crystal structure prediction, by using quantum calculations.
This is a fundamental problem in materials research and development, and it is typically addressed with highly accurate quantum mechanical computations on a small set of candidate structures, or with empirical rules that have been extracted from a large amount of experimental information, but have limited predictive power. In this thesis, we transfer the concept of heuristic rule extraction to a large library of ab-initio calculated information, and demonstrate that this can be developed into a tool for crystal structure prediction. In addition, we analyze the ab-initio results and prediction for a large number of transition-metal binary alloys.
by Stefano Curtarolo.
Ph.D.
Wang, Yunguan. "Data-driven Approaches to Understand Development, Diseases and Identify Therapeutics." University of Cincinnati / OhioLINK, 2018. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1535704902199176.
Mohan, Sujaa Rani Park E. K. "Association rule based data mining approaches for Web Cache Maintenance and adaptive Intrusion Detection systems." Diss., UMK access, 2005.
"A thesis in computer science." Typescript. Advisor: E.K. Park. Vita. Title from "catalog record" of the print edition. Description based on contents viewed March 12, 2007. Includes bibliographical references (leaves 159-162). Online version of the print edition.
Otey, Matthew Eric. "Approaches to Abnormality Detection with Constraints." The Ohio State University, 2006. http://rave.ohiolink.edu/etdc/view?acc_num=osu1150484039.
Nallan, Sreedhar Acharya. "Geospatial and data mining approaches to assess the impact of watershed development in Indian rainfed areas." Thesis, Edith Cowan University, Research Online, Perth, Western Australia, 2017. https://ro.ecu.edu.au/theses/1980.
Delabrière, Alexis. "New approaches for processing and annotations of high-throughput metabolomic data obtained by mass spectrometry." Thesis, Université Paris-Saclay (ComUE), 2018. http://www.theses.fr/2018SACLS359/document.
Metabolomics is a phenotyping approach with promising prospects for the diagnosis and monitoring of several diseases. The most widely used observation technique in metabolomics is mass spectrometry (MS). Recent technological developments have significantly increased the size and complexity of the data. This thesis focused on two bottlenecks in the processing of these data: the extraction of peaks from raw data and the annotation of MS/MS spectra. The first part of the thesis focused on the development of a new peak detection algorithm for Flow Injection Analysis (FIA) data, a high-throughput metabolomics technique. A model derived from the physics of the mass spectrometer, taking into account the saturation of the instrument, has been proposed. This model includes a peak common to all metabolites and a specific saturation phenomenon for each ion. The model made it possible to create a workflow that estimates the common peak on well-behaved signals, then uses it to perform matched filtering on all signals. Its effectiveness on real data has been studied, and it has been shown that proFIA is superior to existing algorithms, has good reproducibility and is very close to manual measurements made by an expert on several types of devices. The second part of this thesis focused on the development of a tool for detecting the structural similarities of a set of fragmentation spectra. To do this, a new graph representation has been proposed, which does not require the metabolite formula. The graphs are also a natural representation of MS/MS spectra. Some properties of these graphs then made it possible to create an efficient algorithm for frequent subgraph mining (FSM) based on the generation of trees covering the graphs. This tool has been tested on two different data sets and has proven its speed and interpretability compared to state-of-the-art algorithms. These two algorithms have been implemented in the R packages proFIA and mineMS2, which are available to the community.
Erdogan, Onur. "Predicting The Disease Of Alzheimer (ad) With Snp Biomarkers And Clinical Data Based Decision Support System Using Data Mining Classification Approaches." Master's thesis, METU, 2012. http://etd.lib.metu.edu.tr/upload/12614832/index.pdf.
... clinical data which is informative for the prediction or the diagnosis of the particular diseases. So far, there is no established approach for selecting the representative SNP subset and patients' clinical data, and data mining methodology that is based on finding hidden and key patterns over huge databases. This approach has the highest potential for extracting the knowledge from genomic datasets and to select the number of SNPs and most effective clinical features for diseases that are informative and relevant for clinical diagnosis. In this study we have applied one of the widely used data mining classification methodologies, the "decision tree", for associating the SNP biomarkers and clinical data with Alzheimer's disease (AD), which is the most common form of "dementia". Different tree construction parameters have been compared for the optimization, and the most efficient and accurate tree for predicting AD is presented.
de, Oliveira Lima Elen. "Domain knowledge integration in data mining for churn and customer lifetime value modelling : new approaches and applications." Thesis, University of Southampton, 2009. https://eprints.soton.ac.uk/65692/.
Dang, Vinh Q. "Evolutionary approaches for feature selection in biological data." Thesis, Edith Cowan University, Research Online, Perth, Western Australia, 2014. https://ro.ecu.edu.au/theses/1276.
Zhang, Zhenyou. "Data Mining Approaches for Intelligent Condition-based Maintenance : A Framework of Intelligent Fault Diagnosis and Prognosis System (IFDPS)." Doctoral thesis, Norges teknisk-naturvitenskapelige universitet, Institutt for produksjons- og kvalitetsteknikk, 2014. http://urn.kb.se/resolve?urn=urn:nbn:no:ntnu:diva-25148.
Lan, Yang. "Computational Approaches for Time Series Analysis and Prediction. Data-Driven Methods for Pseudo-Periodical Sequences." Thesis, University of Bradford, 2009. http://hdl.handle.net/10454/4317.
Bischler, Thorsten David [Verfasser], and Cynthia M. [Gutachter] Sharma. "Data mining and software development for RNA-seq-based approaches in bacteria / Thorsten David Bischler ; Gutachter: Cynthia M. Sharma." Würzburg : Universität Würzburg, 2018. http://d-nb.info/1163951714/34.
Sowan, Bilal I. "Enhancing Fuzzy Associative Rule Mining Approaches for Improving Prediction Accuracy. Integration of Fuzzy Clustering, Apriori and Multiple Support Approaches to Develop an Associative Classification Rule Base." Thesis, University of Bradford, 2011. http://hdl.handle.net/10454/5387.
Applied Science University (ASU) of Jordan
Sowan, Bilal Ibrahim. "Enhancing fuzzy associative rule mining approaches for improving prediction accuracy : integration of fuzzy clustering, apriori and multiple support approaches to develop an associative classification rule base." Thesis, University of Bradford, 2011. http://hdl.handle.net/10454/5387.
Yildiz, Meliha Yetisgen. "Using statistical and knowledge-based approaches for literature-based discovery /." Thesis, Connect to this title online; UW restricted, 2007. http://hdl.handle.net/1773/7178.
Li, Yang. "The time-series approaches in forecasting one-step-ahead cash-flow data of mining companies listed on the Johannesburg Stock Exchange." Thesis, University of the Western Cape, 2007. http://etd.uwc.ac.za/index.php?module=etd&action=viewtitle&id=gen8Srv25Nme4_1552_1254470577.
Previous research pertaining to the financial aspect of the mining industry has focused predominantly on mining products' values and the companies' sensitivity to exchange rates. There has been very little empirical research carried out on the statistical behaviour of mining companies' cash flow data. This paper aimed to study the time-series behaviour of the cash flow data series of JSE-listed mining companies.
Wu, Burton. "New variational Bayesian approaches for statistical data mining : with applications to profiling and differentiating habitual consumption behaviour of customers in the wireless telecommunication industry." Thesis, Queensland University of Technology, 2011. https://eprints.qut.edu.au/46084/1/Burton_Wu_Thesis.pdf.
Gajvelly, Chakravarthy. "Approaches for estimating the Uniqueness of linked residential burglaries." Thesis, Blekinge Tekniska Högskola, Institutionen för datalogi och datorsystemteknik, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-11823.
Phanse, Shruti. "Study on the performance of ontology based approaches to link prediction in social networks as the number of users increases." Thesis, Kansas State University, 2010. http://hdl.handle.net/2097/6914.
Department of Computing and Information Sciences
Doina Caragea
Recent advances in social network applications have resulted in millions of users joining such networks in the last few years. User data collected from social networks can be used for various data mining problems such as interest recommendations, friendship recommendations and many more. Social networks, in general, can be seen as a huge directed network graph representing users of the network (together with their information, e.g., user interests) and their interactions (also known as friendship links). Previous work [Hsu et al., 2007] on friendship link prediction has shown that graph features contain important predictive information. Furthermore, it has been shown that user interests can be used to improve link predictions if they are organized into an explicit or implicit ontology [Haridas, 2009; Parimi, 2010]. However, these previous studies were performed using a small set of users in the social network LiveJournal. The goal of this work is to study the performance of the ontology based approach proposed in [Haridas, 2009] when the number of users in the dataset is increased. More precisely, we study the performance of the approach for data sets consisting of 1000, 2000, 3000 and 4000 users. Our results show that the performance generally increases with the number of users. However, the problem quickly becomes intractable in terms of computation time. As a part of our study, we also compare our results obtained using the ontology-based approach [Haridas, 2009] with results obtained with the LDA based approach in [Parimi, 2010], when such results are available.
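As an illustration of the kind of graph feature that link prediction can draw on, the following minimal Python sketch scores a candidate friendship link with the Jaccard coefficient of the two users' friend sets. This is an assumption chosen for illustration, not a feature confirmed to be used in the cited works; the function name `jaccard_link_score` and the toy graph are hypothetical.

```python
def jaccard_link_score(graph, u, v):
    """Jaccard coefficient of the friend sets of users u and v.

    `graph` maps each user to the set of users they link to
    (outgoing friendship links in a directed network).
    Illustrative feature only, not the method of the cited thesis.
    """
    friends_u = graph.get(u, set())
    friends_v = graph.get(v, set())
    union = friends_u | friends_v
    if not union:
        # Neither user has any links, so there is no evidence either way.
        return 0.0
    return len(friends_u & friends_v) / len(union)


# Hypothetical toy friendship graph.
toy_graph = {
    "ann": {"bob", "cat"},
    "bob": {"ann", "cat", "dan"},
    "cat": {"ann", "bob"},
    "dan": {"bob"},
}

score_ann_dan = jaccard_link_score(toy_graph, "ann", "dan")
```

A higher score means the two users share a larger fraction of their friends, which a classifier could combine with interest-based features when ranking candidate links.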
Sherzad, Abdul Rahman [Verfasser], Uwe [Akademischer Betreuer] Nestmann, Uwe [Gutachter] Nestmann, Niels [Gutachter] Pinkwart, Sebastian [Gutachter] Bab, and Nazir [Gutachter] Peroz. "Shaping the selection of fields of study in Afghanistan through educational data mining approaches / Abdul Rahman Sherzad ; Gutachter: Uwe Nestmann, Niels Pinkwart, Sebastian Bab, Nazir Peroz ; Betreuer: Uwe Nestmann." Berlin : Technische Universität Berlin, 2018. http://d-nb.info/1164076450/34.
Wazaefi, Yanal. "Automatic diagnosis of melanoma from dermoscopic images of melanocytic tumors : Analytical and comparative approaches." Thesis, Aix-Marseille, 2013. http://www.theses.fr/2013AIXM4106.
Melanoma is the most serious type of skin cancer. This thesis focused on the development of two different approaches for computer-aided diagnosis of melanoma: an analytical approach and a comparative approach. The analytical approach mimics the dermatologist's behavior by first detecting malignancy features based on popular analytical methods and then, in a second step, combining these features. We investigated to what extent the melanoma diagnosis can be impacted by an automatic system using dermoscopic images of pigmented skin lesions. The comparative approach, called the Ugly Duckling (UD) concept, assumes that nevi in the same patient tend to share some morphological features, so that dermatologists identify a few similarity clusters. The UD is the nevus that does not fit into any of those clusters and is therefore likely to be suspicious. The goal was to model the ability of dermatologists to build consistent clusters of pigmented skin lesions in patients.
Muñoz, Mas Rafael. "Multivariate approaches in species distribution modelling: Application to native fish species in Mediterranean Rivers." Doctoral thesis, Universitat Politècnica de València, 2018. http://hdl.handle.net/10251/76168.
This thesis focuses on a comprehensive analysis of the capabilities of some types of Artificial Neural Network that have not yet been tested: Probabilistic Neural Networks (PNN) and Multilayer Perceptron Ensembles (MLP Ensembles). The analyses of the capabilities of these techniques were carried out using the brown trout (Salmo trutta; Linnaeus, 1758), the bermejuela (Achondrostoma arcasii; Robalo, Almada, Levy & Doadrio, 2006) and the redfin barbel (Barbus haasi; Mertens, 1925) as target native species. The analyses focused on predictive capability, the interpretability of the models and the effect of excess zeros in the training databases, the so-called data prevalence (i.e. the proportion of presence cases over the whole dataset). Finally, the effect of scale (micro-scale or microhabitat scale, and meso-scale) on habitat suitability models, and consequently on the assessment of environmental flows, was studied in the last chapter.
Muñoz Mas, R. (2016). Multivariate approaches in species distribution modelling: Application to native fish species in Mediterranean Rivers [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/76168
Thesis
Mahamaneerat, Wannapa Kay Shyu Chi-Ren. "Domain-concept mining an efficient on-demand data mining approach /." Diss., Columbia, Mo. : University of Missouri--Columbia, 2008. http://hdl.handle.net/10355/7195.
Wang, Guan. "Graph-Based Approach on Social Data Mining." Thesis, University of Illinois at Chicago, 2015. http://pqdtopen.proquest.com/#viewpdf?dispub=3668648.
Powered by big data infrastructures, social network platforms are gathering data on many aspects of our daily lives. The online social world is reflecting our physical world in an increasingly detailed way by collecting people's individual biographies and their various relationships with other people. Although a massive amount of social data has been gathered, an urgent challenge remains unsolved: discovering meaningful knowledge that can empower social platforms to really understand their users from different perspectives.
Motivated by this trend, my research addresses the reasoning and mathematical modeling behind interesting phenomena on social networks. Proposing a graph based data mining framework for heterogeneous data sources is the major goal of my research. The algorithms, by design, use graph structure with heterogeneous link and node features to represent the basic structures of social networks and the phenomena that arise on top of them.
The graph based heterogeneous mining methodology is shown to be effective on a series of knowledge discovery topics, including network structure and macro social pattern mining such as magnet community detection (87), social influence propagation and social similarity mining (85), and spam detection (86). Future work is to consider dynamic relations in social data mining and how graph based approaches adapt to these new situations.
Zhang, Xiaofeng. "A model-based approach for distributed data mining." HKBU Institutional Repository, 2007. http://repository.hkbu.edu.hk/etd_ra/877.
Alkharboush, Nawaf Abdullah H. "A data mining approach to improve the automated quality of data." Thesis, Queensland University of Technology, 2014. https://eprints.qut.edu.au/65641/1/Nawaf%20Abdullah%20H_Alkharboush_Thesis.pdf.
Koperski, Krzysztof. "A progressive refinement approach to spatial data mining." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 1999. http://www.collectionscanada.ca/obj/s4/f2/dsk1/tape7/PQDD_0024/NQ51882.pdf.
Dehghani, M. (Mitra). "Descriptive data mining approach to visualize diabetes behaviour." Master's thesis, University of Oulu, 2014. http://urn.fi/URN:NBN:fi:oulu-201405261502.
Diabetes mellitus, which causes human, social and economic harm globally, requires effective disease management to reduce the risk of dangerous complications. Managing and treating the disease requires close cooperation between the patient and care staff. Because the prevalence of the disease is growing, several countries are trying to move from contact-based care to remote monitoring by making use of new electronic applications such as wireless sensor networks and body sensors. This would significantly reduce the load on health centres, but it would produce large amounts of heterogeneous data, which poses new challenges. Data mining offers several techniques for exploring hidden information. This master's thesis designs and implements a descriptive data mining approach and association rules to visualize diabetes behaviour by combining lifestyle parameters, including the physical activity and mood of people with diabetes. The goal of the data mining is to examine the critical timings and the most important parameters that lead to balance or imbalance in diabetes self-care. The visualization is intended to motivate patients to improve the management of their disease by changing their lifestyle, and to support healthcare decision-making in order to improve care.
Lawera, Martin Lukas. "Futures prices: Data mining and modeling approaches." Thesis, 2000. http://hdl.handle.net/1911/19526.
Williams, James. "Unrealization approaches for privacy preserving data mining." Thesis, 2010. http://hdl.handle.net/1828/3156.
Lee, P. C., and 李博智. "The Data Mining Approaches to Predict Chronic Diseases." Thesis, 2002. http://ndltd.ncl.edu.tw/handle/60943808979916347850.
Yuan Ze University (元智大學)
Department of Information Management (資訊管理學系)
90
The objective of this paper is to construct a prediction model for chronic diseases such as Diabetes Mellitus, Hypertension, and Hyperlipidemia by applying data mining methods to three-dimensional human body measurements, a new direction in this research field. According to records from the Department of Health, Diabetes Mellitus, Hypertension, and Hyperlipidemia are major conditions among the Taiwanese population and contribute to the top ten causes of death in Taiwan. These three conditions have some characteristics in common: risk increases with age, and they share the same pool of risk factors in our living environment. Basically, they are diseases closely related to people's life-styles, so they can be predicted from some predisposing factors. The ultimate goal of a prediction model is to foresee risk that is not normally judged in clinicians' routine work. From the perspective of preventive medicine, some risk factors were collected from an active survey instead of biochemical tests or physical examinations. In particular, body measurements, life-style variables, and family history of disease play important roles in predicting a person's health. From the clinicians' point of view, a useful prediction model can greatly help with diagnosis, treatment, and health education. The role of preventive medicine has become more important as the health insurance system in Taiwan transforms into a prospective payment system. Data mining uses artificial intelligence, database, and statistical methods to extract meaningful information from a puzzle of variables and data. This study uses both a genetic algorithm and case-based reasoning in a hybrid data mining technology. The research suggests that this approach is an easy and effective technique for acquiring knowledge from a database. This study collected 1370 subjects from the department of health examination, Chang Gung Memorial Hospital, from Jul. 2000 to Jul. 2001.
Results from predicting selected chronic diseases by anthropometrical and three-dimensional measurements are promising and innovative in the field of biomedical sciences. Specifically, significant predictors for Hyperlipidemia, Diabetes Mellitus, and Hypertension are waist-hip ratio, waist-profile-area, waist-circum, trunk-surf-area, and left-arm-volume, respectively.
Lai-Chen, Chen, and 陳來成. "Credit Card Fraud Detection Using Data Mining Approaches." Thesis, 2002. http://ndltd.ncl.edu.tw/handle/51921731728470410399.
Yuan Ze University (元智大學)
Department of Information Management (資訊管理學系)
90
Credit card transactions and electronic commerce continue to grow rapidly. The rate of fraudulent use of account numbers is also growing fast in the credit card industry, with consequent losses for banks. Improved fraud detection has thus become essential to maintain the viability of commercial banks and the country's payment system. The prevention of credit card fraud is an application for prediction techniques. This paper shows how data mining techniques and artificial intelligence algorithms can be applied successfully to obtain a high fraud detection rate. We also describe an AI-based approach that constructs and compares prediction models built separately with case-based reasoning, decision tree and neural network methods for detecting fraud patterns. To ensure proper model construction, the concept was developed and tested on real credit card data from a local bank. The prediction of user behavior and transaction operations can be integrated and implemented in the fraud detection models.
Yang, Kuo-Tung, and 楊國棟. "Several Heuristic Approaches to Privacy-Preserving Data Mining." Thesis, 2010. http://ndltd.ncl.edu.tw/handle/51265621784508458978.
National University of Kaohsiung (國立高雄大學)
Master's Program, Department of Computer Science and Information Engineering (資訊工程學系碩士班)
98
Data mining technology can help extract useful knowledge from large data sets. The process of data collection and data dissemination may, however, result in an inherent risk of privacy threats. Some sensitive or private information about individuals, businesses and organizations needs to be suppressed before it is shared or published. Privacy-preserving data mining (PPDM) has thus become an important issue in recent years. In this thesis, we propose three approaches for modifying original databases in order to hide sensitive itemsets. The first one, called SIF-IDF, is a greedy approach based on a concept borrowed from Term Frequency and Inverse Document Frequency (TF-IDF) in text mining. It uses this concept to evaluate the degree of similarity between the items in transactions and the desired sensitive itemsets, and then selects appropriate items in some transactions to hide. The second one is a lattice-based approach, in which a lattice is built based on the relation of sensitive itemsets. A bottom-up deletion strategy is also used to gradually reduce the frequency of sensitive itemsets in the hiding process. The third one is an evolutionary privacy-preserving data mining method that finds appropriate transactions to be hidden from a database. The proposed approach designs a flexible evaluation function with three factors, and different weights may be assigned to them depending on users' preferences. In addition, the concept of pre-large itemsets is used to reduce the cost of rescanning databases, thus speeding up the evaluation of chromosomes. The three proposed approaches can easily make good trade-offs between privacy preservation and execution time. Experimental results also show the performance of the proposed approaches.
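The general idea of hiding a sensitive itemset by modifying transactions can be sketched in a few lines of Python. This is a simplified greedy illustration under stated assumptions, not the SIF-IDF, lattice-based or evolutionary method of the thesis: the helper names `support` and `hide_itemset`, the `max_support` threshold and the rule for choosing which item to delete are all hypothetical.

```python
def support(transactions, itemset):
    """Count the transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def hide_itemset(transactions, sensitive, max_support):
    """Return a sanitized copy in which `sensitive` has support <= max_support.

    Greedy sketch: walk through the transactions and delete one item of the
    sensitive itemset from each supporting transaction until the support is
    low enough. The deleted item is picked arbitrarily (alphabetically first);
    the thesis scores candidates instead, which is not reproduced here.
    """
    sanitized = [set(t) for t in transactions]  # leave the input untouched
    for t in sanitized:
        if support(sanitized, sensitive) <= max_support:
            break
        if sensitive <= t:
            t.discard(min(sensitive))
    return sanitized


# Hypothetical toy database of transactions.
database = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "c"},
    {"a", "b", "d"},
]
sensitive_itemset = {"a", "b"}
sanitized_db = hide_itemset(database, sensitive_itemset, max_support=1)
```

The sketch makes the trade-off visible: every deletion lowers the support of the sensitive itemset but also removes non-sensitive information, which is why more careful methods weigh which items and transactions to modify.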
Chang, Chieh-Hsiang, and 張傑翔. "Apply Data Mining Approaches in Financial Early Warning System." Thesis, 2007. http://ndltd.ncl.edu.tw/handle/99158219835011321311.
華梵大學
資訊管理學系碩士班
95
A financial early warning system can not only support the management of financial institutions but also diagnose their routine operations. Since the early 1970s, much related research has been carried out. Until recent years, however, most of it relied on traditional statistical methods to build the early warning system. With the vigorous development of data mining techniques, many studies have begun to apply them to various fields, including early warning systems. Data mining does not need to satisfy many prior statistical assumptions and can transform enormous amounts of raw data into meaningful and useful information. To build an early warning system model, the related financial laws, data, and operation management rules need to be taken into consideration. However, the number of features is large, and not all of them help prediction. Data sets with unimportant, noisy, or highly correlated features significantly decrease classification accuracy. Removing these features improves both efficiency and accuracy. Back-propagation neural networks (BPN), support vector machines (SVM), and decision trees (DT) are well-known data mining techniques that can be applied to various fields and have high classification ability. However, these techniques are sensitive to their parameter settings, and poor settings result in lower accuracy. Therefore, this study uses a meta-heuristic, particle swarm optimization (PSO), to find suitable parameter values and to select a subset of features without degrading classification accuracy. Thanks to the global search characteristic of the meta-heuristic, the parameters of BPN, SVM, and DT can be optimized and feature selection performed at the same time, yielding the minimum set of features that still achieves high accuracy.
To evaluate the proposed approach, this research takes the reports of Taiwan Ratings as the authority. The “Condition and Performance of Domestic Banks” from the Central Bank of China, Republic of China (Taiwan) and the “Statistics of Financial Institutions” from the Financial Supervisory Commission, Executive Yuan are planned as the source data. Banks are classified into one of three categories (“well”, “average”, and “risky”). In the experiments, although BPN and SVM achieve high forecasting accuracy, both operate as black boxes, so professionals cannot use their results in future judgments. From the tree structure obtained with the proposed PSO+DT architecture, experts can derive the best decision rules and thus further evaluate and correct the early warning system model. The experimental results show that the proposed approaches can remove unnecessary features and improve classification accuracy significantly.
Ni, Sheng-Fu, and 倪聖富. "Applying Data Mining Approaches to Churn Prediction in Retailing." Thesis, 2007. http://ndltd.ncl.edu.tw/handle/73165743827586089960.
元智大學
企業管理學系
95
Recently, churn has attracted increasing attention in customer relationship management (CRM). Retailers in particular suffer because customers can switch suppliers without informing them. Churn prediction has been researched extensively, but few studies focus on non-contractual environments such as retailing. In this study, we not only apply several classification techniques, such as logistic regression, discriminant analysis, random forests, and artificial neural networks, but also propose a model that combines discriminant analysis with a back-propagation neural network for churn prediction. The percentage correctly classified (PCC) and the area under the receiver operating characteristic curve (AUROC) are used for model evaluation. Moreover, we refine the definition of partial defection proposed in the previous literature to solve the problem of determining churn in non-contractual settings. Our findings suggest that (1) the combination of two techniques outperforms a single technique, and (2) variables such as promotion, use of loyalty points, customer interaction, and demographics are useful for churn prediction.
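The two evaluation measures named in the abstract can be computed as in the following sketch. It assumes binary labels (1 for churner, 0 for non-churner) and uses the pairwise-ranking definition of AUROC. The function names, the toy labels and scores, and the 0.5 decision threshold are illustrative assumptions, not values from the thesis.

```python
def pcc(y_true, y_pred):
    """Percentage correctly classified, returned as a fraction in [0, 1]."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

def auroc(y_true, scores):
    """AUROC as the chance that a random positive outscores a random negative.

    Tied scores count as half a win. Requires at least one positive and
    one negative example.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy data (hypothetical): churn labels and model scores for five customers.
y_true = [1, 0, 1, 0, 1]
scores = [0.9, 0.4, 0.35, 0.2, 0.8]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold
accuracy = pcc(y_true, y_pred)
ranking_quality = auroc(y_true, scores)
```

PCC depends on the chosen threshold, while AUROC evaluates the ranking of scores across all thresholds, which is one reason to report both.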
Lee, Hong-yu, and 李弘裕. "A Study on Efficient Approaches for Weighted Data Mining." Thesis, 2012. http://ndltd.ncl.edu.tw/handle/95839345197873262085.
國立高雄大學
資訊工程學系碩士班
100
Weighted data mining has been widely discussed in recent years because of its many practical applications. Unlike traditional association-rule mining, weighted data mining assigns each item a weight that represents its importance in a database, and weighted frequent itemsets are then found from the database. However, the downward-closure property of association-rule mining does not hold in weighted data mining. Although the traditional upper-bound model can be applied to work around this, a large number of unpromising candidate itemsets still have to be generated under that model. To address this, we develop several efficient methods for mining weighted frequent itemsets and weighted sequential patterns. For weighted itemset mining, a new upper-bound model, which adopts the maximum weight in a transaction as the upper bound of the transaction, is first proposed to obtain more accurate upper bounds for itemsets. In addition, two effective strategies, pruning and filtering, are designed to further improve the model. To utilize the model and strategies effectively, two efficient algorithms are proposed for finding weighted frequent itemsets in databases: a projection-based weighted mining algorithm based on the improved upper-bound approach with the pruning strategy, and a projection-based weighted mining algorithm based on the improved upper-bound approach with the effective strategies. The concepts proposed for weighted itemset mining are further extended to the problem of weighted sequential pattern mining. Finally, experimental results on synthetic and real datasets show that the proposed algorithms outperform traditional weighted mining algorithms under various parameter settings.
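A small sketch can show why downward closure fails for weighted support and how a transaction-maximum-weight bound restores it. The abstract does not define the itemset weight, so the sketch assumes a common convention (the average of the item weights). The function names and toy data are hypothetical.

```python
def itemset_weight(itemset, weights):
    """Assumed definition: average weight of the items in the itemset."""
    return sum(weights[i] for i in itemset) / len(itemset)

def weighted_support(db, itemset, weights):
    """Itemset weight times the fraction of transactions containing it."""
    count = sum(1 for t in db if itemset <= t)
    return itemset_weight(itemset, weights) * count / len(db)

def upper_bound(db, itemset, weights):
    """Sum of each supporting transaction's maximum item weight, over |db|."""
    total = sum(max(weights[i] for i in t) for t in db if itemset <= t)
    return total / len(db)

# Toy data (hypothetical): item weights and four transactions.
weights = {"a": 0.2, "b": 0.9, "c": 0.5}
db = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}, {"b"}]

a, ab = frozenset({"a"}), frozenset({"a", "b"})
# The superset {a, b} has a higher weighted support than {a},
# so weighted support alone cannot be used for pruning.
closure_violated = weighted_support(db, ab, weights) > weighted_support(db, a, weights)
# The bound never increases when an itemset grows, and it never
# underestimates the weighted support, so pruning on it is safe.
bound_is_monotone = upper_bound(db, ab, weights) <= upper_bound(db, a, weights)
```

Under these assumptions, an itemset whose bound falls below the minimum threshold can be discarded together with all of its supersets.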
Lan, Guo-Cheng, and 藍國誠. "Efficient Approaches for the Filtration Mechanisms of Data Mining." Thesis, 2006. http://ndltd.ncl.edu.tw/handle/57250897808963244132.
南台科技大學
資訊管理系
94
In recent years, data mining technology has been widely applied in various commercial domains. Many association rule mining algorithms have been proposed to improve mining efficiency or reduce memory usage. In this thesis, we propose several efficient approaches for four research subjects: association rules, sequential patterns, traversal patterns, and the correlation between traversal paths and purchased merchandise. First, we propose two new approaches for mining association rules, EFI and GRA. A key characteristic of EFI (An Efficient Approach for Filtering Infrequent Itemsets) is its two-phase filtration mechanism, through which EFI generates only the itemsets most likely to be frequent. When mining frequent itemsets, EFI does not generate candidate sets and scans the database four times, so it finishes the mining task quickly. GRA (Gradation Reduction Approaches) is a level-wise technique. A key characteristic of GRA is its gradation filtration mechanism, and the algorithm uses a simple mask method to generate itemsets. When mining association rules, GRA avoids generating a huge number of unnecessary itemsets through the gradation filtration mechanism and does not need to generate candidate sets, so it finishes the mining task quickly. When dealing with very large databases, EFI can perform the mining task without any modification to its mining process. For GRA, we propose a modified version, GRA-M (Gradation Reduction Approaches – Modified Version). EFI and GRA-M first divide the large database into several sub-databases that are loaded into memory, and perform only four I/O processes on each sub-database to finish the mining task.
Next, we propose the algorithms SFA (Mining Sequential Patterns Using Filtering Approaches) and GRS (Gradation Reduction Approaches for Mining Sequential Patterns) to discover sequential patterns. Because SFA is extended from the EFI algorithm, it likewise generates only the subsequences most likely to be frequent through the two-phase filtration mechanism, and it scans the database four times without generating candidate sets to finish the mining task. In the same way, GRS is modified from the GRA algorithm. GRS effectively eliminates a huge number of unnecessary subsequences through the gradation filtration mechanism and finishes the mining task without generating candidate sets. When dealing with very large databases, SFA can perform the mining task without any modification to its mining process. For GRS, we propose a modified version, GRS-M (Gradation Reduction Approaches for Mining Sequential Patterns – Modified Version). SFA and GRS-M first divide the large database into several sub-databases that are loaded into memory, and perform only four I/O processes on each sub-database to finish the mining task. E-commerce websites are now growing at a surprising speed. If we can understand users’ traversal paths through a website, we can target marketing better. Therefore, we propose a new algorithm, TFA (Mining Traversal Patterns Using Filtering Approaches), to discover traversal patterns. A key characteristic of TFA is its adjacency filtration mechanism, through which it effectively eliminates a huge number of unnecessary contiguous subsequences. The process of generating contiguous subsequences is very simple, and the algorithm finishes the mining task without generating candidate sets.
However, considering only the traversal path does not yield sufficiently accurate patterns. Later research therefore proposed using the correlation between traversal paths and purchased merchandise to increase the accuracy of the patterns. For this subject, we propose a new algorithm, CFA (Mining the Correlation Using Filtering Approaches), which combines the TFA algorithm with the EFI algorithm. CFA eliminates a huge number of unnecessary contiguous subsequences and itemsets through the filtration mechanisms and thus finds all frequent combination patterns quickly. When dealing with very large databases, TFA and CFA can perform the mining task without any modification to their mining processes. The algorithms also divide the database and perform only three I/O processes on each sub-database to finish the mining task. In the experiments, we compare the performance of the algorithms with respect to each parameter. The analysis of the results clearly shows that our proposed algorithms achieve better performance and memory utilization.
Sukchotrat, Thuntee. "Data mining-driven approaches for process monitoring and diagnosis." 2008. http://hdl.handle.net/10106/1827.
Bradley, Paul S. "Mathematical programming approaches to machine learning and data mining." 1998. http://catalog.hathitrust.org/api/volumes/oclc/42583739.html.
Typescript. eContent provider-neutral record in process. Description based on print version record. Includes bibliographical references (leaves 145-165).