Dissertations / Theses on the topic 'Data Mining and Knowledge Discovery'

To see the other types of publications on this topic, follow the link: Data Mining and Knowledge Discovery.

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles

Select a source type:

Consult the top 50 dissertations / theses for your research on the topic 'Data Mining and Knowledge Discovery.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse dissertations / theses on a wide variety of disciplines and organise your bibliography correctly.

1

Amado, Vanessa. "Knowledge discovery and data mining from freeway section traffic data." Diss., Columbia, Mo. : University of Missouri-Columbia, 2008. http://hdl.handle.net/10355/5591.

Full text
Abstract:
Thesis (Ph. D.)--University of Missouri-Columbia, 2008.
The entire dissertation/thesis text is included in the research.pdf file; the official abstract appears in the short.pdf file (which also appears in the research.pdf); a non-technical general description, or public abstract, appears in the public.pdf file. Title from title screen of research.pdf file (viewed on June 8, 2009). Vita. Includes bibliographical references.
APA, Harvard, Vancouver, ISO, and other styles
2

Engels, Robert. "Component based user guidance in knowledge discovery and data mining /." Sankt Augustin : Infix, 1999. http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&doc_number=008752552&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Ponsan, Christiane. "Computing with words for data mining." Thesis, University of Bristol, 2000. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.310744.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Abedjan, Ziawasch. "Improving RDF data with data mining." Phd thesis, Universität Potsdam, 2014. http://opus.kobv.de/ubp/volltexte/2014/7133/.

Full text
Abstract:
Linked Open Data (LOD) comprises numerous, often large, public data sets and knowledge bases. These datasets are mostly represented in the RDF triple structure of subject, predicate, and object, where each triple represents a statement or fact. Unfortunately, the heterogeneity of available open data requires significant integration steps before it can be used in applications. Meta information, such as ontological definitions and exact range definitions of predicates, is desirable and ideally provided by an ontology. However, in the context of LOD, ontologies are often incomplete or simply not available. Thus, it is useful to automatically generate meta information such as ontological dependencies, range definitions, and topical classifications. Association rule mining, which was originally applied to sales analysis on transactional databases, is a promising and novel technique for exploring such data. We designed an adaptation of this technique for mining RDF data and introduce the concept of "mining configurations", which allows us to mine RDF data sets in various ways. Different configurations enable us to identify schema and value dependencies that in combination result in interesting use cases. To this end, we present rule-based approaches for auto-completion, data enrichment, ontology improvement, and query relaxation. Auto-completion remedies the problem of inconsistent ontology usage by providing an editing user with a sorted list of commonly used predicates. A combination of different configurations extends this approach to create completely new facts for a knowledge base. We present two approaches for fact generation: a user-based approach, where a user selects the entity to be amended with new facts, and a data-driven approach, where an algorithm discovers entities that have to be amended with missing facts. As knowledge bases constantly grow and evolve, another approach to improving the usage of RDF data is to improve existing ontologies.
Here, we present an association-rule-based approach to reconcile ontology and data. Interlacing different mining configurations, we derive an algorithm to discover synonymously used predicates. Those predicates can be used to expand query results and to support users during query formulation. We provide a wide range of experiments on real-world datasets for each use case. The experiments and evaluations show the added value of association rule mining for the integration and usability of RDF data and confirm the appropriateness of our mining configuration methodology.
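The "mining configuration" idea described above can be illustrated with a toy sketch: one configuration treats each RDF subject as a transaction and its predicates as items, then ranks candidate predicates for auto-completion by rule confidence. This is only an illustrative approximation of the thesis's approach; the triples, thresholds, and function names below are invented for the example.

```python
from itertools import combinations
from collections import Counter, defaultdict

# Toy RDF triples: (subject, predicate, object).
triples = [
    ("alice", "name", "Alice"), ("alice", "birthPlace", "Berlin"),
    ("alice", "birthDate", "1980"),
    ("bob", "name", "Bob"), ("bob", "birthPlace", "Paris"),
    ("bob", "birthDate", "1975"),
    ("carol", "name", "Carol"), ("carol", "birthPlace", "Rome"),
]

# One mining configuration: subjects act as transactions,
# predicates as items (schema-level mining).
transactions = defaultdict(set)
for s, p, _ in triples:
    transactions[s].add(p)

# Count single predicates and predicate pairs across all subjects.
single, pair = Counter(), Counter()
for preds in transactions.values():
    single.update(preds)
    pair.update(frozenset(c) for c in combinations(sorted(preds), 2))

def suggest(known_preds, min_conf=0.5):
    """Rank candidate predicates by confidence of the rule known -> candidate."""
    scores = {}
    for p in known_preds:
        for pr, cnt in pair.items():
            other = pr - {p}
            if p in pr and other:
                (q,) = other
                if q not in known_preds:
                    scores[q] = max(scores.get(q, 0.0), cnt / single[p])
    return sorted((q for q, c in scores.items() if c >= min_conf),
                  key=lambda q: -scores[q])

print(suggest({"name", "birthPlace"}))  # → ['birthDate']
```

An editor who has already typed `name` and `birthPlace` for a new entity would thus be offered `birthDate`, since it co-occurs with those predicates in two of three subjects.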
APA, Harvard, Vancouver, ISO, and other styles
5

Páircéir, Rónán. "Knowledge discovery from distributed aggregate data in data warehouses and statistical databases." Thesis, University of Ulster, 2002. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.274398.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Sharma, Sumana. "An Integrated Knowledge Discovery and Data Mining Process Model." VCU Scholars Compass, 2008. http://scholarscompass.vcu.edu/etd/1615.

Full text
Abstract:
Enterprise decision making is continuously transforming in the wake of ever-increasing amounts of data. Organizations are collecting massive amounts of data in their quest for knowledge nuggets in the form of novel, interesting, understandable patterns that underlie these data. The search for knowledge is a multi-step process comprising various phases, including development of domain (business) understanding, data understanding, data preparation, modeling, evaluation and, ultimately, the deployment of the discovered knowledge. These phases are represented in the form of Knowledge Discovery and Data Mining (KDDM) process models, which are meant to provide explicit support for the execution of the complex and iterative knowledge discovery process. A review of existing KDDM process models reveals that they have certain limitations (fragmented design, only a checklist-type description of tasks, lack of support for the execution of tasks, especially those of the business understanding phase, etc.) which are likely to affect the efficiency and effectiveness with which KDDM projects are currently carried out. This dissertation addresses the identified limitations of existing KDDM process models through an improved model (named the Integrated Knowledge Discovery and Data Mining Process Model), which presents an integrated view of the KDDM process and provides explicit support for the execution of each of the tasks outlined in the model. We also evaluate the effectiveness and efficiency offered by the IKDDM model against CRISP-DM, a leading KDDM process model, in aiding data mining users to execute various tasks of the KDDM process. Results of statistical tests indicate that the IKDDM model outperforms CRISP-DM in terms of efficiency and effectiveness; the IKDDM model also outperforms CRISP-DM in terms of the quality of the process model itself.
APA, Harvard, Vancouver, ISO, and other styles
7

Dharani K. and Kalpana Gudikandula. "Actionable Knowledge Discovery using Multi-Step Mining." International Journal of Computer Science and Network (IJCSN), 2012. http://hdl.handle.net/10150/271493.

Full text
Abstract:
Data mining at the enterprise level operates on huge amounts of data, such as transactions from government agencies, banks, insurance companies and so on. Inevitably, these businesses produce complex data that may be distributed in nature. When such data is mined in a single step, the resulting business intelligence reflects only a particular aspect. However, this is not sufficient in an enterprise, where different aspects and standpoints are to be considered before taking business decisions. Enterprises are therefore required to perform mining based on multiple features, data sources and methods. This is known as combined mining. Combined mining can produce patterns that reflect all aspects of the enterprise, so the derived intelligence can be used to take business decisions that lead to profits. This kind of knowledge is known as actionable knowledge.
Data mining is a process of obtaining trends or patterns in historical data. Such trends form business intelligence that in turn leads to well-informed decisions. However, data mining with a single technique does not yield actionable knowledge, because enterprises have huge databases that are heterogeneous in nature. They also have complex data, and mining such data needs multi-step mining instead of single-step mining. When multiple approaches are involved, they provide business intelligence in all aspects, and that kind of information can lead to actionable knowledge. Recently data mining has seen tremendous usage in the real world. The drawback of existing approaches is that they yield insufficient business intelligence in the case of huge enterprises. This paper presents a combination of existing works and algorithms. We work on multiple data sources, multiple methods and multiple features. The combined patterns thus obtained from complex business data provide actionable knowledge. A prototype application has been built to test the efficiency of the proposed framework, which combines multiple data sources, multiple methods and multiple features in the mining process. The empirical results revealed that the proposed approach is effective and can be used in the real world.
APA, Harvard, Vancouver, ISO, and other styles
8

Atzmüller, Martin. "Knowledge-intensive subgroup mining : techniques for automatic and interactive discovery /." Berlin : Aka, 2007. http://deposit.d-nb.de/cgi-bin/dokserv?id=2928288&prov=M&dok_var=1&dok_ext=htm.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Butler, Patrick Julian Carey. "Knowledge Discovery in Intelligence Analysis." Diss., Virginia Tech, 2014. http://hdl.handle.net/10919/48422.

Full text
Abstract:
Intelligence analysts today are faced with many challenges, chief among them being the need to fuse disparate streams of data, as well as to rapidly arrive at analytical decisions and quantitative predictions for use by policy makers. These problems are further exacerbated by the sheer volume of data that is available to intelligence analysts. Machine learning methods enable the automated transduction of such large datasets from raw feeds to actionable knowledge, but successful use of such methods requires integrated frameworks for contextualizing them within the work processes of the analyst. Intelligence analysts typically distinguish between three classes of problems: collections, analysis, and operations. This dissertation specifically focuses on two problems in analysis: i) the reconstruction of shredded documents using a visual analytic framework combining computer vision techniques and user input, and ii) the design and implementation of a system for event forecasting which allows an analyst not just to consume forecasts of significant societal events but also to understand the rationale behind these alerts, using data ablation techniques to determine the strength of conclusions. This work does not attempt to replace the role of the analyst with machine learning but instead outlines several methods to augment the analyst with machine learning. In doing so this dissertation also explores the responsibilities of an analyst in evaluating complex models and the decisions made by these models. Finally, this dissertation defines a list of responsibilities for models designed to aid the analyst's work in evaluating and verifying the models.
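As a loose illustration of the first problem, shred reconstruction can be framed as ordering strips to minimise a dissimilarity cost between adjacent edges. The sketch below uses a toy character-based cost in place of the visual edge features (and analyst input) the dissertation actually combines; the page, strips, and cost function are all invented for the example.

```python
from itertools import permutations

# A toy "page" of text, cut into four vertical strips and shuffled.
page = ["abcdefgh",
        "ijklmnop",
        "qrstuvwx"]
strips = [[row[i:i + 2] for row in page] for i in (4, 0, 6, 2)]

def edge_cost(left, right):
    """Dissimilarity between the right edge of `left` and the left edge of
    `right`; codepoint distance stands in for visual edge features."""
    return sum(abs(ord(l[-1]) - ord(r[0])) for l, r in zip(left, right))

def total_cost(order):
    return sum(edge_cost(strips[a], strips[b])
               for a, b in zip(order, order[1:]))

# Brute-force the ordering with the lowest total edge cost (fine for four
# strips; real shredder puzzles need heuristics plus, as above, user input).
best = min(permutations(range(len(strips))), key=total_cost)
restored = ["".join(strips[i][r] for i in best) for r in range(len(page))]
print(restored == page)  # True: the original page is recovered
```

The exhaustive search is exponential in the number of strips, which is precisely why the dissertation's framework brings in computer vision features and a human in the loop rather than pure optimisation.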
Ph. D.
APA, Harvard, Vancouver, ISO, and other styles
10

Atzmüller, Martin. "Knowledge-intensive subgroup mining techniques for automatic and interactive discovery." Berlin Aka, 2006. http://deposit.d-nb.de/cgi-bin/dokserv?id=2928288&prov=M&dok_var=1&dok_ext=htm.

Full text
APA, Harvard, Vancouver, ISO, and other styles
11

Bani, Mustafa Ahmed Mahmood. "A knowledge discovery and data mining process model for metabolomics." Thesis, Aberystwyth University, 2012. http://hdl.handle.net/2160/6889468e-851f-47fd-bd44-fe65fe516c7a.

Full text
Abstract:
This thesis presents a novel knowledge discovery and data mining process model for metabolomics, which was successfully developed, implemented and applied to a number of metabolomics applications. The process model provides a formalised framework and a methodology for conducting justifiable and traceable data mining in metabolomics. It promotes the achievement of metabolomics analytical objectives and contributes towards the reproducibility of its results. The process model was designed to satisfy the requirements of data mining in metabolomics and to be consistent with the scientific nature of metabolomics investigations. It considers the practical aspects of the data mining process, covering management, human interaction, quality assurance and standards, in addition to other desired features such as visualisation, data exploration, knowledge presentation and automation. The development of the process model involved investigating data mining concepts, approaches and techniques, as well as the popular data mining process models, which were critically analysed in order to utilise their better features and to overcome their shortcomings. Inspiration from process engineering, software engineering, machine learning and scientific methodology was also used in developing the process model, along with the existing ontologies of scientific experiments and data mining. The process model was designed to support both data-driven and hypothesis-driven data mining. It provides a mechanism for defining the analytical objectives of metabolomics data mining, considering their achievability, feasibility, measurability and success criteria.
The process model also provides a novel strategy for the justifiable selection of data mining techniques, taking into account the achievement of the process's analytical objectives, the nature and quality of the metabolomics data, and the requirements and feasibility of the selected data mining techniques. The model ensures the validity and reproducibility of the outcomes by defining traceability and assessment mechanisms, which cover all the procedures applied and the deliverables generated throughout the process. The process also defines evaluation mechanisms, which cover not only the technical aspects of the data mining model but also the contextual aspects of the acquired knowledge. The process model was implemented in a software environment and applied to four real-world metabolomics applications. The applications demonstrated the proposed process model's applicability to various data mining approaches, goals, tasks and techniques. They also confirmed the process's applicability to various metabolomics investigations and approaches using data generated by a variety of data acquisition instruments. The outcomes of the process execution in these applications were used to evaluate the process model's design and its satisfaction of the requirements of metabolomics data mining.
APA, Harvard, Vancouver, ISO, and other styles
12

He, Aijing. "Unsupervised Data Mining by Recursive Partitioning." University of Cincinnati / OhioLINK, 2002. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1026406153.

Full text
APA, Harvard, Vancouver, ISO, and other styles
13

Hayward, John T. "Mining Oncology Data: Knowledge Discovery in Clinical Performance of Cancer Patients." Worcester, Mass. : Worcester Polytechnic Institute, 2006. http://www.wpi.edu/Pubs/ETD/Available/etd-081606-083026/.

Full text
Abstract:
Thesis (M.S.)--Worcester Polytechnic Institute.
Keywords: Clinical Performance; Databases; Cancer; oncology; Knowledge Discovery in Databases; data mining. Includes bibliographical references (leaves 267-270).
APA, Harvard, Vancouver, ISO, and other styles
14

Nagao, Katashi, Katsuhiko Kaji, and Toshiyuki Shimizu. "Discussion Mining : Knowledge Discovery from Data on the Real World Activities." INTELLIGENT MEDIA INTEGRATION NAGOYA UNIVERSITY / COE, 2004. http://hdl.handle.net/2237/10350.

Full text
APA, Harvard, Vancouver, ISO, and other styles
15

Ritchie, J. A. "Knowledge discovery and data mining : operation of the Ireland power system." Thesis, Queen's University Belfast, 2006. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.432508.

Full text
APA, Harvard, Vancouver, ISO, and other styles
16

Tetley, Michael Grant. "Constraining Earth’s plate tectonic evolution through data mining and knowledge discovery." Thesis, The University of Sydney, 2018. http://hdl.handle.net/2123/18737.

Full text
Abstract:
Global reconstructions are reasonably well understood back to ~200 Ma. However, two first-order uncertainties remain unresolved in their development: firstly, the critical dependency on a self-consistent global reference frame; and secondly, the fundamental difficulty of objectively predicting the location and type of tectonic paleo-boundaries. In this thesis I present three new studies directly addressing these fundamental geoscientific questions. Through the joint evaluation of global seafloor hotspot track observations (for times younger than 80 Ma), first-order geodynamic estimates of global net lithospheric rotation (NLR), and parameter estimation for paleo-trench migration (TM) behaviours, the first chapter presents a suite of new geodynamically consistent, data-optimised global absolute reference frames spanning from 220 Ma to the present day. In the second chapter, using a paleomagnetic pole compilation updated to include age uncertainties, I identify the optimal APWP pole configuration for 16 major cratonic blocks, minimising both plate velocity and the velocity gradients characteristic of eccentric changes in predicted plate motions, producing a new global reference frame for the Phanerozoic consistent with physical geodynamic principles. In the final chapter of my thesis I identify paleo-tectonic environments on Earth through a machine learning approach using global geochemical data, deriving a set of first-order discriminatory tectonic environment models for mid-ocean ridge (MOR), subduction (ARC), and oceanic hotspot (OIB) environments. Key discriminatory geochemical attributes unique to each first-order tectonic environment were identified, enabling a data-rich identification of samples of unknown affinity. Applying these models to Neoproterozoic data, 56 first-order tectonic paleo-boundaries associated with Rodinia supercontinent amalgamation and dispersal were identified and evaluated against published Neoproterozoic reconstructions.
APA, Harvard, Vancouver, ISO, and other styles
17

Fernandez, Sanchez Javier. "Knowledge Discovery and Data Mining Using Demographic and Clinical Data to Diagnose Heart Disease." Thesis, KTH, Skolan för kemi, bioteknologi och hälsa (CBH), 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-233978.

Full text
Abstract:
Cardiovascular disease (CVD) is the leading cause of morbidity, mortality, premature death and reduced quality of life for the citizens of the EU. It has been reported that CVD represents a major economic load on health care systems in terms of hospitalizations, rehabilitation services, physician visits and medication. Data mining techniques applied to clinical data have become an interesting tool to prevent, diagnose or treat CVD. In this thesis, Knowledge Discovery and Data Mining (KDD) was employed to analyse clinical and demographic data, which could be used to diagnose coronary artery disease (CAD). The exploratory data analysis (EDA) showed that female patients at an elderly age with a higher level of cholesterol, maximum achieved heart rate and ST-depression are more prone to be diagnosed with heart disease. Furthermore, patients with atypical angina are more likely to be at an elderly age with a slightly higher level of cholesterol and maximum achieved heart rate than patients with asymptomatic chest pain. Moreover, patients with exercise-induced angina had lower values of maximum achieved heart rate than those who do not experience it. We could verify that patients who experience exercise-induced angina and asymptomatic chest pain are more likely to be diagnosed with heart disease. On the other hand, Logistic Regression, K-Nearest Neighbors, Support Vector Machines, Decision Tree, Bagging and Boosting methods were evaluated by adopting a stratified 10-fold cross-validation approach. The learning models provided an average F-score of 78-83% and a mean AUC of 85-88%. Among all the models, the highest score was given by Radial Basis Function Kernel Support Vector Machines (RBF-SVM), achieving an F-score of 82.5% ± 4.7% and an AUC of 87.6% ± 5.8%. Our research confirmed that data mining techniques can support physicians in their interpretation of heart disease diagnosis in addition to clinical and demographic characteristics of patients.
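The stratified k-fold protocol mentioned in the abstract can be sketched in a few lines: each class's samples are dealt round-robin into k folds, so every test fold preserves the class proportions before per-fold F-scores and AUCs are averaged. The labels and fold count below are invented for illustration, not the thesis's data.

```python
from collections import defaultdict

def stratified_kfold(labels, k=10):
    """Yield (train_idx, test_idx) pairs preserving class proportions."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        for j, i in enumerate(idxs):
            folds[j % k].append(i)   # deal each class round-robin
    for t in range(k):
        test = sorted(folds[t])
        train = sorted(i for f in range(k) if f != t for i in folds[f])
        yield train, test

labels = [1] * 30 + [0] * 20       # 60% positive, 40% negative
for train, test in stratified_kfold(labels, k=5):
    pos = sum(labels[i] for i in test)
    print(len(test), pos)          # each test fold: 10 samples, 6 positive
```

A plain (unstratified) split could easily produce a test fold with very few diseased patients, which would make per-fold F-score and AUC estimates unstable; stratification is what keeps the 10 per-fold scores comparable.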
APA, Harvard, Vancouver, ISO, and other styles
18

Ur-Rahman, Nadeem. "Textual data mining applications for industrial knowledge management solutions." Thesis, Loughborough University, 2010. https://dspace.lboro.ac.uk/2134/6373.

Full text
Abstract:
In recent years knowledge has become an important resource for enhancing business, and many activities are required to manage these knowledge resources well and help companies remain competitive within industrial environments. The data available in most industrial setups is complex in nature, and multiple different data formats may be generated to track the progress of different projects, whether related to developing new products or providing better services to customers. Knowledge discovery from different databases requires considerable effort and energy, and data mining techniques serve this purpose by handling structured data formats. If, however, the data is semi-structured or unstructured, the combined efforts of data and text mining technologies may be needed to bring fruitful results. This thesis focuses on issues related to the discovery of knowledge from semi-structured or unstructured data formats through the application of textual data mining techniques to automate the classification of textual information into two different categories or classes, which can then be used to help manage the knowledge available in multiple data formats. Applications of different data mining techniques to discover valuable information and knowledge from the manufacturing and construction industries have been explored as part of a literature review. The application of text mining techniques to handle semi-structured or unstructured data is discussed in detail. A novel integration of different data and text mining tools is proposed in the form of a framework in which knowledge discovery and its refinement processes are performed through the application of clustering and Apriori association rule mining algorithms. Finally, the hypothesis of acquiring better classification accuracy is examined through the application of the methodology to case study data available in the form of Post Project Review (PPR) reports.
The process of discovering useful knowledge, its interpretation and utilisation has been automated to classify the textual data into two classes.
APA, Harvard, Vancouver, ISO, and other styles
19

Momtazpour, Marjan. "Knowledge Discovery for Sustainable Urban Mobility." Diss., Virginia Tech, 2016. http://hdl.handle.net/10919/65157.

Full text
Abstract:
Due to the rapid growth of urban areas, sustainable urbanization is an inevitable task for city planners to address major challenges in resource management across different sectors. Sustainable approaches to energy production, distribution, and consumption must take the place of traditional methods to reduce the negative impacts of urbanization, such as global warming and the rapid consumption of fossil fuels. In order to enable the transition of cities to sustainable ones, we need a precise understanding of city dynamics. The prevalence of big data has highlighted the importance of data-driven analysis of different parts of the city, including human movement, physical infrastructure, and economic activities. Sustainable urban mobility (SUM) is the problem domain that addresses sustainability issues in urban areas with respect to city dynamics and people's movements in the city. Hence, to realize an integrated solution for SUM, we need to study the problems that lie at the intersection of energy systems and mobility. For instance, the advent of electric vehicles is a promising shift toward smart cities; however, the impact of high adoption of electric vehicles on units such as the electricity grid must be precisely addressed. In this dissertation, we use data analytics methods to tackle major issues in SUM. We focus on the mobility and energy aspects of SUM by characterizing transportation networks and energy networks. Data-driven methods are proposed to characterize energy systems as well as city dynamics. Moreover, we propose anomaly detection algorithms for control and management purposes in smart grids and in cities. In terms of applications, we specifically investigate the use of electric vehicles for personal use and also for public transportation (i.e., electric taxis). We provide a data-driven framework to propose optimal locations for charging and storage installation for electric vehicles.
Furthermore, the adoption of an electric taxi fleet in dense urban areas is investigated using multiple data sources.
Ph. D.
APA, Harvard, Vancouver, ISO, and other styles
20

Howard, Craig M. "Tools and techniques for knowledge discovery." Thesis, University of East Anglia, 2001. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.368357.

Full text
APA, Harvard, Vancouver, ISO, and other styles
21

Wu, Qionglin 1964. "Data mining and knowledge discovery in financial research : empirical investigations into currency." Thesis, McGill University, 2001. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=31560.

Full text
Abstract:
Since linear models, such as those based on regression techniques, which have been the basis of traditional statistical forecasting models, have known drawbacks, neural networks are used in this thesis to train and test on the input data. This thesis presents a feedforward backpropagation neural network approach to univariate time series analysis. Real-world observations of foreign exchange rates in twenty currencies against the U.S. dollar are used as the study data in this experiment. Feedforward connectionist networks have been designed to model daily exchange rates over the period from January 4, 1999 to October 20, 2000. The value of the root mean square error (RMSE) is used as the criterion for selecting the parameters of the training set, the testing set, the numbers of hidden nodes and epochs, the momentum terms and the learning rates. The models obtained in this study by this method can be used to forecast the movement of these exchange rates.
Before the neural network techniques are analysed, data preprocessing and correlation analysis are presented. Three correlation situations are found: the currencies of member countries of the European Economic Community (EEC) are very strongly correlated; the correlations between the Chinese Renminbi and the other currencies are very weak; and the correlations among the other currencies vary as the time period changes. These differences are related to the financial policy, economic situation and other factors of each country in the different time periods.
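The RMSE criterion used above for parameter selection is simple to state: the square root of the mean squared difference between actual and predicted values. The sketch below computes it for a persistence baseline on lagged windows of a made-up daily-rate series, the kind of target/prediction pairing a feedforward network would be scored on; the values are invented, not the thesis's data.

```python
import math

def rmse(actual, predicted):
    """Root mean square error, the selection criterion described above."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted))
                     / len(actual))

def make_windows(series, lag):
    """Sliding windows of `lag` past values as inputs, next value as target,
    as a univariate network on daily exchange rates would be trained."""
    xs = [series[i:i + lag] for i in range(len(series) - lag)]
    ys = series[lag:]
    return xs, ys

rates = [1.10, 1.12, 1.11, 1.13, 1.15, 1.14, 1.16]
xs, ys = make_windows(rates, lag=3)
naive = [x[-1] for x in xs]        # persistence baseline: predict last value
print(round(rmse(ys, naive), 4))   # 0.018
```

In a model-selection loop, the network configuration (hidden nodes, epochs, momentum, learning rate) whose predictions minimise this test-set RMSE would be the one retained.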
APA, Harvard, Vancouver, ISO, and other styles
22

Gheyas, Iffat A. "Novel computationally intelligent machine learning algorithms for data mining and knowledge discovery." Thesis, University of Stirling, 2009. http://hdl.handle.net/1893/2152.

Full text
Abstract:
This thesis addresses three major issues in data mining: feature subset selection in high-dimensional domains, plausible reconstruction of incomplete data in cross-sectional applications, and forecasting of univariate time series. For the automated selection of an optimal subset of features in real time, we present an improved hybrid algorithm: SAGA. SAGA combines the ability of Simulated Annealing to avoid being trapped in local minima with the very high convergence rate of the crossover operator of Genetic Algorithms, the strong local search ability of greedy algorithms and the high computational efficiency of generalized regression neural networks (GRNN). For imputing missing values and forecasting univariate time series, we propose a homogeneous neural network ensemble. The proposed ensemble consists of a committee of Generalized Regression Neural Networks (GRNNs) trained on different subsets of features generated by SAGA, and the predictions of the base classifiers are combined by a fusion rule. This approach makes it possible to discover all important interrelations between the values of the target variable and the input features. The proposed ensemble scheme has two innovative features which make it stand out amongst ensemble learning algorithms: (1) the ensemble makeup is optimized automatically by SAGA; and (2) GRNN is used for both the base classifiers and the top-level combiner classifier. Because of GRNN, the proposed ensemble is a dynamic weighting scheme, in contrast to existing ensemble approaches, which follow simple voting and static weighting strategies. The basic idea of the dynamic weighting procedure is to give a higher reliability weight to those scenarios that are similar to the new ones. The simulation results demonstrate the validity of the proposed ensemble model.
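A GRNN of the kind used for the base and combiner models above is essentially a kernel-weighted average of training targets (the Nadaraya-Watson estimator), which is what makes the ensemble's weighting dynamic: the weights depend on how close each stored pattern is to the query point. A minimal sketch with invented data:

```python
import math

def grnn_predict(train_x, train_y, x, sigma=0.5):
    """Generalized regression neural network prediction: a Gaussian
    kernel-weighted average of the training targets."""
    weights = [math.exp(-sum((a - b) ** 2 for a, b in zip(xi, x))
                        / (2 * sigma ** 2)) for xi in train_x]
    return sum(w * y for w, y in zip(weights, train_y)) / sum(weights)

train_x = [(0.0,), (1.0,), (2.0,), (3.0,)]
train_y = [0.0, 1.0, 4.0, 9.0]           # samples of y = x^2
print(round(grnn_predict(train_x, train_y, (1.5,), sigma=0.3), 2))  # 2.5
```

The query at 1.5 lands between the training points at 1 and 2, so their targets (1 and 4) dominate the weighted average; patterns far from the query contribute almost nothing, which is the "higher reliability weight to similar scenarios" idea in the abstract.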
APA, Harvard, Vancouver, ISO, and other styles
23

Chen, Xiaodong. "Temporal data mining : algorithms, language and system for temporal association rules." Thesis, Manchester Metropolitan University, 1999. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.297977.

Full text
Abstract:
Studies on data mining are being pursued in many different research areas, such as Machine Learning, Statistics, and Databases. The work presented in this thesis is based on the database perspective of data mining. The main focuses are on the temporal aspects of data mining problems, especially association rule discovery, and on issues in the integration of data mining and database systems. Firstly, a theoretical framework for temporal data mining is proposed in this thesis. Within this framework, not only potential patterns but also temporal features associated with the patterns are expected to be discovered. Calendar time expressions are suggested to represent temporal features, and the minimum frequency of patterns is introduced as a new threshold in the model of temporal data mining. The framework also emphasises the components necessary to support temporal data mining tasks. As a specialisation of the proposed framework, the problem of mining temporal association rules is investigated. The methodology adopted in this thesis is to discover potential temporal rules by alternately applying special search techniques to various restricted problems in an interactive and iterative process. Three forms of interesting mining tasks for temporal association rules with certain constraints are identified: the discovery of valid time periods of association rules, the discovery of periodicities of association rules, and the discovery of association rules with temporal features. The search techniques and algorithms for these individual tasks are developed and presented in this thesis.
Finally, an integrated query and mining system (IQMS) is presented in this thesis, covering the description of an interactive query and mining interface (IQMI) supplied by the IQMS system, the presentation of an SQL-like temporal mining language (TML) with the ability to express various data mining tasks for temporal association rules, and the suggestion of an IQMI-based interactive data mining process. The implementation of this system demonstrates an alternative approach for the integration of the DBMS and data mining functions.
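The first of the three mining tasks identified above, discovering the valid time periods of an association rule, can be illustrated by computing per-period support and confidence over timestamped transactions and keeping the periods in which both thresholds hold. This is a hypothetical sketch of the general idea, not code from the IQMS system; the function name, thresholds, and toy data are assumptions:

```python
from collections import defaultdict

def valid_periods(transactions, antecedent, consequent,
                  min_support=0.5, min_confidence=0.6):
    """Return the calendar periods in which the association rule
    antecedent -> consequent meets both support and confidence thresholds."""
    by_period = defaultdict(list)
    for period, items in transactions:
        by_period[period].append(set(items))
    valid = []
    for period, baskets in by_period.items():
        n = len(baskets)
        both = sum(1 for b in baskets if antecedent <= b and consequent <= b)
        ante = sum(1 for b in baskets if antecedent <= b)
        support = both / n
        confidence = both / ante if ante else 0.0
        if support >= min_support and confidence >= min_confidence:
            valid.append(period)
    return sorted(valid)

txns = [
    ("2024-06", ["bread", "butter"]), ("2024-06", ["bread", "butter"]),
    ("2024-07", ["bread"]),           ("2024-07", ["milk"]),
]
print(valid_periods(txns, {"bread"}, {"butter"}))  # bread -> butter holds only in June
```

In the framework described above, the reported periods would be expressed with calendar time expressions rather than raw month keys, and the search would be interleaved with the other two tasks in an interactive, iterative process.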
APA, Harvard, Vancouver, ISO, and other styles
24

Wu, Sheng-Tang. "Knowledge discovery using pattern taxonomy model in text mining." Queensland University of Technology, 2007. http://eprints.qut.edu.au/16675/.

Full text
Abstract:
In the last decade, many data mining techniques have been proposed for fulfilling various knowledge discovery tasks in order to achieve the goal of retrieving useful information for users. Various types of patterns can then be generated using these techniques, such as sequential patterns, frequent itemsets, and closed and maximum patterns. However, how to effectively exploit the discovered patterns is still an open research issue, especially in the domain of text mining. Most of the text mining methods adopt the keyword-based approach to construct text representations which consist of single words or single terms, whereas other methods have tried to use phrases instead of keywords, based on the hypothesis that the information carried by a phrase is considered more than that by a single term. Nevertheless, these phrase-based methods did not yield significant improvements due to the fact that the patterns with high frequency (normally the shorter patterns) usually have a high value on exhaustivity but a low value on specificity, and thus the specific patterns encounter the low frequency problem. This thesis presents the research on the concept of developing an effective Pattern Taxonomy Model (PTM) to overcome the aforementioned problem by deploying discovered patterns into a hypothesis space. PTM is a pattern-based method which adopts the technique of sequential pattern mining and uses closed patterns as features in the representative. A PTM-based information filtering system is implemented and evaluated by a series of experiments on the latest version of the Reuters dataset, RCV1. The pattern evolution schemes are also proposed in this thesis with the attempt of utilising information from negative training examples to update the discovered knowledge. The results show that the PTM outperforms not only all up-to-date data mining-based methods, but also the traditional Rocchio and the state-of-the-art BM25 and Support Vector Machines (SVM) approaches.
APA, Harvard, Vancouver, ISO, and other styles
25

Wu, Sheng-Tang. "Knowledge discovery using pattern taxonomy model in text mining." Thesis, Queensland University of Technology, 2007. https://eprints.qut.edu.au/16675/1/Sheng-Tang_Wu_Thesis.pdf.

Full text
Abstract:
In the last decade, many data mining techniques have been proposed for fulfilling various knowledge discovery tasks in order to achieve the goal of retrieving useful information for users. Various types of patterns can then be generated using these techniques, such as sequential patterns, frequent itemsets, and closed and maximum patterns. However, how to effectively exploit the discovered patterns is still an open research issue, especially in the domain of text mining. Most of the text mining methods adopt the keyword-based approach to construct text representations which consist of single words or single terms, whereas other methods have tried to use phrases instead of keywords, based on the hypothesis that the information carried by a phrase is considered more than that by a single term. Nevertheless, these phrase-based methods did not yield significant improvements due to the fact that the patterns with high frequency (normally the shorter patterns) usually have a high value on exhaustivity but a low value on specificity, and thus the specific patterns encounter the low frequency problem. This thesis presents the research on the concept of developing an effective Pattern Taxonomy Model (PTM) to overcome the aforementioned problem by deploying discovered patterns into a hypothesis space. PTM is a pattern-based method which adopts the technique of sequential pattern mining and uses closed patterns as features in the representative. A PTM-based information filtering system is implemented and evaluated by a series of experiments on the latest version of the Reuters dataset, RCV1. The pattern evolution schemes are also proposed in this thesis with the attempt of utilising information from negative training examples to update the discovered knowledge. The results show that the PTM outperforms not only all up-to-date data mining-based methods, but also the traditional Rocchio and the state-of-the-art BM25 and Support Vector Machines (SVM) approaches.
APA, Harvard, Vancouver, ISO, and other styles
26

Dam, Hai Huong Information Technology & Electrical Engineering Australian Defence Force Academy UNSW. "A scalable evolutionary learning classifier system for knowledge discovery in stream data mining." Awarded by: University of New South Wales - Australian Defence Force Academy, 2008. http://handle.unsw.edu.au/1959.4/38865.

Full text
Abstract:
Data mining (DM) is the process of finding patterns and relationships in databases. The breakthrough in computer technologies triggered a massive growth in data collected and maintained by organisations. In many applications, these data arrive continuously in large volumes as a sequence of instances known as a data stream. Mining these data is known as stream data mining. Due to the large amount of data arriving in a data stream, each record is normally expected to be processed only once. Moreover, this process can be carried out on different sites in the organisation simultaneously, making the problem distributed in nature. Distributed stream data mining poses many challenges to the data mining community, including scalability and coping with changes in the underlying concept over time. In this thesis, the author hypothesizes that learning classifier systems (LCSs) - a class of classification algorithms - have the potential to work efficiently in distributed stream data mining. LCSs are incremental learners and, being evolutionary based, they are inherently adaptive. However, they suffer from two main drawbacks that hinder their use as fast data mining algorithms. First, they require a large population size, which slows down the processing of arriving instances. Second, they require a large number of parameter settings, some of which are very sensitive to the nature of the learning problem. As a result, it becomes difficult to choose the right setup for totally unknown problems. The aim of this thesis is to attack these two problems in LCS, with a specific focus on UCS - a supervised evolutionary learning classifier system. UCS is chosen as it has been tested extensively on classification tasks and it is the supervised version of XCS, a state-of-the-art LCS. In this thesis, the architectural design for a distributed stream data mining system will be first introduced.
The problems that UCS should face in a distributed data stream task are confirmed through a large number of experiments with UCS and the proposed architectural design. To overcome the problem of large population sizes, the idea of using a Neural Network to represent the action in UCS is proposed. This new system - called NLCS - was validated experimentally using a small fixed population size and has shown a large reduction in the population size needed to learn the underlying concept in the data. An adaptive version of NLCS called ANCS is then introduced. The adaptive version dynamically controls the population size of NLCS. A comprehensive analysis of the behaviour of ANCS revealed interesting patterns in the behaviour of the parameters, which motivated an ensemble version of the algorithm with 9 nodes, each using a different parameter setting. In total they cover all patterns of behaviour noticed in the system. A voting gate is used for the ensemble. The resultant ensemble does not require any parameter setting, and showed better performance on all datasets tested. The thesis concludes with testing the ANCS system in the architectural design for distributed environments proposed earlier. The contributions of the thesis are: (1) reducing the UCS population size by an order of magnitude using a neural representation; (2) introducing a mechanism for adapting the population size; (3) proposing an ensemble method that does not require parameter setting; and primarily (4) showing that the proposed LCS can work efficiently for distributed stream data mining tasks.
APA, Harvard, Vancouver, ISO, and other styles
27

Cloyd, James Dale. "Data mining with Newton's method." [Johnson City, Tenn. : East Tennessee State University], 2002. http://etd-submit.etsu.edu/etd/theses/available/etd-1101102-081311/unrestricted/CloydJ111302a.pdf.

Full text
APA, Harvard, Vancouver, ISO, and other styles
28

Fukuda, Kyoko. "Computer-Enhanced Knowledge Discovery in Environmental Science." Thesis, University of Canterbury. Mathematics and Statistics, 2009. http://hdl.handle.net/10092/2140.

Full text
Abstract:
Encouraging the use of computer algorithms by developing new algorithms and introducing uncommonly known algorithms for use on environmental science problems is a significant contribution, as it provides knowledge discovery tools to extract new aspects of results and draw new insights, additional to those from general statistical methods. Conducting analysis with appropriately chosen methods, in terms of quality of performance and results, computation time, flexibility and applicability to data of various natures, will help decision making in the policy development and management process for environmental studies. This thesis has three fundamental aims and motivations. Firstly, to develop a flexibly applicable attribute selection method, Tree Node Selection (TNS), and a decision tree assessment tool, Tree Node Selection for assessing decision tree structure (TNS-A), both of which use decision trees pre-generated by the widely used C4.5 decision tree algorithm as their information source, to identify important attributes from data. TNS helps the cost effective and efficient data collection and policy making process by selecting fewer, but important, attributes, and TNS-A provides a tool to assess the decision tree structure to extract information on the relationship of attributes and decisions. Secondly, to introduce the use of new, theoretical or unknown computer algorithms, such as the K-Maximum Subarray Algorithm (K-MSA) and Ant-Miner, by adjusting and maximizing their applicability and practicality to assess environmental science problems to bring new insights. Additionally, the unique advanced statistical and mathematical method, Singular Spectrum Analysis (SSA), is demonstrated as a data pre-processing method to help improve C4.5 results on noisy measurements. Thirdly, to promote, encourage and motivate environmental scientists to use ideas and methods developed in this thesis. 
The methods were tested with benchmark data and various real environmental science problems: sea container contamination, the Weed Risk Assessment model and weed spatial analysis for New Zealand Biosecurity, air pollution, climate and health, and defoliation imagery. The outcome of this thesis will be to introduce the concept and techniques of data mining, a process of knowledge discovery from databases, to environmental science researchers in New Zealand and overseas by collaborating on future research, so that, together with future policy and management, a healthy environment to live in can be maintained and sustained.
APA, Harvard, Vancouver, ISO, and other styles
29

Brown, Marvin Lane. "The Impact of Data Imputation Methodologies on Knowledge Discovery." Cleveland State University / OhioLINK, 2008. http://rave.ohiolink.edu/etdc/view?acc_num=csu1227054769.

Full text
APA, Harvard, Vancouver, ISO, and other styles
30

Sun, Xingzhi. "Knowledge discovery in long temporal event sequences /." [St. Lucia, Qld.], 2005. http://www.library.uq.edu.au/pdfserve.php?image=thesisabs/absthe18601.pdf.

Full text
APA, Harvard, Vancouver, ISO, and other styles
31

Iglesia, Beatriz de la. "The development and application of heuristic techniques for the data mining task of nugget discovery." Thesis, University of East Anglia, 2001. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.368386.

Full text
APA, Harvard, Vancouver, ISO, and other styles
32

Amirbekyan, Artak. "Protocols and Data Structures for Knowledge Discovery on Distributed Private Databases." Thesis, Griffith University, 2007. http://hdl.handle.net/10072/367447.

Full text
Abstract:
Data mining has developed many techniques for automatic analysis of today's rapidly collected data. Yahoo collects 12 TB of query logs daily, and this is a quarter of what Google collects. For many important problems, the data is actually collected in distributed format by different institutions and organisations, and it can relate to businesses and individuals. The accuracy of knowledge that data mining brings for decision making depends on considering the collective datasets that describe a phenomenon. But privacy, confidentiality and trust emerge as major issues in the analysis of partitioned datasets among competitors, governments and other data holders that have conflicts of interest. Managing privacy is of the utmost importance in the emergent applications of data mining. For example, data mining has been identified as one of the most useful tools for the global collective fight against terror and crime [80]. Parties holding partitions of the database are very interested in the results, but may not trust the others with their data, or may be reluctant to release their data freely without some assurances regarding privacy. Data mining technology that reveals patterns in large databases could compromise the information that an individual or an organisation regards as private. The aim is to find the right balance between maximising analysis results (that are useful for each party) and keeping the inferences that disclose private information about organisations or individuals at a minimum. We address two core data analysis tasks, namely clustering and regression. For these to be solvable in the privacy context, we focus on the protocols' efficiency and practicality. Because associative queries are central to clustering (and to many other data mining tasks), we provide protocols for privacy-preserving k-nearest neighbour (k-NN) queries.
Our methods improve previous methods for k-NN queries in privacy-preserving data mining (which are based on Fagin's A0 algorithm) because we leak at least an order of magnitude fewer candidates and we achieve logarithmic performance on average. Our methods for k-NN queries rest on two pillars: first, data structures and second, metrics. This thesis provides protocols for privacy-preserving computation of various common metrics and for construction of the necessary data structures. We present here new algorithms for secure multiparty computation of some basic operations (such as a new solution for Yao's comparison problem and new protocols to perform linear algebra, in particular the scalar product). These algorithms are used for the construction of protocols for different metrics (we provide protocols for all Minkowski metrics, the cosine metric and the chessboard metric) and for performing associative queries in the privacy context. In order to be efficient, our protocols for associative queries are supported by specific data structures. Thus, we present the construction of privacy-preserving data structures like R-Trees [42, 7], KD-Trees [8, 53, 33] and the SASH [8, 60]. We demonstrate the use of all these tools, and we provide a new version of the well-known clustering algorithm DBSCAN [42, 7]. This new version is now suitable for applications that demand privacy. Similarly, we apply our machinery and provide new multi-linear regression protocols that are now suitable for privacy applications. Our algorithms are more efficient than earlier methods and protocols. In particular, the cost associated with ensuring privacy adds only a linear-cost overhead for most of the protocols presented here. That is, our methods are essentially as costly as concentrating all the data in one site, performing the data-mining task, and disregarding privacy. However, in some cases we make use of a trusted third party.
This is not a problem when more than two parties are involved, since there is always one party that can act as the third.
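The private scalar product mentioned in the abstract can be illustrated with a Du-Atallah-style commodity-server sketch, in which a semi-trusted third party deals correlated randomness so that each side only ever sees a masked copy of the other's vector. This is a hedged illustration of the general technique, not the thesis's actual protocol, and for simplicity one function plays all three roles in the clear:

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def private_scalar_product(x, y, seed=0):
    """Commodity-server scalar product sketch: Alice holds x, Bob holds y.
    A third party deals correlated randomness (ra + rb == Ra . Rb); the
    masks cancel so Alice recovers x . y without seeing y in the clear."""
    rng = random.Random(seed)
    n = len(x)
    # Third party: random mask vectors and a correlated scalar pair.
    Ra = [rng.randint(-100, 100) for _ in range(n)]
    Rb = [rng.randint(-100, 100) for _ in range(n)]
    ra = rng.randint(-100, 100)
    rb = dot(Ra, Rb) - ra
    # Alice -> Bob: x masked by Ra; Bob -> Alice: y masked by Rb, plus s.
    x_masked = [xi + r for xi, r in zip(x, Ra)]
    y_masked = [yi + r for yi, r in zip(y, Rb)]
    s = dot(x_masked, y) + rb
    # Alice unmasks: s - Ra . y_masked + ra == x . y (all mask terms cancel).
    return s - dot(Ra, y_masked) + ra

print(private_scalar_product([1, 2, 3], [4, 5, 6]))  # 32
```

Expanding the algebra shows why it works: s = x.y + Ra.y + Ra.Rb - ra, and subtracting Ra.y_masked = Ra.y + Ra.Rb then adding ra leaves exactly x.y. A scalar product like this is the building block for the distance metrics that the k-NN query protocols need.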
Thesis (PhD Doctorate)
Doctor of Philosophy (PhD)
School of Information and Communication Technology
Science, Environment, Engineering and Technology
Full Text
APA, Harvard, Vancouver, ISO, and other styles
33

Dopitová, Kateřina. "Empirické porovnání systémů dobývání znalostí z databází." Master's thesis, Vysoká škola ekonomická v Praze, 2010. http://www.nusl.cz/ntk/nusl-18159.

Full text
Abstract:
This diploma thesis presents an empirical comparison of knowledge discovery in databases systems. Basic terms and methods of the knowledge discovery in databases domain are defined, and the criteria used for system comparison are determined. The tested software products are also briefly described in the thesis. Results of processing a real task are reported for each system. Within the framework of the thesis, the individual systems are compared according to the previously determined criteria, and the competitiveness of commercial and non-commercial knowledge discovery in databases systems is assessed.
APA, Harvard, Vancouver, ISO, and other styles
34

Fu, Tianjun. "CSI in the Web 2.0 Age: Data Collection, Selection, and Investigation for Knowledge Discovery." Diss., The University of Arizona, 2011. http://hdl.handle.net/10150/217073.

Full text
Abstract:
The growing popularity of various Web 2.0 media has created massive amounts of user-generated content such as online reviews, blog articles, shared videos, forum threads, and wiki pages. Such content provides insights into web users' preferences and opinions, online communities, knowledge generation, etc., and presents opportunities for many knowledge discovery problems. However, several challenges need to be addressed: data collection procedures have to deal with the unique characteristics and structures of various Web 2.0 media; advanced data selection methods are required to identify data relevant to specific knowledge discovery problems; and interactions between Web 2.0 users, which are often embedded in user-generated content, also need effective methods to identify, model, and analyze them. In this dissertation, I intend to address the above challenges and aim at three types of knowledge discovery tasks: (data) collection, selection, and investigation. Organized in this "CSI" framework, five studies which explore and propose solutions to these tasks for particular Web 2.0 media are presented. In Chapter 2, I study focused and hidden Web crawlers and propose a novel crawling system for Dark Web forums by addressing several issues unique to hidden web data collection. In Chapter 3, I explore the usage of both topical and sentiment information in web crawling. This information is also used to label nodes in web graphs that are employed by a graph-based tunneling mechanism to improve collection recall. Chapter 4 further extends the work in Chapter 3 by exploring the possibilities for other graph comparison techniques to be used in tunneling for focused crawlers. A subtree-based tunneling method which can scale up to large graphs is proposed and evaluated. Chapter 5 examines the usefulness of user-generated content in online video classification.
Three types of text features are extracted from the collected user-generated content and utilized by several feature-based classification techniques to demonstrate the effectiveness of the proposed text-based video classification framework. Chapter 6 presents an algorithm to identify forum user interactions and shows how they can be used for knowledge discovery. The algorithm utilizes a bevy of system and linguistic features and adopts several similarity-based methods to account for interactional idiosyncrasies.
APA, Harvard, Vancouver, ISO, and other styles
35

Zhou, Mu. "Knowledge Discovery and Predictive Modeling from Brain Tumor MRIs." Scholar Commons, 2015. http://scholarcommons.usf.edu/etd/5809.

Full text
Abstract:
Quantitative cancer imaging is an emerging field that develops computational techniques to acquire a deep understanding of cancer characteristics for cancer diagnosis and clinical decision making. The recent emergence of growing clinical imaging data provides a wealth of opportunity to systematically explore quantitative information to advance cancer diagnosis. Crucial questions arise as to how we can develop specific computational models capable of mining meaningful knowledge from a vast quantity of imaging data, and how we can transform such findings into improved personalized health care. This dissertation presents a set of computational models in the context of a malignant brain tumor, Glioblastoma Multiforme (GBM), which is notoriously aggressive with a poor survival rate. In particular, this dissertation developed quantitative feature extraction approaches for tumor diagnosis from magnetic resonance imaging (MRI), including a multi-scale local computational feature and a novel regional habitat quantification analysis of tumors. In addition, we proposed a histogram-based representation to investigate biological features that characterize ecological dynamics, which is of great clinical interest in evaluating tumor cellular distributions. Furthermore, with regard to clinical systems, generic machine learning techniques are typically incapable of generalizing well to specific diagnostic problems. Therefore, quantitative analysis from a data-driven perspective is becoming critical. In this dissertation, we propose two specific data-driven models to tackle different types of clinical MRI data. First, we inspected cancer systems from a time-domain perspective. We propose a quantitative histogram-based approach that builds a prediction model, measuring the differences between pre- and post-treatment diagnostic MRI data. Second, we investigated the problem of mining knowledge from a skewed distribution, in which the data samples of each survival group are unequally distributed.
We proposed an algorithmic framework to effectively predict survival groups by jointly considering imbalanced distributions and classifier design. Our approach achieved an accuracy of 95.24%, suggesting it captures class-specific information in a challenging clinical setting.
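As an illustration of the kind of imbalance-aware classifier design the abstract describes (not the dissertation's actual framework), votes in a k-NN classifier can be scaled by inverse class frequency so that a rare survival group is not drowned out by the majority class; the toy data and weighting scheme below are assumptions:

```python
import math
from collections import Counter

def weighted_knn_predict(X, y, query, k=3):
    """k-NN classifier whose votes are scaled by inverse class frequency,
    so minority-class neighbours count for more than majority-class ones."""
    freq = Counter(y)
    class_weight = {c: 1.0 / n for c, n in freq.items()}
    neighbours = sorted(
        (math.dist(xi, query), yi) for xi, yi in zip(X, y)
    )[:k]
    votes = Counter()
    for _, label in neighbours:
        votes[label] += class_weight[label]
    return votes.most_common(1)[0][0]

# Four majority "short" samples vs two minority "long" samples.
groups_X = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (5.0, 5.0), (5.5, 5.5)]
groups_y = ["short"] * 4 + ["long"] * 2
# Unweighted majority over the 5 nearest neighbours would say "short"
# (3 votes to 2); the inverse-frequency weights tip it to "long".
print(weighted_knn_predict(groups_X, groups_y, (3.0, 3.0), k=5))  # long
```

Cost-sensitive weighting of this sort is one simple way to make the classifier itself reflect the skewed class distribution rather than correcting for it only at evaluation time.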
APA, Harvard, Vancouver, ISO, and other styles
36

Choudhary, Alok K. "Knowledge discovery for moderating collaborative projects." Thesis, Loughborough University, 2009. https://dspace.lboro.ac.uk/2134/8138.

Full text
Abstract:
In today's global market environment, enterprises are increasingly turning towards collaboration in projects to leverage their resources, skills and expertise, and simultaneously address the challenges posed by diverse and competitive markets. Moderators, which are knowledge-based systems, have successfully been used to support collaborative teams by raising awareness of problems or conflicts. However, the functioning of a Moderator is limited by the knowledge it has about the team members. Knowledge acquisition, learning and updating of knowledge are the major challenges for a Moderator's implementation. To address these challenges, a Knowledge discOvery And daTa minINg inteGrated (KOATING) framework is presented for Moderators to enable them to continuously learn from the operational databases of the company and semi-automatically update the corresponding expert module. The architecture for the Universal Knowledge Moderator (UKM) shows how existing Moderators can be extended to support global manufacturing. A method for designing and developing the knowledge acquisition module of the Moderator for manual and semi-automatic update of knowledge is documented using the Unified Modelling Language (UML). UML has been used to explore the static structure and dynamic behaviour, and to describe the system analysis, system design and system development aspects of the proposed KOATING framework. The proof of design has been presented using a case study of a collaborative project in the form of a construction project supply chain. It has been shown that Moderators can "learn" by extracting various kinds of knowledge from Post Project Reports (PPRs) using different types of text mining techniques. Furthermore, it is also proposed that knowledge discovery integrated Moderators can be used to support and enhance collaboration by identifying appropriate business opportunities and the corresponding partners for the creation of a virtual organization.
A case study is presented in the context of a UK based SME. Finally, this thesis concludes by summarizing the thesis, outlining its novelties and contributions, and recommending future research.
APA, Harvard, Vancouver, ISO, and other styles
37

Katarina, Gavrić. "Mining large amounts of mobile object data." Phd thesis, Univerzitet u Novom Sadu, Fakultet tehničkih nauka u Novom Sadu, 2017. https://www.cris.uns.ac.rs/record.jsf?recordId=105036&source=NDLTD&language=en.

Full text
Abstract:
Within this thesis, we examined the possibilities of using an increasing amount of publicly available metadata about locations and peoples' activities in order to gain new knowledge and develop new models of behavior and movement of people. The purpose of the research conducted for this thesis was to solve practical problems, such as: analyzing attractive tourist sites, defining the most frequent routes people are taking, defining main ways of transportation, and discovering behavioral patterns in terms of defining strategies to suppress the expansion of virus infections. In this thesis, a practical study was carried out on the basis of protected (aggregated and anonymous) CDR (Caller Data Records) data and metadata of geo-referenced multimedia content.
APA, Harvard, Vancouver, ISO, and other styles
38

Raatikainen, M. (Mika). "Intelligent knowledge discovery on building energy and indoor climate data." Doctoral thesis, Oulun yliopisto, 2016. http://urn.fi/urn:isbn:9789526213804.

Full text
Abstract:
A future vision of enabling technologies for energy conservation and energy efficiency is based on the most important megatrends identified, namely climate change, urbanization, and digitalization. In the United States and in the European Union, about 40% of total energy consumption goes into energy use by buildings. Moreover, poor indoor climate quality is recognized as a distinct health hazard. On account of these two factors, energy efficiency and healthy housing are active topics in international research. The main aims of this thesis are to study which elements affect indoor climate quality and how energy consumption describes building energy efficiency, and to analyse the measured data using intelligent computational methods. The data acquisition technology used in the studies relies heavily on smart metering technologies based on Building Automation Systems (BAS), big data and the Internet of Things (IoT). The data refining process presented and used is called Knowledge Discovery in Databases (KDD). It contains methods for data acquisition, pre-processing, data mining, visualisation and interpretation of results, and transformation into knowledge and new information for end users. In this thesis, four examples of data analysis and knowledge deployment concerning small houses and school buildings are presented. The results of the case studies show that the data mining methods used in building energy efficiency and indoor climate quality analysis have great potential for processing a large amount of multivariate data effectively. An innovative use of computational methods provides a good basis for researching and developing new information services. In the KDD process, researchers should co-operate with end users, such as building management and maintenance personnel as well as residents, to achieve better analysis results, easier interpretation and correct conclusions for exploiting the knowledge.
Tiivistelmä Tulevaisuuden visio energiansäästön sekä energiatehokkuuden mahdollistavista teknologioista pohjautuu tärkeimpiin tunnistettuihin megatrendeihin, ilmastonmuutokseen, kaupungistumiseen ja digitalisoitumiseen. Yhdysvalloissa ja Euroopan unionissa käytetään noin 40 % kokonaisenergiankulutuksesta rakennusten käytön energiatarpeeseen. Myös rakennusten sisäilmaston on havaittu olevan ilmeinen terveysriski. Perustuen kahteen edellä mainittuun tekijään, energiatehokkuus ja asumisterveys ovat aktiivisia tutkimusaiheita kansainvälisessä tutkimuksessa. Tämän väitöskirjan päätavoitteena on ollut tutkia, mitkä elementit vaikuttavat sisäilmastoon ja rakennusten energiatehokkuuteen pääasiassa analysoimalla mittausdataa käyttäen älykkäitä laskennallisia menetelmiä. Tutkimuksissa käytetyt tiedonkeruuteknologiat perustuvat etäluentaan ja rakennusautomaatioon, big datan hyödyntämiseen ja esineiden internetiin (IoT). Väitöskirjassa esiteltävä tietämyksen muodostusprosessi (KDD) koostuu tiedonkeruusta,datan esikäsittelystä, tiedonlouhinnasta, visualisoinnista ja tutkimustulosten tulkinnasta sekä tietämyksen muodostamisesta ja oleellisen informaation esittämisestä loppukäyttäjille. Tässä väitöstutkimuksessa esitellään neljän data-analyysin ja niiden pohjalta muodostetun tietämyksen hyödyntämisen esimerkkiä, jotka liittyvät pientaloihin ja koulurakennuksiin. Esimerkkitapausten tulokset osoittavat, että käytetyillä tiedonlouhinnan menetelmillä sovellettuna rakennusten energiatehokkuus- ja sisäilmastoanalyyseihin on mahdollista jalostaa suuria monimuuttuja-aineistoja tehokkaasti. Laskennallisten menetelmien innovatiivinen käyttö antaa hyvät perusteet tutkia ja kehittää uusia informaatiopalveluja. Tutkijoiden tulee tehdä yhteistyötä loppukäyttäjinä toimivien kiinteistöhallinnan ja -ylläpidon henkilöstön sekä asukkaiden kanssa saavuttaakseen parempia analyysituloksia, helpompaa tulosten tulkintaa ja oikeita johtopäätöksiä tietämyksen hyödyntämiseksi
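The KDD chain this abstract names — acquisition, pre-processing, data mining, and interpretation of results — can be sketched as a minimal pipeline. Everything concrete below (the sensor records, field names, and the 1000 ppm CO2 guideline) is an invented assumption for illustration, not material from the thesis.

```python
# Hypothetical sketch of the KDD pipeline the abstract describes:
# acquisition -> pre-processing -> data mining -> interpretation.

def preprocess(readings):
    """Drop records with missing values (a minimal cleaning step)."""
    return [r for r in readings if None not in r.values()]

def mine(readings, co2_limit=1000.0):
    """A trivial 'mining' step: flag hours whose CO2 exceeds a limit."""
    return [r["hour"] for r in readings if r["co2_ppm"] > co2_limit]

def kdd(readings):
    clean = preprocess(readings)
    flagged = mine(clean)
    # Interpretation: turn the mined pattern into actionable knowledge.
    return {"n_clean": len(clean), "poor_air_hours": flagged}

readings = [
    {"hour": 8, "co2_ppm": 650.0},
    {"hour": 9, "co2_ppm": None},      # sensor dropout, removed in cleaning
    {"hour": 10, "co2_ppm": 1250.0},   # exceeds the assumed 1000 ppm guideline
]
print(kdd(readings))  # {'n_clean': 2, 'poor_air_hours': [10]}
```

The real thesis works with BAS/IoT data streams at far larger scale; this only shows the shape of the process.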
APA, Harvard, Vancouver, ISO, and other styles
39

Radovanovic, Aleksandar. "Concept Based Knowledge Discovery from Biomedical Literature." Thesis, Online access, 2009. http://etd.uwc.ac.za/usrfiles/modules/etd/docs/etd_gen8Srv25Nme4_9861_1272229462.pdf.

Full text
APA, Harvard, Vancouver, ISO, and other styles
40

Elsilä, U. (Ulla). "Knowledge discovery method for deriving conditional probabilities from large datasets." Doctoral thesis, University of Oulu, 2007. http://urn.fi/urn:isbn:9789514286698.

Full text
Abstract:
Abstract In today's world, enormous amounts of data are being collected every day. Thus, the problems of storing, handling, and utilizing the data are faced constantly. As the human mind can no longer interpret such vast datasets by itself, methods for extracting useful and novel information from the data are needed and developed. These methods are collectively called knowledge discovery methods. In this thesis, a novel combination of feature selection and data modeling methods is presented to help with this task. This combination includes basic statistical analysis, linear correlation, the self-organizing map, parallel coordinates, and k-means clustering. The presented method can be used, first, to select the most relevant features from even hundreds of them and, then, to model the complex inter-correlations within the selected ones. The capability to handle hundreds of features opens up the possibility to study more extensive processes instead of just looking at smaller parts of them. The results of a k-nearest-neighbors study show that the presented feature selection procedure is valid and appropriate. A second advantage of the presented method is the possibility to use thousands of samples. Whereas the current rules for selecting appropriate limits for utilizing the methods are theoretically proved only for small sample sizes, especially in the case of linear correlation, this thesis gives guidelines for feature selection with thousands of samples. A third positive aspect is the nature of the results: given that the outcome of the method is a set of conditional probabilities, the derived model is highly unrestrictive and rather easy to interpret. In order to test the presented method in practice, it was applied to two different cases of steel manufacturing with hot strip rolling. In the first case, the conditional probabilities for different types of retentions were derived and, in the second case, the rolling conditions for the occurrence of wedge were revealed. The results of both studies show that steel manufacturing processes are indeed very complex and highly dependent on the various stages of the manufacturing. This was further confirmed by the fact that, in studies with k-nearest-neighbors and C4.5, it was impossible to derive useful models concerning the datasets as a whole. It is believed that the reason for this lies in the nature of these two methods, meaning that they are unable to grasp such manifold inter-correlations in the data. On the contrary, the presented method of conditional probabilities allowed new knowledge to be gained about the studied processes, which will help to better understand and enhance them.
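The core move the abstract describes — reducing clustered process data to conditional probabilities of an outcome per cluster — can be sketched compactly. The cluster assignments and binary "retention" flags below are invented toy data, and k-means itself is assumed to have already run.

```python
# Sketch: derive P(outcome=1 | cluster) from clustered samples.
from collections import defaultdict

def conditional_probabilities(clusters, outcomes):
    """Return P(outcome=1 | cluster) for each cluster id."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for c, y in zip(clusters, outcomes):
        totals[c] += 1
        hits[c] += y
    return {c: hits[c] / totals[c] for c in totals}

# Toy data: 6 samples assigned to 2 clusters, with a binary defect flag.
clusters = [0, 0, 0, 1, 1, 1]
defect   = [1, 1, 0, 0, 0, 1]
print(conditional_probabilities(clusters, defect))
```

The interpretability advantage the abstract claims is visible even here: each number is a plain probability attached to one operating regime.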
APA, Harvard, Vancouver, ISO, and other styles
41

Yu, Zhiguo. "Cooperative Semantic Information Processing for Literature-Based Biomedical Knowledge Discovery." UKnowledge, 2013. http://uknowledge.uky.edu/ece_etds/33.

Full text
Abstract:
Given that data is increasing exponentially every day, extracting and understanding the information, themes and relationships in large collections of documents is more and more important to researchers in many areas. In this work, we present a cooperative semantic information processing system to help biomedical researchers understand and discover knowledge in large numbers of titles and abstracts from PubMed query results. Our system is based on a prevalent technique, topic modeling, which is an unsupervised machine learning approach for discovering the set of semantic themes in a large set of documents. In addition, we apply a natural language processing technique to transform the “bag-of-words” assumption of topic models into a “bag-of-important-phrases” assumption and build an interactive visualization tool using a modified, open-source Topic Browser. Finally, we conduct two experiments to evaluate the approach. The first evaluates whether the “bag-of-important-phrases” approach is better at identifying semantic themes than the standard “bag-of-words” approach. This is an empirical study in which human subjects evaluate the quality of the resulting topics using a standard “word intrusion test” to determine whether subjects can identify a word (or phrase) that does not belong in the topic. The second is a qualitative empirical study of how well the system helps biomedical researchers explore a set of documents to discover previously hidden semantic themes and connections. The methodology for this study has been successfully used to evaluate other knowledge-discovery tools in biomedicine.
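The "word intrusion test" the abstract relies on reduces to a small scoring rule: subjects try to spot a planted intruder word in each topic, and the fraction who succeed measures topic quality. The topic words and subject answers below are invented for illustration.

```python
# Sketch of word-intrusion scoring for topic model evaluation.

def model_precision(true_intruder, subject_answers):
    """Fraction of subjects who correctly spotted the intruder."""
    correct = sum(1 for a in subject_answers if a == true_intruder)
    return correct / len(subject_answers)

# One topic about genetics, with "lunch" planted as the intruder.
topic = ["gene", "protein", "expression", "sequence", "lunch", "genome"]
answers = ["lunch", "lunch", "protein", "lunch"]  # 3 of 4 subjects correct
print(model_precision("lunch", answers))  # 0.75
```

A coherent topic yields precision near 1.0 (the intruder stands out); an incoherent one drives precision toward chance.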
APA, Harvard, Vancouver, ISO, and other styles
42

Durbha, Surya Srinivas. "Semantics-enabled framework for knowledge discovery from Earth observation data." Diss., Mississippi State : Mississippi State University, 2006. http://sun.library.msstate.edu/ETD-db/ETD-browse/browse.

Full text
APA, Harvard, Vancouver, ISO, and other styles
43

Kulhavý, Lukáš. "Praktické uplatnění technologií data mining ve zdravotních pojišťovnách." Master's thesis, Vysoká škola ekonomická v Praze, 2010. http://www.nusl.cz/ntk/nusl-77726.

Full text
Abstract:
This thesis focuses on data mining technology and its possible practical use in the field of health insurance companies. The thesis defines the term data mining and its relation to knowledge discovery in databases, explaining it, inter alia, through the methodologies that describe the individual phases of the knowledge discovery process (CRISP-DM, SEMMA). It also surveys possible practical applications and the technologies and products available on the market (both free and commercial). An introduction to the main data mining methods and specific algorithms (decision trees, association rules, neural networks and other methods) serves as the theoretical foundation on which the practical applications, built over real data from real health insurance companies, stand. These applications seek the causes of increased remittances and predict churn; I have solved them in the freely available systems Weka and LISP-Miner. The objective is to demonstrate data mining capabilities over this type of data, and the capabilities of the Weka and LISP-Miner systems in solving tasks according to the CRISP-DM methodology. The last part of the thesis is devoted to cloud and grid computing in conjunction with data mining. It offers an insight into the possibilities of these technologies and their benefits for data mining. The possibilities of cloud computing are presented on the Amazon EC2 system, while grid computing can be used through the Weka Experimenter interface.
APA, Harvard, Vancouver, ISO, and other styles
44

Li, Xin. "Graph-based learning for information systems." Diss., The University of Arizona, 2009. http://hdl.handle.net/10150/193827.

Full text
Abstract:
The advance of information technologies (IT) makes it possible to collect a massive amount of data in business applications and information systems. The increasing data volumes require more effective knowledge discovery techniques to make the best use of the data. This dissertation focuses on knowledge discovery on graph-structured data, i.e., graph-based learning. In this study, graph-structured data refers to data instances with relational information indicating their interactions. Graph-structured data exist in a variety of application areas related to information systems, such as business intelligence, knowledge management, e-commerce, medical informatics, etc. Developing knowledge discovery techniques on graph-structured data is critical to decision making and the reuse of knowledge in business applications. In this dissertation, I propose a graph-based learning framework and identify four major knowledge discovery tasks using graph-structured data: topology description, node classification, link prediction, and community detection. I present a series of studies to illustrate the knowledge discovery tasks and propose solutions for these example applications. As to the topology description task, in Chapter 2 I examine the global characteristics of relations extracted from documents. Such relations are extracted using different information processing techniques and aggregated to different analytical unit levels. As to the node classification task, Chapter 3 and Chapter 4 study the patent classification problem and the gene function prediction problem, respectively. In Chapter 3, I model knowledge diffusion and evolution with patent citation networks for patent classification. In Chapter 4, I extend the context assumption in previous research and model context graphs in gene interaction networks for gene function prediction. As to the link prediction task, Chapter 5 presents an example application in recommendation systems. I frame the recommendation problem as link prediction on user-item interaction graphs, and propose capturing graph-related features to tackle this problem. Chapter 6 examines the community detection task in the context of online interactions. In this study, I propose to take advantage of the sentiments (agreements and disagreements) expressed in users' interactions to improve community detection effectiveness. All these examples show that the graph representation allows the graph structure and node/link information to be more effectively utilized in addressing the four knowledge discovery tasks. In general, the graph-based learning framework contributes to the domain of information systems by categorizing related knowledge discovery tasks, promoting the further use of the graph representation, and suggesting approaches for knowledge discovery on graph-structured data. In practice, the proposed graph-based learning framework can be used to develop a variety of IT artifacts that address critical problems in business applications.
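One classic graph-related feature for the link prediction task described above is the common-neighbors score: two nodes that share many neighbors are likely to form a link (here, a recommendation). The toy user-item graph below is invented for illustration and is not drawn from the dissertation.

```python
# Sketch: common-neighbors link-prediction score on an adjacency map.

def common_neighbors_score(adj, u, v):
    """Number of shared neighbors of u and v: a simple link-prediction score."""
    return len(adj.get(u, set()) & adj.get(v, set()))

# Toy undirected graph: users connected to the items they interacted with.
adj = {
    "alice": {"item1", "item2"},
    "bob":   {"item1", "item2", "item3"},
    "carol": {"item3"},
}
# alice and bob share two neighbors -> a likely future connection.
print(common_neighbors_score(adj, "alice", "bob"))   # 2
print(common_neighbors_score(adj, "alice", "carol")) # 0
```

Real systems combine several such graph features (paths, degrees, clustering) rather than relying on one score.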
APA, Harvard, Vancouver, ISO, and other styles
45

Trávníček, Petr. "Aplikace data miningu v podnikové praxi." Master's thesis, Vysoká škola ekonomická v Praze, 2011. http://www.nusl.cz/ntk/nusl-164048.

Full text
Abstract:
Over the last decades, knowledge discovery from databases, as one of the disciplines of information and communication technologies, has developed into its current state and attracts increasing interest, not only from major business corporations. This diploma thesis deals with data mining, paying prime attention to its practical use within a business environment. The objective of the thesis is to review possible applications of data mining and to decompose implementation techniques, focusing on specific data mining methods and algorithms as well as on the adaptation of business processes. This objective is addressed in the theoretical part of the thesis, which covers the principles of data mining, the knowledge discovery in databases process, commonly used data mining methods and algorithms, and the tasks typically implemented in this domain. A further objective is to present the benefits of data mining on a model example, shown in the practical part of the thesis. Besides the evaluation of the created data mining models, the practical part also contains a design of subsequent steps that would enable higher efficiency in some specific areas of the given business. I believe this design, together with the characterization of the knowledge discovery in databases process, to be the most beneficial contributions of the thesis.
APA, Harvard, Vancouver, ISO, and other styles
46

Asenjo, Juan C. "Data Masking, Encryption, and their Effect on Classification Performance: Trade-offs Between Data Security and Utility." NSUWorks, 2017. http://nsuworks.nova.edu/gscis_etd/1010.

Full text
Abstract:
As data mining increasingly shapes organizational decision-making, the quality of its results must be questioned to ensure trust in the technology. Inaccuracies can mislead decision-makers and cause costly mistakes. With more data collected for analytical purposes, privacy is also a major concern. Data security policies and regulations are increasingly put in place to manage risks, but these often rely on technologies that substitute and/or suppress sensitive details contained in the data sets being mined. Data masking through substitution and data encryption through suppression of sensitive attributes can limit access to important details, and it is believed that their use can affect the quality of data mining results. This dissertation investigated and compared the causal effects of data masking and encryption on classification performance as a measure of the quality of knowledge discovery. A review of the literature found a gap in the body of knowledge, indicating that this problem had not been studied before in an experimental setting. The objective of this dissertation was to gain an understanding of the trade-offs between data security and utility in the field of analytics and data mining. The research used a nationally recognized cancer incidence database to show how masking and encryption of potentially sensitive demographic attributes, such as patients' marital status, race/ethnicity, origin, and year of birth, could have a statistically significant impact on the patients' predicted survival. Performance measured with four different classifiers varied by 9% to 10% between a control group, where the selected attributes were untouched, and two experimental groups, where the attributes were substituted or suppressed to simulate the effects of the data protection techniques. In practice, this corroborates the potential risk involved when medical treatment decisions are based on data mining applications where attributes in the data sets are masked or encrypted for patient privacy and security reasons.
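The substitution/suppression effect the study measures can be imitated in miniature: replace an informative attribute with a constant and watch a simple classifier degrade. The 1-NN classifier and toy data below are an invented sketch, not the dissertation's actual classifiers or dataset.

```python
# Sketch: masking an informative attribute hurts classification accuracy.

def nn_predict(train, x):
    """1-nearest-neighbour on a single numeric feature."""
    return min(train, key=lambda t: abs(t[0] - x))[1]

def accuracy(train, test):
    return sum(nn_predict(train, x) == y for x, y in test) / len(test)

train = [(1.0, "A"), (2.0, "A"), (8.0, "B"), (9.0, "B")]
test = [(1.5, "A"), (8.5, "B")]

# Simulate suppression: every training value replaced by a constant.
masked_train = [(0.0, y) for _, y in train]

print(accuracy(train, test))         # clear separation -> 1.0
print(accuracy(masked_train, test))  # degrades once the attribute is masked
```

The dissertation quantifies exactly this kind of utility loss, but with real cancer-registry data and four production-grade classifiers.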
APA, Harvard, Vancouver, ISO, and other styles
47

Isik, Narin. "Fuzzy Spatial Data Cube Construction And Its Use In Association Rule Mining." Master's thesis, METU, 2005. http://etd.lib.metu.edu.tr/upload/12606056/index.pdf.

Full text
Abstract:
The popularity of spatial databases increases since the amount of spatial data that needs to be handled has grown with the use of digital maps, images from satellites, video cameras, medical equipment, sensor networks, etc. Spatial data are difficult to examine and to extract interesting knowledge from; hence, applications that assist decision-making about spatial data, like weather forecasting, traffic supervision, mobile communication, etc., have been introduced. In this thesis, more natural and precise knowledge is generated from spatial data by constructing a fuzzy spatial data cube and extracting fuzzy association rules from it, in order to improve decision-making about spatial data. This involves extensive research on spatial knowledge discovery and how fuzzy logic can be used to develop it. It is stated that incorporating fuzzy logic into spatial data cube construction necessitates a new method for aggregation of fuzzy spatial data. We illustrate how this method also enhances the meaning of fuzzy spatial generalization rules and fuzzy association rules with a case study about weather pattern searching. This study contributes to spatial knowledge discovery by generating more understandable and interesting knowledge from spatial data by extending spatial generalization with fuzzy memberships, extending the spatial aggregation in spatial data cube construction by utilizing weighted measures, and generating fuzzy association rules from the constructed fuzzy spatial data cube.
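A hedged sketch of the fuzzy-membership and weighted-aggregation ideas behind the fuzzy spatial data cube: crisp readings map to graded memberships, which are then aggregated with weights across spatial cells. The triangular "hot" membership bounds and regional temperatures below are invented for illustration.

```python
# Sketch: fuzzy memberships plus weighted aggregation over spatial cells.

def triangular(x, a, b, c):
    """Triangular fuzzy membership: 0 at a and c, peaking at 1 at b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

def weighted_aggregate(memberships, weights):
    """Weighted mean of membership degrees across spatial cells."""
    return sum(m * w for m, w in zip(memberships, weights)) / sum(weights)

temps = [22.0, 30.0, 38.0]  # readings from three regions (invented)
hot = [triangular(t, 20.0, 35.0, 50.0) for t in temps]
print(hot)
print(weighted_aggregate(hot, [1.0, 2.0, 1.0]))  # region 2 weighted double
```

The weights stand in for the thesis's weighted measures (e.g., cell area or reliability); the actual cube construction generalizes this over hierarchies of dimensions.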
APA, Harvard, Vancouver, ISO, and other styles
48

He, Yuanchen. "Fuzzy-Granular Based Data Mining for Effective Decision Support in Biomedical Applications." Digital Archive @ GSU, 2006. http://digitalarchive.gsu.edu/cs_diss/12.

Full text
Abstract:
Due to the complexity of biomedical problems, adaptive and intelligent knowledge discovery and data mining systems are highly needed to help humans understand the inherent mechanisms of diseases. For biomedical classification problems, it is typically impossible to build a perfect classifier with 100% prediction accuracy. Hence a more realistic target is to build an effective Decision Support System (DSS). In this dissertation, a novel adaptive Fuzzy Association Rules (FARs) mining algorithm, named FARM-DS, is proposed to build such a DSS for binary classification problems in the biomedical domain. Empirical studies show that FARM-DS is competitive with state-of-the-art classifiers in terms of prediction accuracy. More importantly, FARs can provide strong decision support on disease diagnoses due to their easy interpretability. This dissertation also proposes a fuzzy-granular method to select informative and discriminative genes from huge microarray gene expression data. With fuzzy granulation, information loss in the process of gene selection is decreased. As a result, more informative genes for cancer classification are selected and more accurate classifiers can be modeled. Empirical studies show that the proposed method is more accurate than traditional algorithms for cancer classification; hence, we expect that the selected genes can be more helpful for further biological studies.
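A minimal sketch of how a fuzzy association rule can drive decision support of the kind FARM-DS produces: a rule fires with the strength of its weakest antecedent (the min t-norm). The memberships and the rule itself are invented for illustration; the actual FARM-DS algorithm is considerably more elaborate.

```python
# Sketch: evaluating one fuzzy association rule via the min t-norm.

def firing_strength(memberships, antecedents):
    """Min t-norm over the rule's antecedent membership degrees."""
    return min(memberships[a] for a in antecedents)

# Membership degrees of one patient's measurements in fuzzy sets (invented).
patient = {"glucose_high": 0.9, "bmi_high": 0.6, "age_old": 0.3}

# Rule: IF glucose_high AND bmi_high THEN diagnosis positive.
rule = ("glucose_high", "bmi_high")
print(firing_strength(patient, rule))  # 0.6
```

The interpretability claim in the abstract rests on exactly this structure: a clinician can read the antecedents and the graded strength directly, unlike a black-box score.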
APA, Harvard, Vancouver, ISO, and other styles
49

Junior, Jose Fernando Rodrigues. ""Desenvolvimento de um Framework para Análise Visual de Informações Suportando Data Mining"." Universidade de São Paulo, 2003. http://www.teses.usp.br/teses/disponiveis/55/55134/tde-26092003-122130/.

Full text
Abstract:
This dissertation brings together contributions from numerous works in the fields of Databases, Knowledge Discovery in Databases, Data Mining, and Computer-based Information Visualization, which together frame its research theme: Information Visualization. The relevant theory is reviewed and related in order to support the theoretical and practical concluding activities reported in the work. Anchored in the theoretical substance studied, the work makes several contributions to Information Visualization, presented both as proposals formalized throughout the text and as practical results in the form of software enabled for the visual exploration of information. The presented ideas are based on the visual display of numeric analyses: basic statistics, frequency analysis (Frequency Plot), and relevance analysis (Relevance Plot). Contributions to the FastMapDB tool, a visual exploration tool built by the Grupo de Bases de Dados e Imagens of ICMC-USP, are also reported together with the results of its use. Furthermore, the Framework foreseen in the original project proposal for building visual analysis tools is presented, along with its architecture, characteristics and usage. Finally, the visualization Pipeline that emerges from joining the visualization Framework and the FastMapDB tool is described. The work closes with a brief analysis of the Information Visualization discipline based on the studied literature, outlining a state-of-the-art scenario along with suggestions for future work.
APA, Harvard, Vancouver, ISO, and other styles
50

Bastos, Pedro. "Inferência de propriedades químicas do algodão através de técnicas de data mining." Master's thesis, Universidade do Minho, 2003. http://hdl.handle.net/10198/1048.

Full text
Abstract:
This work describes how the Clementine data mining tool can be used for knowledge extraction from data on the physical and chemical properties of cotton fibre. The results achieved demonstrate how data mining techniques can be used to establish, in an efficient way, the existing relations between fibre properties. Technological development has enabled the measurement of cotton staple fibre properties such as length, micronaire, uniformity, strength, elongation, colour and trash content. This is done with HVI instruments, providing rapid and reliable results. However, with regard to the chemical properties, results are obtained by more time-consuming and much more expensive laboratory methods, so they are sometimes completely ignored by the agents involved in the process of transforming the raw material into a final product. This means that studies of the relationships between the physical and chemical properties are discarded, even though this knowledge is very important, because chemical properties strongly affect the fibre transformation process. By using Clementine, it is possible to obtain relations between the different fibre characteristics, supported by the generation of rules using artificial intelligence algorithms. In this study, several data mining techniques available in the Clementine system are used. Data mining is one step of the knowledge discovery in databases (KDD) process, whose main purpose is to discover knowledge in data sets. The tool includes advanced modelling techniques based on artificial intelligence, extracting from the data possible complex relationships as well as association rules between them. This helps to automate processes such as prediction, estimation and classification, which can be used to provide expert decision support.
APA, Harvard, Vancouver, ISO, and other styles