Dissertations / Theses on the topic '080109 Pattern Recognition and Data Mining'


Consult the top 50 dissertations / theses for your research on the topic '080109 Pattern Recognition and Data Mining.'


1

Seevinck, Jennifer. "Emergence in interactive art." Thesis, University of Technology, Sydney, 2011.

Find full text
Abstract:
This thesis is concerned with creating and evaluating interactive art systems that facilitate emergent participant experiences. For the purposes of this research, interactive art is the computer-based arts involving physical participation from the audience, while emergence is when a new form or concept appears that was not directly implied by the context from which it arose. This emergent ‘whole’ is more than a simple sum of its parts. The research aims to develop understanding of the nature of emergent experiences that might arise during participant interaction with interactive art systems. It also aims to understand the design issues surrounding the creation of these systems. The approach used is practice-based, integrating practice, evaluation and theoretical research. The practice used methods from reflection-in-action and iterative design to create two interactive art systems: Glass Pond and +-now. Creation of +-now resulted in a novel method for instantiating emergent shapes. Both artworks were also evaluated in exploratory studies. In addition, a main study with 30 participants was conducted on participant interaction with +-now. These sessions were video-recorded and participants were interviewed about their experience. Recordings were transcribed and analysed using grounded theory methods. Emergent participant experiences were identified and classified using a taxonomy of emergence in interactive art. This taxonomy draws on theoretical research. The outcomes of this practice-based research are summarised as follows. Two interactive art systems, of which the second clearly facilitates emergent interaction, were created. Their creation involved the development of a novel method for instantiating emergent shapes, and informed aesthetic and design issues surrounding interactive art systems for emergence. A taxonomy of emergence in interactive art was also created.
Other outcomes are the evaluation findings about participant experiences, including different types of emergence experienced and the coding schemes produced during data analysis.
APA, Harvard, Vancouver, ISO, and other styles
2

Nguyen, Thuy Thi Thu. "Predicting cardiovascular risks using pattern recognition and data mining." Thesis, University of Hull, 2009. http://hydra.hull.ac.uk/resources/hull:3051.

Full text
Abstract:
This thesis presents the use of pattern recognition and data mining techniques in risk prediction models in the clinical domain of cardiovascular medicine. The data is modelled and classified using a number of alternative pattern recognition and data mining techniques, both supervised and unsupervised. Specific investigated techniques include multilayer perceptrons, radial basis functions, and support vector machines for supervised classification, and self-organizing maps, KMIX and WKMIX algorithms for unsupervised clustering. The Physiological and Operative Severity Score for enUmeration of Mortality and morbidity (POSSUM) and Portsmouth POSSUM (PPOSSUM) are introduced as the risk scoring systems used in British surgery, providing a tool for risk-adjusted prediction and comparative audit. These systems cannot detect all possible interactions between predictor variables, whereas pattern recognition techniques may. The thesis presents KMIX and WKMIX as improvements of the K-means algorithm; both use Euclidean and Hamming distances to measure the dissimilarity between patterns and their centres. WKMIX improves on KMIX by utilising attribute weights derived from mutual information values, calculated using a combination of Bayes' theorem, entropy, and the Kullback-Leibler divergence. The research in this thesis suggests that a decision support system for cardiovascular medicine can be built utilising the studied risk prediction models and pattern recognition techniques. The same may be true for other medical domains.
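The KMIX-style dissimilarity the abstract describes, Euclidean distance on numeric attributes plus Hamming distance on categorical ones, with optional attribute weights as in WKMIX, can be sketched as follows. This is an illustrative reconstruction with invented toy data, not the thesis's actual code.

```python
import numpy as np

def mixed_dissimilarity(x_num, x_cat, c_num, c_cat, w_num, w_cat):
    """Weighted Euclidean part for numeric attributes plus weighted
    Hamming part (mismatch count) for categorical attributes."""
    euclid = np.sqrt(np.sum(w_num * (x_num - c_num) ** 2))
    hamming = np.sum(w_cat * (x_cat != c_cat))
    return euclid + hamming

# Toy record vs. two cluster centres (unit weights)
x_num, x_cat = np.array([0.2, 0.8]), np.array(["M", "yes"])
c1 = (np.array([0.1, 0.9]), np.array(["M", "yes"]))
c2 = (np.array([0.9, 0.1]), np.array(["F", "no"]))
w_num, w_cat = np.ones(2), np.ones(2)

d1 = mixed_dissimilarity(x_num, x_cat, *c1, w_num, w_cat)
d2 = mixed_dissimilarity(x_num, x_cat, *c2, w_num, w_cat)
assert d1 < d2  # the record is assigned to the nearer centre
```

In a full K-means-style loop, each pattern would be assigned to the centre minimising this dissimilarity, and WKMIX would set `w_num`/`w_cat` from mutual information rather than to ones.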
3

Kou, Yufeng. "Abnormal Pattern Recognition in Spatial Data." Diss., Virginia Tech, 2006. http://hdl.handle.net/10919/30145.

Full text
Abstract:
In recent years, abnormal spatial pattern recognition has received a great deal of attention from both industry and academia, and has become an important branch of data mining. Abnormal spatial patterns, or spatial outliers, are those observations whose characteristics are markedly different from their spatial neighbors. The identification of spatial outliers can be used to reveal hidden but valuable knowledge in many applications. For example, it can help locate extreme meteorological events such as tornadoes and hurricanes, identify aberrant genes or tumor cells, discover highway traffic congestion points, pinpoint military targets in satellite images, determine possible locations of oil reservoirs, and detect water pollution incidents. Numerous traditional outlier detection methods have been developed, but they cannot be directly applied to spatial data in order to extract abnormal patterns. Traditional outlier detection mainly focuses on "global comparison" and identifies deviations from the remainder of the entire data set. In contrast, spatial outlier detection concentrates on discovering neighborhood instabilities that break the spatial continuity. In recent years, a number of techniques have been proposed for spatial outlier detection. However, they have the following limitations. First, most of them focus primarily on single-attribute outlier detection. Second, they may not accurately locate outliers when multiple outliers exist in a cluster and correlate with each other. Third, the existing algorithms tend to abstract spatial objects as isolated points and do not consider their geometrical and topological properties, which may lead to inexact results. This dissertation reports a study of the problem of abnormal spatial pattern recognition, and proposes a suite of novel algorithms.
Contributions include: (1) formal definitions of various spatial outliers, including single-attribute outliers, multi-attribute outliers, and region outliers; (2) a set of algorithms for the accurate detection of single-attribute spatial outliers; (3) a systematic approach to identifying and tracking region outliers in continuous meteorological data sequences; (4) a novel Mahalanobis-distance-based algorithm to detect outliers with multiple attributes; (5) a set of graph-based algorithms to identify point outliers and region outliers; and (6) extensive analysis of experiments on several spatial data sets (e.g., West Nile virus data and NOAA meteorological data) to evaluate the effectiveness and efficiency of the proposed algorithms.
Ph. D.
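The neighborhood-comparison idea behind spatial outlier detection (flagging observations that deviate from their spatial neighbors rather than from the global distribution) can be sketched with a simple neighborhood z-score test. This is a generic single-attribute variant for illustration, not any of the dissertation's actual algorithms; the data and function names are invented.

```python
import numpy as np

def spatial_outliers(values, neighbors, threshold=2.0):
    """Flag observations whose attribute differs markedly from the
    average of their spatial neighbours (local, not global, comparison)."""
    diffs = np.array([values[i] - np.mean([values[j] for j in nbrs])
                      for i, nbrs in enumerate(neighbors)])
    z = (diffs - diffs.mean()) / diffs.std()
    return np.where(np.abs(z) > threshold)[0]

# 1-D toy example: a single spike among flat neighbours
values = [1.0, 1.0, 1.0, 10.0, 1.0, 1.0, 1.0]
neighbors = [[1], [0, 2], [1, 3], [2, 4], [3, 5], [4, 6], [5]]
print(spatial_outliers(values, neighbors))  # flags the spike at index 3
```

Note that a global z-score on `values` would also flag index 3 here; the neighborhood version differs precisely in cases where a value is globally unremarkable but inconsistent with its surroundings.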
4

Gawande, Rashmi. "Evaluation of Automotive Data mining and Pattern Recognition Techniques for Bug Analysis." Master's thesis, Universitätsbibliothek Chemnitz, 2016. http://nbn-resolving.de/urn:nbn:de:bsz:ch1-qucosa-196770.

Full text
Abstract:
In an automotive infotainment system, while analyzing bug reports, developers have to spend significant time reading log messages and trying to locate anomalous behavior before identifying its root cause. The log messages need to be viewed in a Traceviewer tool to be read in human-readable form, and have to be extracted to text files by applying manual filters in order to analyze the behavior further. There is a need to evaluate machine learning/data mining methods which could potentially assist in error analysis. One such method could be learning patterns for “normal” messages. “Normal” could even mean that they contain keywords like “exception”, “error” or “failed” but are harmless or not relevant to the bug currently being analyzed. These patterns could then be applied as a filter, leaving behind only truly anomalous messages that are interesting for analysis. A successful application of the filter would reduce the noise, leaving only a few “anomalous” messages. After evaluation of the researched candidate algorithms, two algorithms, namely GSP and FP-Growth, were found useful and thus implemented together in a prototype. The prototype implementation overall includes processes like pre-processing, creation of input, executing the algorithms, creating the training set and analyzing new trace logs. Execution of the prototype reduced manual effort, thus achieving the objective of this thesis work.
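The filtering idea above, learning which message templates are "normal" because they recur across bug-free traces and removing them from a new trace, can be sketched minimally as follows. A plain per-trace frequency threshold stands in for the GSP/FP-Growth pattern mining the thesis actually evaluates, and all message strings are invented.

```python
from collections import Counter

def learn_normal_patterns(training_traces, min_support=0.8):
    """A message is 'normal' if it occurs in at least min_support of the
    bug-free training traces (crude stand-in for mined frequent patterns)."""
    counts = Counter()
    for trace in training_traces:
        counts.update(set(trace))  # count each message once per trace
    n = len(training_traces)
    return {msg for msg, c in counts.items() if c / n >= min_support}

def filter_trace(trace, normal_msgs):
    """Drop 'normal' messages, leaving candidates for anomaly analysis."""
    return [msg for msg in trace if msg not in normal_msgs]

training = [["boot ok", "net up", "error: retry"],
            ["boot ok", "net up"],
            ["boot ok", "net up", "error: retry"]]
normal = learn_normal_patterns(training)
suspicious = filter_trace(["boot ok", "net up", "error: disk fail"], normal)
```

Note that `"error: retry"` is kept out of the normal set here only because it misses the support threshold; with sequence-aware mining such as GSP, ordering information would also be used.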
5

Liu, Guimei. "Supporting efficient and scalable frequent pattern mining /." View abstract or full-text, 2005. http://library.ust.hk/cgi/db/thesis.pl?COMP%202005%20LIUG.

Full text
6

Wu, Jianfei. "Vector-Item Pattern Mining Algorithms and their Applications." Diss., North Dakota State University, 2011. https://hdl.handle.net/10365/28841.

Full text
Abstract:
Advances in storage technology have long been driving the need for new data mining techniques. Not only are typical data sets becoming larger, but the diversity of available attributes is increasing in many problem domains. In biological applications, for example, a single protein may have associated sequence-, text-, graph-, continuous and item data. Correspondingly, there is a growing need for techniques to find patterns in such complex data. Many techniques exist for mapping specific types of data to vector space representations, such as the bag-of-words model for text [58] or embedding in vector spaces of graphs [94, 91]. However, there are few techniques that recognize the resulting vector space representations as units that may be combined and further processed. This research aims to mine important vector-item patterns hidden across multiple and diverse data sources. We consider sets of related continuous attributes as vector data and search for patterns that relate a vector attribute to one or more items. The presence of an item set defines a subset of vectors that may or may not show unexpected density fluctuations. Two types of vector-item pattern mining algorithms have been developed, namely histogram-based vector-item pattern mining algorithms and point distribution vector-item pattern mining algorithms. In histogram-based vector-item pattern mining algorithms, a vector-item pattern is significant or important if its density histogram differs significantly from what is expected for a random subset of transactions, using a chi-square goodness-of-fit test or effect size analysis.
For point distribution vector-item pattern mining algorithms, a vector-item pattern is significant if its probability density function (PDF) has a large Kullback-Leibler divergence from random subsamples. We have applied the vector-item pattern mining algorithms to several application areas, and by comparing with other state-of-the-art algorithms we demonstrate their effectiveness and efficiency.
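The divergence test the abstract describes, scoring a vector-item pattern by how far the item-selected subset's density departs from the overall density, can be sketched with a discrete Kullback-Leibler divergence over histograms. The histograms below are invented for illustration.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-9):
    """Discrete Kullback-Leibler divergence between two histograms
    (normalised to probability distributions; eps avoids log of zero)."""
    p = p / p.sum()
    q = q / q.sum()
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

# Histogram of a vector attribute over the full data set vs. over the
# transactions selected by an item set
all_hist = np.array([10.0, 20.0, 40.0, 20.0, 10.0])
same_shape = np.array([1.0, 2.0, 4.0, 2.0, 1.0])  # proportional subset
shifted = np.array([8.0, 1.0, 0.0, 0.0, 1.0])     # density shifted

assert kl_divergence(same_shape, all_hist) < kl_divergence(shifted, all_hist)
```

A subset that merely subsamples the overall distribution scores near zero, while one whose density has shifted scores high and would mark the vector-item pair as a candidate pattern; in practice the threshold would be calibrated against random subsamples, as the abstract indicates.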
7

Ke, Yiping. "Efficient correlated pattern discovery in databases /." View abstract or full-text, 2008. http://library.ust.hk/cgi/db/thesis.pl?CSED%202008%20KE.

Full text
8

Leighty, Brian David. "Data Mining for Induction of Adjacency Grammars and Application to Terrain Pattern Recognition." NSUWorks, 2009. http://nsuworks.nova.edu/gscis_etd/212.

Full text
Abstract:
The process of syntactic pattern recognition makes the analogy between the syntax of languages and the structure of spatial patterns. The recognition process is achieved by parsing a given pattern to determine if it is syntactically correct with respect to a defined grammar. The generation of pattern grammars can be a cumbersome process when many objects are involved. This has led to the problem of spatial grammar inference. Current approaches have used genetic algorithms and inductive techniques and have demonstrated limitations. Alternative approaches are needed that produce accurate grammars while remaining computationally efficient in light of the NP-hardness of the problem. Co-location rule mining techniques in the field of Knowledge Discovery and Data Mining address the complexity issue using neighborhood restrictions and pruning strategies based on monotonic Measures Of Interest. The goal of this research was to develop and evaluate an inductive method for inferring an adjacency grammar utilizing co-location rule mining techniques to gain efficiency while providing accurate and concise production sets. The method incrementally discovers, without supervision, adjacency patterns in spatial samples, relabels them via a production rule and repeats the procedure with the newly labeled regions. The resulting rules are used to form an adjacency grammar. Grammars were generated and evaluated within the context of a syntactic pattern recognition system that identifies landform patterns in terrain elevation datasets. The proposed method was tested using a k-fold cross-validation methodology. Two variations were also tested using unsupervised and supervised training, both with no rule pruning. Comparison of these variations with the proposed method demonstrated the effectiveness of rule pruning and rule discovery. 
Results showed that the proposed method of rule inference produced rulesets having recall, precision and accuracy values of 82.6%, 97.7% and 92.8%, respectively, which are similar to those using supervised training. These rulesets were also the smallest, had the lowest average number of rules fired in parsing, and had the shortest average parse time. The use of rule pruning substantially reduced rule inference time (104.4 s vs. 208.9 s). The neighborhood restriction used in adjacency calculations demonstrated linear complexity in the number of regions.
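One pass of the unsupervised induction loop the abstract describes (count adjacent region-label pairs, promote pairs meeting a support threshold to productions, then relabel and repeat) might look like the following sketch. The terrain labels, threshold and non-terminal naming scheme are hypothetical, not Leighty's.

```python
from collections import Counter

def induce_adjacency_rules(region_labels, adjacency, min_count=2):
    """Count adjacent label pairs and promote pairs meeting the support
    threshold to productions (new non-terminal labels)."""
    pair_counts = Counter()
    for i, j in adjacency:
        pair = tuple(sorted((region_labels[i], region_labels[j])))
        pair_counts[pair] += 1
    return {pair: "NT_" + "_".join(pair)
            for pair, count in pair_counts.items() if count >= min_count}

labels = ["ridge", "valley", "ridge", "valley", "peak"]
adjacency = [(0, 1), (2, 3), (3, 4)]  # pairs of adjacent region indices
rules = induce_adjacency_rules(labels, adjacency)
```

The ridge/valley adjacency occurs twice and is promoted to a production, while the one-off peak/valley adjacency is pruned; the monotonic count threshold plays the role of the Measure Of Interest pruning mentioned above.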
9

Loekito, Elsa. "Mining simple and complex patterns efficiently using binary decision diagrams /." Connect to thesis, 2009. http://repository.unimelb.edu.au/10187/4378.

Full text
10

Freeman, Dane Fletcher. "A product family design methodology employing pattern recognition." Diss., Georgia Institute of Technology, 2013. http://hdl.handle.net/1853/50267.

Full text
Abstract:
Sharing components in a product family requires a trade-off between the individual products' performances and overall family costs. It is critical for a successful family to identify which components are similar, so that sharing does not compromise the individual products' performances. This research formulates two commonality identification approaches for use in product family design and investigates their applicability in a generic product family design methodology. Having a commonality identification approach reduces the combinatorial sharing problem and allows more quality family alternatives to be considered. The first is based on the pattern recognition technique of fuzzy c-means clustering in component subspaces. If components from different products are similar enough to be grouped into the same cluster, then those components could possibly become the same platform. Fuzzy equivalence relations that show the binary relationship from one product's component to a different product's component can be extracted from the cluster membership functions. The second approach builds a Bayesian network representing the joint distribution of a design space exploration. Using this model, a series of inferences can be made based on product performance and component constraints. Finally, the posterior design variable distributions can be processed using a similarity metric such as the earth mover's distance to identify which products' components are similar to another's.
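The fuzzy c-means membership computation at the heart of the first commonality approach can be sketched as below: components from different products whose design points share high membership in the same cluster are sharing candidates. This is the standard FCM membership update under invented data, not the dissertation's code.

```python
import numpy as np

def fcm_memberships(X, centers, m=2.0):
    """Fuzzy c-means membership matrix U: U[i, k] is the degree to which
    point i belongs to cluster k (rows sum to 1; m is the fuzzifier)."""
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
    inv = d ** (-2.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)

# Two component design points coincide with the centres; a third sits
# exactly between them and gets split membership
X = np.array([[0.0, 0.0], [1.0, 1.0], [0.5, 0.5]])
centers = np.array([[0.0, 0.0], [1.0, 1.0]])
U = fcm_memberships(X, centers)
```

In a full FCM loop the centres would then be recomputed as membership-weighted means and the two steps iterated to convergence; the soft memberships are what make the fuzzy equivalence relations mentioned above extractable.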
11

Ratnayake, Uditha. "Application of the recommendation architecture model for text mining /." Access via Murdoch University Digital Theses Project, 2003. http://wwwlib.murdoch.edu.au/adt/browse/view/adt-MU20040713.113844.

Full text
12

Tang, Fung Michael, and 鄧峰. "Sequence classification and melody tracks selection." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2001. http://hub.hku.hk/bib/B29742973.

Full text
13

Law, Hiu Chung. "Clustering, dimensionality reduction, and side information." Diss., Connect to online resource - MSU authorized users, 2006.

Find full text
Abstract:
Thesis (Ph. D.)--Michigan State University. Dept. of Computer Science & Engineering, 2006.
Title from PDF t.p. (viewed on June 19, 2009). Includes bibliographical references (p. 296-317). Also issued in print.
14

Tang, Fung Michael. "Sequence classification and melody tracks selection /." Hong Kong : University of Hong Kong, 2001. http://sunzi.lib.hku.hk/hkuto/record.jsp?B25017470.

Full text
15

Lyra, Risto Matti Juhani. "Topical subcategory structure in text classification." Thesis, University of Sussex, 2019. http://sro.sussex.ac.uk/id/eprint/81340/.

Full text
Abstract:
Data sets with rich topical structure are common in many real world text classification tasks. A single data set often contains a wide variety of topics and, in a typical task, documents belonging to each class are dispersed across many of the topics. Often, a complex relationship exists between the topic a document discusses and the class label: positive or negative sentiment is expressed in documents from many different topics, but knowing the topic does not necessarily help in determining the sentiment label. We know from tasks such as Domain Adaptation that sentiment is expressed in different ways under different topics. Topical context can in some cases even reverse the sentiment polarity of words: to be sharp is a good quality for knives but bad for singers. This property can be found in many different document classification tasks. Standard document classification algorithms do not account for or take advantage of topical diversity; instead, classifiers are usually trained with the tacit assumption that topical diversity does not play a role. This thesis is focused on the interplay between the topical structure of corpora, how the target labels in a classification task distribute over the topics and how the topical structure can be utilised in building ensemble models for text classification. We show empirically that a dataset with rich topical structure can be problematic for single classifiers, and we develop two novel ensemble models to address the issues. We focus on two document classification tasks: document level sentiment analysis of product reviews and hierarchical categorisation of news text. For each task we develop a novel ensemble method that utilises topic models to address the shortcomings of traditional text classification algorithms. Our contribution is in showing empirically that the class association of document features is topic dependent. 
We show that using the topical context of documents for building ensembles is beneficial for some tasks, and present two new ensemble models for document classification. We also provide a fresh viewpoint for reasoning about the relationship of class labels, topical categories and document features.
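The ensemble idea the abstract sketches, mixing the outputs of topic-specialised classifiers according to each document's topic proportions, can be illustrated as follows. All names are hypothetical and the per-topic classifiers are stubs standing in for models trained on topic-partitioned data.

```python
def ensemble_predict(doc_features, topic_proportions, topic_classifiers):
    """Weight each topic-specialised classifier's positive-class score by
    how strongly the document belongs to that topic, then threshold."""
    score = sum(w * clf(doc_features)
                for w, clf in zip(topic_proportions, topic_classifiers))
    return 1 if score >= 0.5 else 0

# Stub classifiers: under topic 0 the features read positive, under
# topic 1 the same features read negative ("sharp" knives vs. singers)
classifiers = [lambda f: 1.0, lambda f: 0.0]
assert ensemble_predict({}, [0.8, 0.2], classifiers) == 1
assert ensemble_predict({}, [0.2, 0.8], classifiers) == 0
```

The topic proportions would come from a topic model such as LDA; the point of the sketch is that the same document features can yield different labels depending on the topical context, which a single global classifier cannot express.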
16

Minnen, David. "Unsupervised discovery of activity primitives from multivariate sensor data." Diss., Atlanta, Ga. : Georgia Institute of Technology, 2008. http://hdl.handle.net/1853/24623.

Full text
Abstract:
Thesis (Ph.D.)--Computing, Georgia Institute of Technology, 2009.
Committee Chair: Thad Starner; Committee Member: Aaron Bobick; Committee Member: Bernt Schiele; Committee Member: Charles Isbell; Committee Member: Irfan Essa
17

Jin, Ruoming. "New techniques for efficiently discovering frequent patterns." Connect to resource, 2005. http://rave.ohiolink.edu/etdc/view?acc%5Fnum=osu1121795612.

Full text
Abstract:
Thesis (Ph. D.)--Ohio State University, 2005.
Title from first page of PDF file. Document formatted into pages; contains xvii, 170 p.; also includes graphics. Includes bibliographical references (p. 160-170). Available online via OhioLINK's ETD Center.
18

Abdelhamid, Neda. "Deriving classifiers with single and multi-label rules using new Associative Classification methods." Thesis, De Montfort University, 2013. http://hdl.handle.net/2086/10120.

Full text
Abstract:
Associative Classification (AC) in data mining is a rule-based approach that uses association rule techniques to construct accurate classification systems (classifiers). The majority of existing AC algorithms extract one class per rule and ignore other class labels even when they have large data representation. Thus, extending current AC algorithms to find and extract multi-label rules is a promising research direction, since new hidden knowledge is revealed for decision makers. Furthermore, the exponential growth of rules in AC has been investigated in this thesis with the aim of minimising the number of candidate rules, and therefore reducing the classifier size so that the end-user can easily exploit and maintain it. Moreover, both the rule ranking and the test data classification steps have been investigated in order to improve the predictive accuracy of AC algorithms. Overall, this thesis investigates different problems related to AC, not limited to the ones listed above, and the results are new AC algorithms that derive single- and multi-label rules from different application data sets, together with comprehensive experimental results. To be exact, the first proposed algorithm, the Multi-class Associative Classifier (MAC), derives classifiers where each rule is connected with a single class from a training data set. MAC enhances the rule discovery, rule ranking, rule filtering and classification of test data in AC. The second proposed algorithm, the Multi-label Classifier based Associative Classification (MCAC), adds to MAC a novel rule discovery method which discovers multi-label rules from single-label data without learning from parts of the training data set. These rules denote vital information ignored by most current AC algorithms, which benefits both the end-user and the classifier's predictive accuracy.
Lastly, the vital web-threat problem of website phishing detection was investigated in depth, and a technical solution based on AC is introduced in Chapter 6. In particular, we were able to detect a new type of knowledge and improve the detection rate with respect to error rate using our proposed algorithms on a large collected phishing data set. Thorough experimental tests utilising large numbers of University of California Irvine (UCI) data sets and a variety of real application data collections related to website classification and trainer timetabling problems reveal that MAC and MCAC generate better-quality classifiers when compared with other AC and rule-based algorithms with respect to various evaluation measures, i.e. error rate, Label-Weight, Any-Label, number of rules, etc. This is mainly due to the different improvements related to rule discovery, rule filtering, rule sorting, the classification step, and, more importantly, the new type of knowledge associated with the proposed algorithms. Most chapters in this thesis have been disseminated in, or are under review at, journals and refereed conference proceedings.
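The single-label rule discovery step common to AC algorithms (mine frequent itemsets, keep itemset-to-class rules that meet support and confidence thresholds) can be sketched generically as below. This is plain class-association-rule mining on invented data, not MAC or MCAC themselves.

```python
from collections import Counter
from itertools import combinations

def mine_class_rules(transactions, labels, min_sup=0.3, min_conf=0.7):
    """Mine single-label class association rules (itemset -> class) that
    meet the minimum support and confidence thresholds."""
    n = len(transactions)
    item_counts, rule_counts = Counter(), Counter()
    for items, label in zip(transactions, labels):
        for r in (1, 2):  # itemsets of size 1 and 2 for this sketch
            for combo in combinations(sorted(items), r):
                item_counts[combo] += 1
                rule_counts[(combo, label)] += 1
    rules = {}
    for (combo, label), c in rule_counts.items():
        support, confidence = c / n, c / item_counts[combo]
        if support >= min_sup and confidence >= min_conf:
            rules[combo] = (label, confidence)
    return rules

transactions = [{"a", "b"}, {"a", "b"}, {"a", "c"}, {"b", "c"}]
labels = [1, 1, 1, 0]
rules = mine_class_rules(transactions, labels)
```

Here `{"a"} -> 1` survives with full confidence while `{"b"} -> 1` is pruned for falling below the confidence threshold; a multi-label extension in the spirit of MCAC would keep all labels a frequent itemset is associated with instead of one winner per rule.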
19

Santos, Jamilson Bispo dos. "Pesquisa de similaridades em imagens mamográficas com base na extração de características." Universidade de São Paulo, 2013. http://www.teses.usp.br/teses/disponiveis/3/3141/tde-23052014-010946/.

Full text
Abstract:
This work presents a computational strategy to consolidate the training of resident radiologists through the classification of mammographic images by similarity, analysing information from reports written by experienced physicians together with attributes extracted from the medical images. To discover the patterns that characterise similarity, digital image processing and data mining techniques are applied to the mammographic images. Pattern recognition aims to classify given sets of images into classes. The classification of mammographic findings is performed using Artificial Neural Networks, through the Self-Organizing Map (SOM) classifier. This work uses content-based image retrieval (CBIR), considering similarity with respect to an image previously selected for training. The images are classified according to similarity by analysing the attributes extracted from the images and the reports. Similarity is identified through feature extraction using the wavelet transform.
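A one-level Haar wavelet decomposition, the simplest instance of the wavelet feature extraction the abstract mentions for similarity search, can be sketched as follows. This is the textbook Haar transform on an invented signal, not the thesis's implementation, which operates on mammographic images.

```python
import numpy as np

def haar_features(signal):
    """One level of the Haar wavelet transform: approximation (scaled
    local averages) and detail (scaled local differences) coefficients."""
    s = np.asarray(signal, dtype=float)
    approx = (s[0::2] + s[1::2]) / np.sqrt(2.0)
    detail = (s[0::2] - s[1::2]) / np.sqrt(2.0)
    return approx, detail

approx, detail = haar_features([1.0, 1.0, 2.0, 2.0])
# flat regions give zero detail; edges show up in the detail coefficients
```

For images the same split is applied along rows and then columns, and statistics of the resulting coefficients serve as the feature vector compared in CBIR.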
20

Pirttikangas, S. (Susanna). "Routine Learning: from Reactive to Proactive Environments." Doctoral thesis, University of Oulu, 2004. http://urn.fi/urn:isbn:9514275659.

Full text
Abstract:
Technological development and the growing availability of information services mean that data from everyday situations is now widely available. Utilizing this technology and the data it produces in an efficient manner is called context-aware or ubiquitous computing. The research includes the specifications of each application, the requirements of the communication systems, issues of privacy, and human-computer interaction, for example. The environment should learn from the user's behaviour and communicate with the user. The communication should not only be reactive, but proactive as well. This thesis is divided into two parts, both presenting methodology for enabling intelligence in our everyday surroundings. In part one, three different applications are defined for studying context recognition and routine learning: a health monitoring system, a context-aware health club application, and automatic device configuration in an office space. The path for routine learning is straightforward and closely related to pattern recognition research. Sensory data is collected from users in various situations, the signals are pre-processed, and contexts are recognized from this sensory data. Routine learning is then realized through association rules. The routine learning paradigm developed here can utilize already recognized contexts regardless of their meaning in the real world. The user makes the final decision on whether a routine is important or not, and has authority over every action of the system. The second part of the thesis is built on experiments on identifying a person walking on a pressure-sensitive floor. Characterizing the special sensor producing the measurements, which lies under the normal flooring, is one of the tasks of this research. The identification is tested with Hidden Markov models and Learning Vector Quantization.
The methodology developed in this thesis offers a step along the long road towards functional and calm intelligent environments.
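Learning Vector Quantization, one of the two classifiers tested for walker identification above, adjusts class prototypes sample by sample; a single LVQ1 update step can be sketched as follows. The data and learning rate are illustrative, not the thesis's experimental setup.

```python
import numpy as np

def lvq1_step(prototype, x, same_class, lr=0.1):
    """Move the winning prototype toward the sample if their classes
    match, and away from it otherwise (one LVQ1 update)."""
    sign = 1.0 if same_class else -1.0
    return prototype + sign * lr * (np.asarray(x) - prototype)

p = np.array([0.0, 0.0])
p_closer = lvq1_step(p, [1.0, 1.0], same_class=True)   # attracted
p_away = lvq1_step(p, [1.0, 1.0], same_class=False)    # repelled
```

At classification time a footstep's feature vector is simply assigned the class of its nearest prototype, which is what makes LVQ attractive for fast, on-line identification.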
21

Guzmán, Ponce Angelica. "Nuevos Algoritmos Basados en Grafos y Clustering para el Tratamiento de Complejidades de los Datos." Tesis de doctorado, Universidad Autónoma del Estado de México, 2021. http://hdl.handle.net/20.500.11799/110464.

Full text
Abstract:
Doctoral thesis
Nowadays, knowledge extraction from data is an essential task for decision-making in many areas. However, data sets commonly present problems (complexities) that decrease performance in the knowledge extraction process. An imbalanced distribution of data between classes and the presence of noise and/or class overlap are intrinsic data characteristics that frequently degrade knowledge extraction, because data are assumed to follow a uniform distribution and to be free from any other problem. All these issues have been studied in Pattern Recognition and Data Mining because of their impact on the performance of learning models. This Ph.D. thesis therefore addresses class imbalance, class overlap and/or noise through techniques that reduce and clean the most represented class. Among the solutions to the class imbalance problem, new graph-based algorithms are proposed. This idea arises from the fact that many real-world problems (network analysis, chemical models, remote sensing, among others) have been tackled using graph-based strategies, in which the problem is expressed in terms of vertices and edges. With this in mind, the proposals presented in this Ph.D. thesis treat the most represented class as a complete graph, in such a way that a representative subset of majority class instances is obtained through reduction criteria. Regarding data sets with class imbalance and class overlap and/or noise, the proposals include the use of clustering algorithms as a cleaning strategy. It is well known that these algorithms are used to group instances according to similar characteristics; here, however, their ability to detect noisy instances is exploited, so a clustering algorithm is applied before the class imbalance is addressed. As a further extension to the proposals presented in this Ph.D. thesis, and due to the growing interest in Big Data problems, the last part of this report introduces a graph-based algorithm to handle class imbalance in large-scale data sets.
CONACYT national scholarships (Becas nacionales del CONACYT)
22

Sammouri, Wissam. "Data mining of temporal sequences for the prediction of infrequent failure events : application on floating train data for predictive maintenance." Thesis, Paris Est, 2014. http://www.theses.fr/2014PEST1041/document.

Full text
Abstract:
In order to meet mounting social and economic demands, railway operators and manufacturers are striving for longer availability and better reliability of railway transportation systems. Commercial trains are being equipped with state-of-the-art onboard intelligent sensors monitoring various subsystems all over the train. These sensors provide a real-time flow of data, called floating train data, consisting of georeferenced events along with their spatial and temporal coordinates. Once ordered with respect to time, these events can be considered long temporal sequences which can be mined for possible relationships. This has created a necessity for sequential data mining techniques in order to derive meaningful association rules or classification models from these data. Once discovered, these rules and models can be used to perform an on-line analysis of the incoming event stream in order to predict the occurrence of target events, i.e., severe failures that require immediate corrective maintenance actions. The work in this thesis tackles the above-mentioned data mining task. We aim to investigate and develop various methodologies to discover association rules and classification models which can help predict rare tilt and traction failures in sequences using past events that are less critical. The investigated techniques constitute two major axes: association analysis, which is temporal, and classification techniques, which are not. The main challenges confronting the data mining task and increasing its complexity are the rarity of the target events to be predicted, the heavy redundancy of some events, and the frequent occurrence of data bursts. The results obtained on real data sets collected from a fleet of trains highlight the effectiveness of the proposed approaches and methodologies.
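The core idea, measuring how reliably an earlier, less critical event predicts a target failure within a short horizon, can be illustrated with a toy windowed-confidence computation. The function, window size and event codes are invented; the thesis's actual association analysis is more elaborate.

```python
from collections import Counter

def mine_predictive_rules(sequence, target, window=3, min_conf=0.5):
    """For each event type, measure how often the target event follows it
    within `window` steps: the confidence of the rule  event -> target.
    Purely illustrative sketch of temporal association mining."""
    occurs, hits = Counter(), Counter()
    for i, ev in enumerate(sequence):
        if ev == target:
            continue
        occurs[ev] += 1
        if target in sequence[i + 1 : i + 1 + window]:
            hits[ev] += 1
    return {ev: hits[ev] / occurs[ev] for ev in occurs
            if hits[ev] / occurs[ev] >= min_conf}

events = ["a", "b", "F", "a", "c", "b", "F", "c", "a"]  # "F" is the rare failure
print(mine_predictive_rules(events, "F"))
```

Event "b" always precedes "F" within the window here, so its rule gets confidence 1.0; in practice, rarity of the target and bursty duplicates make this estimation much harder, as the abstract notes.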
APA, Harvard, Vancouver, ISO, and other styles
23

Lopes, Kelly Marques de Oliveira 1982. "Modelos baseados em data mining para classificação multitemporal de culturas no Mato Grosso utilizando dados de NDVI/MODIS." [s.n.], 2013. http://repositorio.unicamp.br/jspui/handle/REPOSIP/307578.

Full text
Abstract:
Advisors: Laércio Luis Vendite, Stanley Robson de Medeiros Oliveira
Master's dissertation, Universidade Estadual de Campinas, Instituto de Matemática Estatística e Computação Científica
Abstract: The development of studies in the field of geotechnology and the increased capacity to store data have improved the exploration and study of satellite images obtained by orbital sensors. Mapping of land cover, estimates of crop productivity and crop forecasting are important information for the farmer and for the government, because this information is essential to support decisions related to production, purchase and sale estimates, and import and export calculations. One alternative for analyzing land use and land cover data obtained from sensors is the use of data mining techniques, since these techniques can transform data and information into knowledge that supports decisions on agricultural planning. In this work, we used multitemporal data on the NDVI vegetation index, derived from MODIS images, for monitoring crops of cotton, soybean and corn in the state of Mato Grosso during the 2008/2009 crop year. The data set, supplied by Embrapa Agricultural Informatics, comprised 24 columns and 728 rows, where the first 23 columns refer to the NDVI values and the last to the soil cover. The methodology was based on the CRISP-DM (Cross Industry Standard Process for Data Mining) model. Predictive models to classify data on these crops were built and evaluated with machine learning algorithms such as decision trees (J48 and PART) and random forests (Random Forest). Feature selection improved the Kappa index values and the accuracy of the models. Classification rules were generated to map the crops studied (soy, corn and cotton). The results show that machine learning algorithms are promising for the land cover classification problem. In particular, the J48 algorithm, used in conjunction with feature selection via principal component analysis, stood out from the others for its simplicity and for the values obtained. The results also revealed the presence of cotton-growing regions in other areas of the state, outside those studied.
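The J48-style induction mentioned above works by repeatedly choosing the (date, NDVI-threshold) split that best separates the crops. A minimal single-split ("stump") sketch, scored by plain class purity instead of J48's gain-ratio criterion; the NDVI profiles and labels below are invented:

```python
def best_stump(X, y):
    """Find the (feature, threshold) pair whose single split best separates
    the class labels, scored by the number of correctly grouped rows."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [y[i] for i, row in enumerate(X) if row[f] <= t]
            right = [y[i] for i, row in enumerate(X) if row[f] > t]
            score = sum(max(side.count(c) for c in set(side))
                        for side in (left, right) if side)
            if best is None or score > best[0]:
                best = (score, f, t)
    return best[1], best[2]

# toy NDVI profiles at 3 dates for two crops
X = [[0.2, 0.7, 0.5], [0.3, 0.8, 0.4], [0.6, 0.4, 0.2], [0.7, 0.3, 0.1]]
y = ["soy", "soy", "cotton", "cotton"]
print(best_stump(X, y))
```

A full tree learner applies this selection recursively to each side of the split; readable rules like those the dissertation reports fall out of the resulting root-to-leaf paths.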
Master's degree
Applied and Computational Mathematics
Master in Applied and Computational Mathematics
APA, Harvard, Vancouver, ISO, and other styles
24

Agarwal, Virat. "Algorithm design on multicore processors for massive-data analysis." Diss., Georgia Institute of Technology, 2010. http://hdl.handle.net/1853/34839.

Full text
Abstract:
Analyzing massive data sets and streams is computationally very challenging. Data sets in systems biology, network analysis and security use a network abstraction to construct large-scale graphs. Graph algorithms such as traversal and search are memory-intensive and typically require very little computation, with access patterns that are irregular and fine-grained. The increasing streaming data rates in various domains such as security, mining, and finance leave algorithm designers with only a handful of clock cycles (with current general-purpose computing technology) to process every incoming byte of data in-core in real time. This, along with the increasing complexity of mining patterns and other analytics, puts further pressure on already high computational requirements. Processing streaming data in finance comes with the additional constraint of low latency, which prevents the algorithm from using common techniques such as batching to obtain high throughput. The primary contributions of this dissertation are the design of novel parallel data analysis algorithms for graph traversal on large-scale graphs, pattern recognition and keyword scanning on massive streaming data, financial market data feed processing and analytics, and data transformation, which capture the machine-independent aspects to guarantee portability with performance to future processors, together with high-performance implementations on multicore processors that embed processor-specific optimizations. Our breadth-first search graph traversal algorithm demonstrates a capability to process massive graphs with billions of vertices and edges on commodity multicore processors at rates that are competitive with supercomputing results in the recent literature. 
We also present high-performance scalable keyword scanning on streaming data using a novel automata compression algorithm, a model of computation based on small software content-addressable memories (CAMs), and a unique data layout that forces data re-use and minimizes memory traffic. Using a high-level algorithmic approach to process financial feeds, we present a solution that decodes and normalizes option market data at rates an order of magnitude higher than the current needs of the market, yet is portable and flexible enough for other feeds in this domain. In this dissertation we discuss in detail the algorithm design challenges of processing massive data and present solutions and techniques that we believe can be used and extended to solve future research problems in this domain.
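The graph-traversal workload described above is typically organized as level-synchronous BFS, the structure that lends itself to parallelization. A plain sequential Python sketch, illustrative only and in no way the dissertation's multicore implementation:

```python
def bfs_levels(adj, src):
    """Level-synchronous BFS: expand the frontier one level at a time. The
    per-level frontier is what multicore BFS implementations parallelize;
    this version is plain sequential Python, for illustration only."""
    dist = {src: 0}
    frontier = [src]
    while frontier:
        nxt = []
        for u in frontier:
            for v in adj.get(u, ()):  # irregular, fine-grained memory accesses
                if v not in dist:
                    dist[v] = dist[u] + 1
                    nxt.append(v)
        frontier = nxt
    return dist

adj = {0: [1, 2], 1: [3], 2: [3, 4], 3: [5], 4: [5]}
print(bfs_levels(adj, 0))
```

The inner loop over neighbours is exactly the memory-bound, computation-light access pattern the abstract describes: each visit is a pointer chase with almost no arithmetic.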
APA, Harvard, Vancouver, ISO, and other styles
25

Kang, James M. "A query engine of novelty in video streams /." Link to online version, 2005. https://ritdml.rit.edu/dspace/handle/1850/977.

Full text
APA, Harvard, Vancouver, ISO, and other styles
26

Xu, Yaomin. "New Clustering and Feature Selection Procedures with Applications to Gene Microarray Data." Case Western Reserve University School of Graduate Studies / OhioLINK, 2008. http://rave.ohiolink.edu/etdc/view?acc_num=case1196144281.

Full text
APA, Harvard, Vancouver, ISO, and other styles
27

Fabbri, Renato. "Topological stability and textual differentiation in human interaction networks: statistical analysis, visualization and linked data." Universidade de São Paulo, 2017. http://www.teses.usp.br/teses/disponiveis/76/76132/tde-11092017-154706/.

Full text
Abstract:
This work reports on stable (or invariant) topological properties and textual differentiation in human interaction networks, with benchmarks derived from public email lists. Activity along time and topology were observed in snapshots in a timeline, and at different scales. Our analysis shows that activity is practically the same for all networks across timescales ranging from seconds to months. The principal components of the participants in the topological metrics space remain practically unchanged as different sets of messages are considered. The activity of participants follows the expected scale-free outline, thus yielding the hub, intermediary and peripheral classes of vertices by comparison against the Erdös-Rényi model. The relative sizes of these three sectors are essentially the same for all email lists and the same along time. Typically, 3-12% of the vertices are hubs, 15-45% are intermediary and 44-81% are peripheral vertices. Texts from each of such sectors are shown to be very different through direct measurements and through an adaptation of the Kolmogorov-Smirnov test. These properties are consistent with the literature and may be general for human interaction networks, which has important implications for establishing a typology of participants based on quantitative criteria. For guiding and supporting this research, we also developed a visualization method of dynamic networks through animations. To facilitate verification and further steps in the analyses, we supply a linked data representation of data related to our results.
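The hub/intermediary/peripheral split described above can be sketched by comparing vertex degrees against the Erdös-Rényi expectation. The 2x and 0.5x thresholds below are illustrative assumptions, not the thesis's exact criterion:

```python
def classify_vertices(degrees, n, m):
    """Split vertices into hub / intermediary / peripheral by comparing each
    degree with the mean degree 2m/n of an Erdös-Rényi graph with the same
    number of vertices n and edges m. Thresholds are illustrative only."""
    expected = 2 * m / n
    roles = {}
    for v, d in degrees.items():
        if d > 2 * expected:
            roles[v] = "hub"
        elif d < 0.5 * expected:
            roles[v] = "peripheral"
        else:
            roles[v] = "intermediary"
    return roles

degrees = {"alice": 20, "bob": 6, "carol": 1, "dan": 5}  # degree sum = 32 = 2m
print(classify_vertices(degrees, n=4, m=16))
```

In a scale-free degree distribution, this kind of comparison naturally yields a small hub sector and a large peripheral one, consistent with the 3-12% / 15-45% / 44-81% proportions the abstract reports.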
APA, Harvard, Vancouver, ISO, and other styles
28

Schwarz, Ivan. "Rozpoznávání aktivit z trajektorií pohybujících se objektů." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2013. http://www.nusl.cz/ntk/nusl-236165.

Full text
Abstract:
The aim of this thesis is the development of a system for trajectory-based periodic pattern recognition and subsequent GPS trajectory classification. The system is designed according to an analysis of data mining techniques for moving-object data and to recent research on trajectory-based activity recognition. It is implemented in the C++ programming language, and experiments addressing its effectiveness are performed.
APA, Harvard, Vancouver, ISO, and other styles
29

Li, Yunming. "Machine vision algorithms for mining equipment automation." Thesis, Queensland University of Technology, 2000.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
30

Sun, Hongliang. "Implementation of a classification algorithm for institutional analysis." Thesis, Lethbridge, Alta. : University of Lethbridge, Faculty of Arts and Science, 2008. http://hdl.handle.net/10133/738.

Full text
Abstract:
The report presents an implementation of a classification algorithm for the Institutional Analysis Project. The algorithm used is a decision tree classification algorithm with a gain ratio attribute selection method. It discovers hidden rules in student records, which are used to predict whether or not other students are at risk of dropping out. It is shown that distinct rules exist in different data sets, each with its own hidden knowledge; in other words, the rules obtained depend on the data used for classification. In our preliminary experiments, we show that between 55 and 78 percent of data with unknown class labels can be correctly classified using the rules obtained from data whose class labels are known. We feel this is acceptable, given the large number of records, attributes, and attribute values used in the experiments. The project results are useful for large data set analysis.
viii, 38 leaves ; 29 cm.
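The gain-ratio attribute selection named in the abstract can be shown concretely. A toy Python version for one categorical attribute, in the C4.5 style (the student-record data below is invented):

```python
import math
from collections import Counter

def entropy(labels):
    total = len(labels)
    return -sum(c / total * math.log2(c / total) for c in Counter(labels).values())

def gain_ratio(values, labels):
    """Gain ratio = information gain / split information, the C4.5-style
    attribute selection measure. Toy version for a categorical attribute."""
    total = len(labels)
    gain, split_info = entropy(labels), 0.0
    for v, count in Counter(values).items():
        subset = [lab for x, lab in zip(values, labels) if x == v]
        gain -= count / total * entropy(subset)
        split_info -= count / total * math.log2(count / total)
    return gain / split_info if split_info else 0.0

attendance = ["low", "low", "high", "high"]
outcome = ["drop", "drop", "stay", "stay"]
print(gain_ratio(attendance, outcome))  # a perfectly separating attribute
```

Dividing by split information penalizes attributes with many distinct values (e.g. a student ID), which plain information gain would spuriously favour.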
APA, Harvard, Vancouver, ISO, and other styles
31

Kuiaski, Diogo Rosa. "Segmentação de pele em imagens digitais para a detecção automática de conteúdo ofensivo." Universidade Tecnológica Federal do Paraná, 2010. http://repositorio.utfpr.edu.br/jspui/handle/1/1338.

Full text
Abstract:
CAPES; UOL
This work presents a study of suitable approaches for the automatic detection of offensive content (pornography) in digital images. Extensive experiments were conducted on skin pixel segmentation, colour spaces and content descriptors. The work focuses on skin pixel segmentation, since this segmentation is the pre-processing stage for almost every content-based offensive image classification method in the literature. Four skin segmentation methods were tested in six colour spaces. A structured image database was also built to support studies in skin segmentation, with the possibility of adding meta-information to the images, such as illumination conditions and camera standards. With the help of this meta-information, experiments involving illumination conditions and skin colour segmentation were carried out. Finally, feature extraction algorithms were implemented in order to apply content-based image retrieval (CBIR) techniques to the classification of offensive images.
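Pixel-wise skin segmentation of the kind evaluated above often reduces to fixed thresholds in a suitable colour space. A sketch using the commonly cited YCbCr ranges; these exact ranges are an assumption for illustration, not necessarily one of the four methods the dissertation tested:

```python
def is_skin_ycbcr(r, g, b):
    """Fixed-range skin rule in the YCbCr colour space. The BT.601
    RGB->YCbCr conversion is standard; the [77, 127] x [133, 173]
    chrominance ranges are the commonly cited ones, assumed here."""
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return 77 <= cb <= 127 and 133 <= cr <= 173

print(is_skin_ycbcr(220, 170, 140))  # light skin tone
print(is_skin_ycbcr(30, 120, 40))    # saturated green
```

Working in chrominance (Cb, Cr) while ignoring luma is precisely what makes such rules partly robust, and partly fragile, to the illumination conditions the dissertation studies.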
APA, Harvard, Vancouver, ISO, and other styles
32

Shakeel, Mohammad Danish. "Land Cover Classification Using Linear Support Vector Machines." Connect to resource online, 2008. http://rave.ohiolink.edu/etdc/view?acc_num=ysu1231812653.

Full text
APA, Harvard, Vancouver, ISO, and other styles
33

Hammal, Mohamed Ali. "Contribution à la découverte de sous-groupes corrélés : Application à l’analyse des systèmes territoriaux et des réseaux alimentaires." Thesis, Lyon, 2020. http://www.theses.fr/2020LYSEI024.

Full text
Abstract:
Better feeding cities in quantity and quality, especially large cities, is a major challenge whose resolution requires a better understanding of the relationships between urban populations and their food. At the scale of urban food systems, we need to understand the availability of food resources crossed with the socio-economic profiles of the territories, but we lack tools and methods to systematically understand the relationships between consumption basins, supply and eating habits. The objective of this thesis is to contribute to the development of new computational tools to process temporal, heterogeneous and multi-source data in order to identify and characterize behaviours specific to a geographic area. For this, we rely on the joint exploration of gradual patterns, which capture rank correlations, and subgroups, in order to find contexts in which the correlations described by the gradual patterns are exceptionally strong compared to the rest of the data. We propose an enumeration algorithm based on pruning properties with upper bounds, as well as an algorithm that samples patterns according to the quality measure. These approaches are validated not only on benchmark data sets, but also through an empirical study of the formation of food deserts in the Lyon urban area.
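A gradual pattern ("the higher X, the higher Y") is a rank correlation, and a correlated subgroup is a context in which it is exceptionally strong compared to the whole data set. A toy Spearman-based illustration; the data and the urban/rural split are invented:

```python
def spearman(xs, ys):
    """Spearman rank correlation (no-ties case). A gradual pattern such as
    'the higher the income, the higher the store density' corresponds to a
    strong rank correlation on some subset of the data."""
    n = len(xs)
    rx = {v: i for i, v in enumerate(sorted(xs))}
    ry = {v: i for i, v in enumerate(sorted(ys))}
    d2 = sum((rx[x] - ry[y]) ** 2 for x, y in zip(xs, ys))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# invented data: weak correlation overall, perfect inside the subgroup
income  = [10, 20, 30, 40, 50, 60]
density = [5, 1, 6, 2, 7, 3]
urban   = [True, False, True, False, True, False]
overall = spearman(income, density)
sub = spearman([i for i, u in zip(income, urban) if u],
               [d for d, u in zip(density, urban) if u])
print(overall, sub)
```

The subgroup discovery task is then to search the space of such contexts (here, `urban == True`) for those where the correlation jumps, which is what the thesis's pruning and sampling algorithms make tractable.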
APA, Harvard, Vancouver, ISO, and other styles
34

Macedo, Charles Mendes de. "Aplicação de algoritmos de agrupamento para descoberta de padrões de defeito em software JavaScript." Universidade de São Paulo, 2018. http://www.teses.usp.br/teses/disponiveis/100/100131/tde-29012019-152129/.

Full text
Abstract:
Applications developed in the JavaScript language are increasing every day, not only on the client side, but also on the server side and on mobile devices. In this context, the existence of tools to identify faults is fundamental to assist developers during the evolution of their applications. Most of these tools use a list of predefined faults discovered from observation of programming best practices and developer intuition. To improve these tools, the automatic discovery of faults and code smells is important because it identifies which ones actually occur in practice, and how frequently. A tool that implements a semiautomatic strategy for discovering bug patterns by clustering the changes made during project development is BugAID. The objective of this work is to contribute to the BugAID tool by extending it with improvements in the extraction of the characteristics used by the clustering algorithm; the extended module that extracts the characteristics is called BE+. Additionally, an evaluation of the clustering algorithms used for discovering fault patterns in JavaScript software is performed.
APA, Harvard, Vancouver, ISO, and other styles
35

Colla, Ernesto Coutinho. "Aplicação de modelos gráficos probabilísticos computacionais em economia." reponame:Repositório Institucional do FGV, 2009. http://hdl.handle.net/10438/4261.

Full text
Abstract:
We develop a probabilistic model using machine learning tools to classify the trend of the Brazilian country risk as expressed by the EMBI+ (Emerging Markets Bond Index Plus). The main goal is to verify whether machine learning is useful for building economic models that can be used as reasoning tools under uncertainty. Specifically, we use Bayesian networks to perform pattern recognition on observed macroeconomic and financial data. The results are promising: we recover the main theoretically expected relationships between country risk and economic variables, as well as the international economic context and market expectations.
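The Bayesian-network classification above rests on Bayes-rule updating of class probabilities from observed indicators. A naive-Bayes-style toy, the simplest special case of a Bayesian network; the variable names and probabilities are invented, not the thesis's model:

```python
def posterior(prior, likelihoods, evidence):
    """One Bayes-rule update over the two risk-trend classes, assuming the
    observed indicators are conditionally independent given the class
    (a naive-Bayes factorization). All numbers below are invented."""
    unnorm = dict(prior)
    for var, val in evidence.items():
        for c in unnorm:
            unnorm[c] *= likelihoods[var][c][val]
    z = sum(unnorm.values())
    return {c: p / z for c, p in unnorm.items()}

prior = {"up": 0.5, "down": 0.5}
likelihoods = {
    "fx_volatility": {"up": {"high": 0.7, "low": 0.3}, "down": {"high": 0.2, "low": 0.8}},
    "trade_balance": {"up": {"deficit": 0.6, "surplus": 0.4}, "down": {"deficit": 0.3, "surplus": 0.7}},
}
print(posterior(prior, likelihoods, {"fx_volatility": "high", "trade_balance": "deficit"}))
```

A full Bayesian network relaxes the independence assumption by encoding the dependencies between indicators as graph edges, with inference following the same multiply-and-normalize pattern over the factored joint distribution.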
APA, Harvard, Vancouver, ISO, and other styles
36

Skabar, Andrew Alojz. "Inductive learning techniques for mineral potential mapping." Thesis, Queensland University of Technology, 2001.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
37

Bergfors, Anund. "Using machine learning to identify the occurrence of changing air masses." Thesis, Uppsala universitet, Institutionen för teknikvetenskaper, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-357939.

Full text
Abstract:
In the forecast data post-processing at the Swedish Meteorological and Hydrological Institute (SMHI), a regular Kalman filter is used to debias the two-metre air temperature forecast of the physical models by controlling towards air temperature observations. The Kalman filter, however, diverges when encountering greater nonlinearities in shifting weather patterns, and can only be manually reset once a new air mass has stabilized itself within its operating region. This project aimed to automate that reset by means of a machine learning approach. The methodology was at its base supervised learning: first algorithmically labelling the air-mass-shift occurrences in the data, then training a logistic regression model. Observational data from the latest twenty years of the Uppsala automatic meteorological station were used for the analysis. A simple pipeline for loading, labelling, training on and visualizing the data was built. As a work in progress, the operating regime was more of a semi-supervised one, which in the long run could also be a necessary and fruitful strategy. In conclusion, the logistic regression appeared quite able to handle and infer from the dynamics of air temperatures, albeit non-robustly tested, correctly classifying 77% of the labelled data. This work was presented at Uppsala University on June 1st, 2018, and later on June 20th at SMHI.
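The supervised pipeline described above (label the shifts, then fit logistic regression) can be sketched end-to-end with a plain gradient-descent trainer on a single invented feature; illustrative only, not the project's actual feature set or labelling algorithm:

```python
import math

def train_logreg(X, y, lr=0.5, epochs=2000):
    """Plain stochastic-gradient-descent logistic regression; a toy stand-in
    for the model trained on algorithmically labelled air-mass-shift data."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = 1 / (1 + math.exp(-(sum(wj * xj for wj, xj in zip(w, xi)) + b)))
            err = p - yi  # gradient of the log-loss w.r.t. the logit
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b

# invented feature: absolute 2 m temperature change over a few hours;
# label 1 = an algorithmically labelled air-mass shift
X = [[0.2], [0.5], [0.4], [3.1], [4.0], [2.8]]
y = [0, 0, 0, 1, 1, 1]
w, b = train_logreg(X, y)

def predict(x):
    return 1 / (1 + math.exp(-(w[0] * x + b))) > 0.5

print(predict(0.3), predict(3.5))
```

The predicted shift flag would then trigger the Kalman filter reset automatically, replacing the manual intervention the abstract describes.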
APA, Harvard, Vancouver, ISO, and other styles
38

Sun, Le. "Data stream mining in medical sensor-cloud." Thesis, 2016. https://vuir.vu.edu.au/31032/.

Full text
Abstract:
Data stream mining has been studied in diverse application domains. In recent years, population aging has been stressing national and international health care systems. With the advent of hundreds and thousands of health monitoring sensors, traditional wireless sensor networks and anomaly detection techniques cannot handle the huge amounts of information. Sensor-cloud makes the processing and storage of big sensor data much easier: it is an extension of the cloud that connects Wireless Sensor Networks (WSNs) and the cloud through sensor and cloud gateways, which consistently collect and process large amounts of data from various sensors located in different areas. In this thesis, I focus on analysing a large volume of medical sensor data streams collected from the sensor-cloud. To analyse these medical data streams, I propose a medical data stream mining framework targeted at tackling four main challenges ...
APA, Harvard, Vancouver, ISO, and other styles
39

"Fast frequent pattern mining." 2003. http://library.cuhk.edu.hk/record=b5891575.

Full text
Abstract:
Yabo Xu.
Thesis (M.Phil.)--Chinese University of Hong Kong, 2003.
Includes bibliographical references (leaves 57-60).
Abstracts in English and Chinese.
Abstract --- p.i
Acknowledgement --- p.iii
Chapter 1 --- Introduction --- p.1
Chapter 1.1 --- Frequent Pattern Mining --- p.1
Chapter 1.2 --- Biosequence Pattern Mining --- p.2
Chapter 1.3 --- Organization of the Thesis --- p.4
Chapter 2 --- PP-Mine: Fast Mining Frequent Patterns In-Memory --- p.5
Chapter 2.1 --- Background --- p.5
Chapter 2.2 --- The Overview --- p.6
Chapter 2.3 --- PP-tree Representations and Its Construction --- p.7
Chapter 2.4 --- PP-Mine --- p.8
Chapter 2.5 --- Discussions --- p.14
Chapter 2.6 --- Performance Study --- p.15
Chapter 3 --- Fast Biosequence Patterns Mining --- p.20
Chapter 3.1 --- Background --- p.21
Chapter 3.1.1 --- Differences in Biosequences --- p.21
Chapter 3.1.2 --- Mining Sequential Patterns --- p.22
Chapter 3.1.3 --- Mining Long Patterns --- p.23
Chapter 3.1.4 --- Related Works in Bioinformatics --- p.23
Chapter 3.2 --- The Overview --- p.24
Chapter 3.2.1 --- The Problem --- p.24
Chapter 3.2.2 --- The Overview of Our Approach --- p.25
Chapter 3.3 --- The Segment Phase --- p.26
Chapter 3.3.1 --- Finding Frequent Segments --- p.26
Chapter 3.3.2 --- The Index-based Querying --- p.27
Chapter 3.3.3 --- The Compression-based Querying --- p.30
Chapter 3.4 --- The Pattern Phase --- p.32
Chapter 3.4.1 --- The Pruning Strategies --- p.34
Chapter 3.4.2 --- The Querying Strategies --- p.37
Chapter 3.5 --- Experiment --- p.40
Chapter 3.5.1 --- Synthetic Data Sets --- p.40
Chapter 3.5.2 --- Biological Data Sets --- p.46
Chapter 4 --- Conclusion --- p.55
Bibliography --- p.60
APA, Harvard, Vancouver, ISO, and other styles
40

Shie, Chang-Luen, and 謝昌倫. "Pattern Recognition of Wafer Bin Maps with Data Mining Techniques." Thesis, 2003. http://ndltd.ncl.edu.tw/handle/48438397888618575147.

Full text
Abstract:
Master's thesis
Tamkang University
Department of Statistics
91
The aim of this paper is to explore data mining techniques for classifying the patterns of wafer bin maps (WBMs). Six classification methods are discussed: two neural network techniques, three decision tree methods, and one statistical classification method. To compare their capability, random samples of wafer bin maps were generated from seven different patterns with various levels of random noise; these WBMs were then classified and the correct-classification rates of the methods computed. Our simulation shows that the Closest Class Mean Classifier (CCMC) is the most suitable method for classifying wafer bin map patterns.
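The winning method, the Closest Class Mean Classifier, is simple to state: compute the mean vector of each pattern class from labelled training maps, then assign a new map to the class whose mean is nearest. A minimal pure-Python sketch follows; the flattened-map vectors and class names are invented for illustration, not taken from the thesis:

```python
def fit_class_means(vectors, labels):
    """Compute the mean vector of each class from labelled wafer maps."""
    sums, counts = {}, {}
    for v, c in zip(vectors, labels):
        if c not in sums:
            sums[c] = [0.0] * len(v)
            counts[c] = 0
        sums[c] = [s + x for s, x in zip(sums[c], v)]
        counts[c] += 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}

def classify(means, v):
    """Assign v to the class whose mean is closest in squared Euclidean distance."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(means, key=lambda c: dist2(means[c], v))

# Toy 4-cell "maps": two defect patterns, two examples each.
means = fit_class_means(
    [[1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 1], [0, 0, 0, 1]],
    ["edge", "edge", "center", "center"])
```

In practice each wafer bin map would be flattened (or otherwise encoded) into a fixed-length feature vector before training and classification.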
APA, Harvard, Vancouver, ISO, and other styles
41

"Approach for mining multiple dependence structure with pattern recognition applications." 2003. http://library.cuhk.edu.hk/record=b6073568.

Full text
Abstract:
by Liu Zhiyong.
"June 2003."
Thesis (Ph.D.)--Chinese University of Hong Kong, 2003.
Includes bibliographical references (p. 125-136).
Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web.
Electronic reproduction. Ann Arbor, MI : ProQuest Information and Learning Company, [200-] System requirements: Adobe Acrobat Reader. Available via World Wide Web.
Mode of access: World Wide Web.
Abstracts in English and Chinese.
APA, Harvard, Vancouver, ISO, and other styles
42

Bean, Kathryn Brenda. "Supervised and unsupervised machine learning for pattern recognition and time series prediction /." 2008. http://proquest.umi.com/pqdweb?did=1654492021&sid=3&Fmt=2&clientId=10361&RQT=309&VName=PQD.

Full text
APA, Harvard, Vancouver, ISO, and other styles
43

Lin, Ying-Tsu, and 林英足. "Pattern Recognition of Wafer Bin Maps with Data Mining Techniques and Machine Vision Methods." Thesis, 2006. http://ndltd.ncl.edu.tw/handle/86525883931360540288.

Full text
Abstract:
Master's thesis
Tamkang University
Department of Statistics, Master's Program
94
Human visual inspection is traditionally used in the semiconductor industry to trace production errors, with disadvantages such as being time-consuming and subjective. To improve detection accuracy and product yield, machine vision methods and data mining techniques are applied in this study to develop a wafer-map analysis system. A two-phase method is adopted. In the first phase, we discuss how well a Support Vector Machine based method identifies erroneous judgments. In the second phase, neural network models and decision tree methods are adopted. Random samples of one-dimensional and two-dimensional wafer bin maps were generated from sixteen patterns with various levels of random noise to compare identification accuracy. Our study shows that adopting Support Vector Machine analysis increases identification accuracy. In the second phase, we find that the multi-layer perceptron neural network model performs best. Moreover, when the wafer data is converted to a spatial data representation, both the neural network and decision tree models increase identification accuracy.
APA, Harvard, Vancouver, ISO, and other styles
44

Chen, Yi-Rong, and 陳奕戎. "Pattern Recognition of Wafer Bin Maps with Data Mining Techniques and Spatial Statistic Methods." Thesis, 2004. http://ndltd.ncl.edu.tw/handle/11626038245621905471.

Full text
Abstract:
Master's thesis
Tamkang University
Department of Statistics
92
The aim of this paper is to explore data mining techniques for classifying the patterns of wafer bin maps. Several classification methods are discussed, including two neural network techniques, two decision tree methods, one statistical classification method, spatial statistics, and discriminant analysis. In addition, we discuss a spatial representation of wafer bin-map data. To compare the capability of these classification methods, random samples of wafer bin maps were generated from sixteen different patterns with various levels of random noise; these WBMs were then classified and the correct-classification rates of the methods computed. Our simulation shows that the Closest Class Mean Classifier (CCMC) is the most suitable method for classifying wafer bin map patterns. Moreover, with the spatial data representation, the correct-classification rates of the decision tree and neural network methods increase.
APA, Harvard, Vancouver, ISO, and other styles
45

(6639122), Jihwan Lee. "Exploring Node Attributes for Data Mining in Attributed Graphs." Thesis, 2019.

Find full text
Abstract:
Graphs have attracted researchers in various fields in that many different kinds of real-world entities and relationships between them can be represented and analyzed effectively and efficiently using graphs. In particular, researchers in data mining and machine learning areas have developed algorithms and models to understand the complex graph data better and perform various data mining tasks. While a large body of work exists on graph mining, most existing work does not fully exploit attributes attached to graph nodes or edges.

In this dissertation, we exploit node attributes to generate better solutions to several graph data mining problems addressed in the literature. First, we introduce the notion of statistically significant attribute associations in attributed graphs and propose an effective and efficient algorithm to discover those associations. The effectiveness analysis on the results shows that our proposed algorithm can reveal insightful attribute associations that cannot be identified using earlier methods that focus solely on frequency. Second, we build a probabilistic generative model for observed attributed graphs. Under the assumption that there exist hidden communities behind nodes in a graph, we adopt the idea of latent topic distributions to model a generative process of node attribute values and link structure more precisely. This model can be used to detect hidden communities and profile missing attribute values. Lastly, we investigate how to employ node attributes to learn latent representations of nodes in lower-dimensional embedding spaces and use the learned representations to improve the performance of data mining tasks over attributed graphs.
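As one way to make "statistically significant attribute association" concrete: the co-occurrence of two attribute values on nodes can be scored with a one-sided hypergeometric (Fisher-exact) tail test against an independence null. The dissertation's actual statistic may differ, so treat this as an illustrative sketch only:

```python
from math import comb

def cooccurrence_pvalue(n, n_a, n_b, n_ab):
    """One-sided hypergeometric tail: probability of seeing at least n_ab
    nodes carrying both attribute values, given n nodes total, n_a nodes
    with value a and n_b with value b, under independence."""
    total = comb(n, n_b)
    return sum(comb(n_a, k) * comb(n - n_a, n_b - k)
               for k in range(n_ab, min(n_a, n_b) + 1)) / total

# 10 nodes; values a and b each appear on 5 nodes, always together:
p = cooccurrence_pvalue(10, 5, 5, 5)   # small p => significant association
```

A frequency-only (support-based) miner would rank pairs by n_ab alone; the point of a significance test is that the same n_ab can be unremarkable or surprising depending on n_a, n_b, and n.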
APA, Harvard, Vancouver, ISO, and other styles
46

Chen, Wei-Ju, and 陳薇如. "Automatic Similarity Matching of Defect Patterns in Wafer Bin Map using Data Mining and Pattern Recognition Approach." Thesis, 2014. http://ndltd.ncl.edu.tw/handle/y2pv6y.

Full text
APA, Harvard, Vancouver, ISO, and other styles
47

Hsu, Hsiu-Wen, and 許琇雯. "Real-time Pattern Recognition of Control Charts Patterns in Autocorrelated Process by a Data Mining Based Approach." Thesis, 2009. http://ndltd.ncl.edu.tw/handle/mnrg6u.

Full text
Abstract:
Master's thesis
National Formosa University
Graduate Institute of Industrial Engineering and Management
97
Statistical process control (SPC) is an important method for process control in industry. It can detect assignable causes that may occur during process control, helping to improve the process and reduce unnecessary product cost. The control chart is therefore an important SPC tool: control charts can detect abnormal states that may occur at any time during process control. Essentially, judging the process state can be seen as a classification problem in artificial intelligence. Effectively recognizing control chart patterns (CCPs) is a critical issue in statistical process control, since unnatural CCPs indicate potential quality problems at an early stage, before defects are produced. Recently, decision trees (DTs) have been widely used for pattern classification, and many studies report that they perform excellently. This study examines the feasibility of using a data mining technique, DT learning, for on-line CCP recognition in processes with various levels of autocorrelation. An empirical comparison using simulation indicates that the fast learning of the DT model gives the SPC user the potential to build an automated CCP recognition system that can not only be applied on-line but also be trained in real time. This feature could make the CCP recognition system more adaptable to a dynamic manufacturing scenario.
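A CCP recognizer in this spirit can be sketched as a small CART-style decision tree learned on features extracted from a chart window (here, illustratively, a window-mean feature and a slope feature). This is a minimal pure-Python sketch, not the thesis implementation:

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(rows, labels):
    """Return (impurity, feature, threshold) of the best axis-aligned split."""
    best = None
    for f in range(len(rows[0])):
        for t in sorted(set(r[f] for r in rows)):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best

def build_tree(rows, labels, depth=0, max_depth=3):
    if len(set(labels)) == 1 or depth == max_depth:
        return max(set(labels), key=labels.count)   # leaf: majority class
    split = best_split(rows, labels)
    if split is None:
        return max(set(labels), key=labels.count)
    _, f, t = split
    li = [i for i, r in enumerate(rows) if r[f] <= t]
    ri = [i for i, r in enumerate(rows) if r[f] > t]
    return (f, t,
            build_tree([rows[i] for i in li], [labels[i] for i in li], depth + 1, max_depth),
            build_tree([rows[i] for i in ri], [labels[i] for i in ri], depth + 1, max_depth))

def predict(tree, row):
    while isinstance(tree, tuple):
        f, t, lo, hi = tree
        tree = lo if row[f] <= t else hi
    return tree

# Toy feature rows: [window mean, window slope] with invented CCP labels.
rows = [[0.0, 0.0], [0.1, 0.0], [2.0, 0.0], [2.1, 0.1], [0.0, 0.5], [0.1, 0.6]]
labels = ["normal", "normal", "shift", "shift", "trend", "trend"]
tree = build_tree(rows, labels)
```

On-line use would slide a window over the autocorrelated process stream, extract the same features per window, and call `predict`; the thesis's point is that a DT is fast enough to be retrained in real time as the process drifts.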
APA, Harvard, Vancouver, ISO, and other styles
48

(10710258), Tianshuai Guan. "MACHINE LEARNING BASED IDS LOG ANALYSIS." Thesis, 2021.

Find full text
Abstract:

With the rapid development of information technology, network traffic is increasing dramatically, and many cyber-attack records are buried in this large volume of traffic. Therefore, many Intrusion Detection Systems (IDS) that can extract those malicious activities have been developed. Zeek is one of them, and thanks to its powerful functions and open-source environment, Zeek has been adopted by many organizations. Information Technology at Purdue (ITaP), which uses Zeek as its IDS, captures netflow logs for all network activities across the whole campus area but has not yet made effective use of that information. This thesis examines ways to help increase the performance of anomaly detection. To that end, this project combines basic database concepts with several different machine learning algorithms and compares the results of different combinations to better find potential attack activities in log files.

APA, Harvard, Vancouver, ISO, and other styles
49

Haghtalab, Siavash. "An Unsupervised Consensus Control Chart Pattern Recognition Framework." Master's thesis, 2014. http://digital.library.ucf.edu/cdm/ref/collection/ETD/id/6101.

Full text
Abstract:
Early identification and detection of abnormal time series patterns is vital for a number of manufacturing processes. Slight shifts and alterations of time series patterns might be indicative of some anomaly in the production process, such as a machinery malfunction. Due to the continuous flow of data, monitoring of manufacturing processes usually requires automated Control Chart Pattern Recognition (CCPR) algorithms. The majority of the CCPR literature consists of supervised classification algorithms; fewer studies consider unsupervised versions of the problem. Despite the profound advantage of unsupervised methods, which require less manual data labeling, their use is limited because their performance is not robust enough for practical purposes. In this study we propose the use of a consensus clustering framework. Computational results show robust behavior compared to individual clustering algorithms.
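One common way to build such a consensus, the co-association (evidence accumulation) approach, can be sketched in pure Python: count how often each pair of points is clustered together across the individual partitions, then take connected components over pairs that co-occur at least a threshold fraction of the time. Whether this matches the thesis's exact framework is not stated here, so treat it as an illustrative sketch:

```python
def consensus_clusters(partitions, threshold=0.5):
    """Combine several clusterings (label lists over the same n items) via a
    co-association matrix; items clustered together in at least `threshold`
    of the partitions land in the same consensus cluster."""
    n = len(partitions[0])
    # co[i][j] = fraction of partitions placing items i and j together
    co = [[sum(p[i] == p[j] for p in partitions) / len(partitions)
           for j in range(n)] for i in range(n)]
    # connected components over the thresholded co-association graph
    label = [-1] * n
    next_label = 0
    for i in range(n):
        if label[i] != -1:
            continue
        stack, label[i] = [i], next_label
        while stack:
            u = stack.pop()
            for v in range(n):
                if label[v] == -1 and co[u][v] >= threshold:
                    label[v] = next_label
                    stack.append(v)
        next_label += 1
    return label

# Three base clusterings of 4 control-chart segments; two mostly agree.
labels = consensus_clusters([[0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 1, 1]])
```

The appeal for CCPR is exactly what the abstract claims: no single base clustering needs to be trusted, because unstable assignments are voted away by the ensemble.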
M.S.
Masters
Industrial Engineering and Management Systems
Engineering and Computer Science
Industrial Engineering; Systems Engineering Track
APA, Harvard, Vancouver, ISO, and other styles
50

Saad, A., E. Avineri, Keshav P. Dahal, M. Sarfraz, and R. Roy. "Soft Computing in Industrial Applications." 2007. http://hdl.handle.net/10454/2290.

Full text
APA, Harvard, Vancouver, ISO, and other styles
