Dissertations / Theses on the topic 'Concept drift'

Consult the top 50 dissertations / theses for your research on the topic 'Concept drift.'

You can also download the full text of each publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse dissertations / theses on a wide variety of disciplines and organise your bibliography correctly.

1

Beyene, Ayne, and Tewelle Welemariam. "Concept Drift in Surgery Prediction." Thesis, Blekinge Tekniska Högskola, Sektionen för datavetenskap och kommunikation, 2012. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-2330.

Full text
Abstract:
Context: In healthcare, the decision of patient referral evolves through time because of changes in scientific developments and clinical practices. Existing decision support systems for patient referral are based on the expert systems approach, which usually requires manual updates when changes in clinical practices occur. Automatically updating the decision support system by identifying and handling so-called concept drift improves the efficiency of healthcare systems. In the state-of-the-art, there are only specific ways of handling concept drift; developing a more generic technique that works regardless of how slow, fast, sudden, gradual, local, global, cyclical, or noisy the changes in the underlying distribution are remains a challenge. Objectives: An algorithm that handles concept drift in surgery prediction is investigated. Concept drift detection techniques are evaluated to find a suitable detection technique in the context of surgery prediction. Moreover, plausible combinations of detection and handling algorithms, including the proposed algorithm, Trigger Based Ensemble (TBE), are evaluated on hospital data. Method: Experiments are conducted to investigate the impact of concept drift on prediction performance and to reduce that impact. The experiments compare three existing methods (AWE, Active Classifier, Learn++) and the proposed algorithm, Trigger Based Ensemble (TBE). A real-world dataset from the orthopedics department of Blekinge hospital and datasets from other domains are used in the experiments. Results: The negative impact of concept drift on surgery prediction is investigated. The relationship between temporal changes in data distribution and concept drift in surgery prediction is identified. Furthermore, the proposed algorithm is evaluated and compared with existing handling approaches. Conclusion: The proposed algorithm, Trigger Based Ensemble (TBE), is capable of detecting the occurrence of concept drift and of adapting quickly to various changes. The Trigger Based Ensemble algorithm performed better than, or sometimes similarly to, the existing concept drift handling algorithms in the absence of noise. Moreover, the performance of Trigger Based Ensemble is consistent for both small and large datasets. The research makes a twofold contribution: it improves surgery prediction performance and contributes a competitive concept drift handling algorithm to computer science.
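No code accompanies the listing above; the following is a minimal sketch of the general trigger-based, detect-then-adapt ensemble pattern the abstract describes, where a new member is trained only when the ensemble's error on the latest batch exceeds a threshold. The class name `TriggerBasedEnsembleSketch`, the error trigger, and all parameter values are illustrative assumptions and do not reproduce the authors' TBE algorithm.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

class TriggerBasedEnsembleSketch:
    """Illustrative detect-then-adapt ensemble: a new member is added only when
    the ensemble's error on the latest batch exceeds a trigger threshold."""
    def __init__(self, batch_size=200, error_trigger=0.3, max_members=10):
        self.batch_size = batch_size
        self.error_trigger = error_trigger
        self.max_members = max_members
        self.members = []
        self._X, self._y = [], []

    def predict(self, X):
        if not self.members:
            return np.zeros(len(X), dtype=int)          # default before any training
        votes = np.array([m.predict(X) for m in self.members])
        return (votes.mean(axis=0) >= 0.5).astype(int)  # majority vote (binary labels)

    def partial_fit(self, x, y):
        self._X.append(x); self._y.append(y)
        if len(self._X) < self.batch_size:
            return
        X, t = np.array(self._X), np.array(self._y)
        self._X, self._y = [], []
        error = np.mean(self.predict(X) != t)            # trigger statistic
        if not self.members or error > self.error_trigger:
            self.members.append(DecisionTreeClassifier(max_depth=5).fit(X, t))
            if len(self.members) > self.max_members:
                self.members.pop(0)                      # drop the oldest member

# toy usage: the relationship between x and y flips halfway through the stream
rng = np.random.default_rng(7)
ens = TriggerBasedEnsembleSketch()
for i in range(2000):
    x = rng.normal(size=2)
    y = int(x[0] > 0) if i < 1000 else int(x[0] < 0)     # abrupt drift at i = 1000
    ens.partial_fit(x, y)
print(len(ens.members))                                  # new members added after the drift
```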
APA, Harvard, Vancouver, ISO, and other styles
2

Hoffmann, Nico, Matthias Kirmse, and Uwe Petersohn. "Approaching Concept Drift by Context Feature Partitioning." Saechsische Landesbibliothek- Staats- und Universitaetsbibliothek Dresden, 2012. http://nbn-resolving.de/urn:nbn:de:bsz:14-qucosa-83954.

Full text
Abstract:
In this paper we present a new approach to handling concept drift using domain-specific knowledge. More precisely, we capitalize on known context features to partition a domain into subdomains featuring static class distributions. Subsequently, we learn separate classifiers for each subdomain and classify new instances accordingly. To determine the optimal partitioning for a domain we apply a search algorithm aiming to maximize the resulting accuracy. In practical domains like fault detection, concept drift often occurs in combination with imbalanced data. As this issue becomes more important when learning models on smaller subdomains, we additionally use sampling methods to handle it. Comparative experiments with artificial data sets showed that our approach outperforms a plain SVM with regard to different performance measures. In summary, the partitioning concept drift approach (PCD) is a possible way to handle concept drift in domains where the causing context features are at least partly known.
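As a rough illustration of the partitioning idea described above (one classifier per subdomain defined by a known context feature), the sketch below fixes the partitioning in advance instead of searching for it, so it is not the paper's PCD procedure; the toy data and function names are assumptions.

```python
import numpy as np
from sklearn.svm import SVC

def train_partitioned(X, y, context):
    """Fit one SVM per value of a known (discrete) context feature."""
    models = {}
    for c in np.unique(context):
        mask = context == c
        models[c] = SVC(kernel="rbf").fit(X[mask], y[mask])
    return models

def predict_partitioned(models, X, context):
    """Route each instance to the classifier of its subdomain."""
    return np.array([models[c].predict(x.reshape(1, -1))[0]
                     for x, c in zip(X, context)])

# toy usage: two subdomains with different class boundaries
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 2))
context = rng.integers(0, 2, size=400)            # known context feature
y = np.where(context == 0, X[:, 0] > 0, X[:, 1] > 0).astype(int)
models = train_partitioned(X, y, context)
print(predict_partitioned(models, X[:5], context[:5]))
```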
APA, Harvard, Vancouver, ISO, and other styles
3

Garnett, Roman. "Learning from data streams with concept drift." Thesis, University of Oxford, 2010. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.711615.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Marrs, Gary Russell. "Handling latency for online learning with concept drift." Thesis, University of Ulster, 2011. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.587478.

Full text
Abstract:
We live in a world of ever-increasing amounts of data. There is a need to devise better and increasingly automated systems for analysing and utilising such data, from online data streams, for the purposes of classification and prediction. Across many domains, such as banking, financial markets, network management and even biomedical monitoring of pathogen sensitivity to drugs, the competitive edge is gained by those who act on their data fastest, most accurately, and keep up to date with any changes occurring in their domain. This has led to the rise of research into online learners. These automated systems train themselves on received data and discover rules for use in classification and prediction, and they keep those rules up to date as concept drift, i.e. changing of the underlying rules, occurs. However, to date, there has been little research into how latency in the data stream impacts such learning. This thesis examines the hypothesis that latency can have a substantial impact upon the performance of online learners operating on domains with concept drift, and that key meta-data attributes describing example passage throughout the domain may help to resolve such issues. The thesis explores what it means to be a domain by developing a generic model. The assumptions applied in current research about the nature of example arrival are considered and challenged. A framework, ELISE, is developed for simulating various latency conditions for the purposes of experimenting with meta-data attributes relating to temporal events in the example life-cycle. From this, several algorithmic and procedural approaches for online learners are tested as potential solutions for handling latency, based not just on isolated examples but on comprehension of the temporal nature of a data stream. Finally, future work is suggested for further improvements.
APA, Harvard, Vancouver, ISO, and other styles
5

AlShammeri, Mohammed. "Dynamic Committees for Handling Concept Drift in Databases (DCCD)." Thèse, Université d'Ottawa / University of Ottawa, 2012. http://hdl.handle.net/10393/23498.

Full text
Abstract:
Concept drift refers to a problem that is caused by a change in the data distribution in data mining. This leads to a reduction in the accuracy of the current model that is used to examine the underlying data distribution of the concept to be discovered. A number of techniques have been introduced to address this issue in a supervised learning (or classification) setting, where the target concept (or class) to be learned is known. One of these techniques is called "ensemble learning", which refers to using multiple trained classifiers in order to get better predictions through some voting scheme. In a traditional ensemble, the underlying base classifiers are all of the same type. Recent research extends the idea of ensemble learning to the idea of using committees, where a committee consists of diverse classifiers. This is the main difference between regular ensemble classifiers and committee learning algorithms. Committees are able to use diverse learning methods simultaneously and dynamically take advantage of the most accurate classifiers as the data change. In addition, some committees are able to replace their members when they perform poorly. This thesis presents two new algorithms that address concept drift. The first algorithm has been designed to systematically introduce gradual and sudden concept drift scenarios into datasets. In order to save time and avoid memory consumption, the Concept Drift Introducer (CDI) algorithm divides the number of drift scenarios into phases. The main advantage of using phases is that it allows us to produce a highly scalable concept drift detector that evaluates each phase, instead of evaluating each individual drift scenario. We further designed a novel algorithm to handle concept drift. Our Dynamic Committee for Concept Drift (DCCD) algorithm uses a voted committee of hypotheses that vote on the best base classifier, based on its predictive accuracy. The novelty of DCCD lies in the fact that we employ diverse heterogeneous classifiers in one committee in an attempt to maximize diversity. DCCD detects concept drift by using the accuracy and by weighting the committee members, adding one point to the most accurate member. The total loss in accuracy for each member is calculated at the end of each point of measurement, or phase. The performance of the committee members is evaluated to decide whether a member needs to be replaced or not. Moreover, DCCD detects the worst member in the committee and then eliminates this member by using a weighting mechanism. Our experimental evaluation centers on evaluating the performance of DCCD on various datasets of different sizes, with different levels of gradual and sudden concept drift. We further compare our algorithm to another state-of-the-art algorithm, namely the MultiScheme approach. The experiments indicate the effectiveness of our DCCD method under a number of diverse circumstances. The DCCD algorithm generally generates high performance results, especially when the number of concept drifts is large in a dataset. For the dataset sizes used, our results showed that DCCD produced a steady improvement in performance when applied to small datasets. Further, in large and medium datasets, our DCCD method has a comparable, and often slightly higher, performance than the MultiScheme technique. The experimental results also show that the DCCD algorithm limits the loss in accuracy over time, regardless of the size of the dataset.
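The committee mechanics summarized above (weighted voting across heterogeneous members, a point awarded to the most accurate member, replacement of the worst one) can be sketched in a few lines. The code below is an illustrative toy, not the thesis' DCCD algorithm; class and parameter names are assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

class SimpleCommittee:
    """Toy heterogeneous committee: weighted vote, reward the most accurate
    member per phase, replace the lowest-scoring one (not the thesis' DCCD)."""
    def __init__(self):
        self.factories = [lambda: DecisionTreeClassifier(max_depth=5),
                          lambda: GaussianNB(),
                          lambda: KNeighborsClassifier(n_neighbors=5)]
        self.members = [f() for f in self.factories]
        self.scores = [0.0] * len(self.members)

    def fit_phase(self, X, y):
        for m in self.members:
            m.fit(X, y)

    def predict(self, X):
        votes = np.array([m.predict(X) for m in self.members])
        weights = 1.0 + np.array(self.scores)            # base weight + accumulated points
        return np.array([np.bincount(col, weights=weights).argmax() for col in votes.T])

    def update_phase(self, X, y):
        accs = [np.mean(m.predict(X) == y) for m in self.members]
        self.scores[int(np.argmax(accs))] += 1.0         # one point to the best member
        worst = int(np.argmin(accs))
        self.members[worst] = self.factories[worst]().fit(X, y)  # replace the worst member
        self.scores[worst] = 0.0

# toy usage: concept 1 in the first phase, a drifted concept in the second
rng = np.random.default_rng(0)
X1 = rng.normal(size=(300, 2)); y1 = (X1[:, 0] > 0).astype(int)
X2 = rng.normal(size=(300, 2)); y2 = (X2[:, 1] > 0).astype(int)
c = SimpleCommittee(); c.fit_phase(X1, y1)
c.update_phase(X2, y2)
print(c.predict(X2[:5]))
```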
APA, Harvard, Vancouver, ISO, and other styles
6

Minku, Leandro Lei. "Online ensemble learning in the presence of concept drift." Thesis, University of Birmingham, 2011. http://etheses.bham.ac.uk//id/eprint/1334/.

Full text
Abstract:
In online learning, each training example is processed separately and then discarded. Environments that require online learning are often non-stationary and their underlying distributions may change over time (concept drift). Even though ensembles of learning machines have been used for handling concept drift, there has been no deep study of why they can be helpful for dealing with drifts and which of their features can contribute to that. The thesis mainly investigates how ensemble diversity affects accuracy in online learning in the presence of concept drift and how to use diversity in order to improve accuracy in changing environments. This is the first diversity study in the presence of concept drift. The main contributions of the thesis are: an analysis of negative correlation in online learning; a new concept drift categorisation to allow principled studies of drifts; a better understanding of when, how and why ensembles of learning machines can help to handle concept drift in online learning; knowledge of how to use information learnt from the old concept to aid the learning of the new concept; and a new approach called Diversity for Dealing with Drifts (DDD), which is accurate both in the presence and absence of drifts.
APA, Harvard, Vancouver, ISO, and other styles
7

Widyantoro, Dwi Hendratmo. "Concept drift learning and its application to adaptive information filtering." Diss., Texas A&M University, 2003. http://hdl.handle.net/1969.1/170.

Full text
Abstract:
Tracking the evolution of user interests is a problem instance of concept drift learning. Keeping track of multiple interest categories is a natural phenomenon as well as an interesting tracking problem because interests can emerge and diminish at different time frames. The first part of this dissertation presents a Multiple Three-Descriptor Representation (MTDR) algorithm, a novel algorithm for learning concept drift especially built for tracking the dynamics of multiple target concepts in the information filtering domain. The learning process of the algorithm combines the long-term and short-term interest (concept) models in an attempt to benefit from the strengths of both models. The MTDR algorithm improves over existing concept drift learning algorithms in the domain. Being able to track multiple target concepts with a few examples poses an even more important and challenging problem because casual users tend to be reluctant to provide the examples needed, and learning from only a few labeled examples is generally difficult. The second part presents a computational Framework for Extending Incomplete Labeled Data Stream (FEILDS). The system modularly extends the capability of an existing concept drift learner in dealing with an incomplete labeled data stream. It expands the learner's original input stream with relevant unlabeled data; the process generates a new stream with improved learnability. FEILDS employs a concept formation system for organizing its input stream into a concept (cluster) hierarchy. The system uses the concept and cluster hierarchy to identify the instance's concept and the unlabeled data relevant to a concept. It also adopts the persistence assumption in temporal reasoning for inferring the relevance of concepts. Empirical evaluation indicates that FEILDS is able to improve the performance of existing learners particularly when learning from a stream with few labeled data. Lastly, a new concept formation algorithm, one of the key components in the FEILDS architecture, is presented. The main idea is to discover intrinsic hierarchical structures regardless of the class distribution and the shape of the input stream. Experimental evaluation shows that the algorithm is relatively robust to input ordering, consistently producing a hierarchy structure of high quality.
APA, Harvard, Vancouver, ISO, and other styles
8

ESCOVEDO, TATIANA. "NEUROEVOLUTIVE LEARNING AND CONCEPT DRIFT DETECTION IN NON-STATIONARY ENVIRONMENTS." PONTIFÍCIA UNIVERSIDADE CATÓLICA DO RIO DE JANEIRO, 2015. http://www.maxwell.vrac.puc-rio.br/Busca_etds.php?strSecao=resultado&nrSeq=26748@1.

Full text
Abstract:
PONTIFÍCIA UNIVERSIDADE CATÓLICA DO RIO DE JANEIRO
COORDENAÇÃO DE APERFEIÇOAMENTO DO PESSOAL DE ENSINO SUPERIOR
PROGRAMA DE EXCELENCIA ACADEMICA
Real world concepts are often not stable: they change with time. Just as the concepts, the data distribution may change as well. This problem of change in concepts or in the distribution of data is known as concept drift and is a challenge for a model in the task of learning from data. This work presents a new neuroevolutionary model with quantum inspiration called NEVE (Neuro-EVolutionary Ensemble), based on an ensemble of Multi-Layer Perceptron (MLP) neural networks, for learning in non-stationary environments. It also presents a new concept drift detection mechanism, called DetectA (Detect Abrupt), with the ability to detect changes both proactively and reactively. The evolutionary algorithm with binary-real quantum inspiration, AEIQ-BR, is used in NEVE to automatically generate new classifiers for the ensemble, determining the most appropriate topology for the new network, selecting the most appropriate input variables, and determining all the weights of the MLP neural network. The AEIQ-R algorithm determines the voting weight of each neural network ensemble member, and voting by linear combination, weighted majority, or simple majority can be used. Four different approaches of NEVE are implemented, differing from one another in the way occurring drifts are detected and treated. The work also presents results of experiments conducted with the DetectA method and with the NEVE model on real and artificial databases. The results show that the detector proved efficient and suitable for databases with high dimensionality, intermediate-sized blocks, any proportion of drift, and any class balancing, and that, in general, the best results were obtained using some type of detection. Comparing the accuracy of NEVE with other consolidated models in the literature, NEVE had higher accuracy in most cases. This reinforces that the neuroevolutionary ensemble approach is a robust choice for situations in which the databases are subject to sudden changes in behavior.
APA, Harvard, Vancouver, ISO, and other styles
9

Barakat, Lida. "A context-aware approach for handling concept drift in classification." Thesis, Lancaster University, 2018. http://eprints.lancs.ac.uk/124995/.

Full text
Abstract:
Adapting classification models to changes is one of the main challenges associated with learning from data in dynamic environments. In particular, the description of the target concept is not static and may change over time under the influence of varying environmental conditions (i.e. varying context). Although many adaptive learning approaches have been proposed in the literature to address such changes, these are limited in terms of the extent to which the contextual aspects are explicitly identified and utilised. Instead, existing approaches mostly rely on monitoring the effects of drift (in terms of the degradation of the classifier's performance). Given this, to achieve more effective concept drift management, we propose incorporating context awareness when adapting the classification model to changes. Explicit identification and monitoring of the contextual aspects enable capturing the causes of drift, and hence facilitate more proactive adaptation. In particular, we propose an information-theoretic approach for systematic context identification, aiming to learn from data the contextual characteristics of the domain of interest by identifying the context variables contributing to concept changes. Such characteristics are then utilised as important clues guiding the adaptation process of the classification model. Specifically, knowledge of the contextual variables is exploited to select the most relevant data for retraining the model via a data weighting model, and to signal the need for data re-selection via a change detection model. The experimental analyses on simulated, benchmark, and real-world datasets show that such explicit identification and utilisation of contextual information result in more effective data selection and drift detection strategies, and enable more accurate predictions.
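One way to make the information-theoretic context identification idea above concrete is to rank candidate context variables by their mutual information with an indicator of concept change. The snippet below is a hedged illustration using scikit-learn's mutual_info_classif on synthetic data; the variable names and the binary drift indicator are assumptions, not the thesis' actual procedure.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

# toy data: three candidate context variables observed per batch,
# plus a binary indicator of whether the concept changed in that batch
rng = np.random.default_rng(1)
context_vars = rng.normal(size=(300, 3))
drift_label = (context_vars[:, 0] > 0.5).astype(int)   # variable 0 drives the drift here

# rank candidate context variables by mutual information with the drift indicator
mi = mutual_info_classif(context_vars, drift_label, random_state=0)
for name, score in zip(["temperature", "load", "day_of_week"], mi):   # invented names
    print(f"{name}: {score:.3f}")
```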
APA, Harvard, Vancouver, ISO, and other styles
10

RAMAMURTHY, SASTHAKUMAR. "TRACKING RECURRENT CONCEPT DRIFT IN STREAMING DATA USING ENSEMBLE CLASSIFIERS." University of Cincinnati / OhioLINK, 2007. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1196103577.

Full text
APA, Harvard, Vancouver, ISO, and other styles
11

Ostovar, Alireza. "Business process drift: Detection and characterization." Thesis, Queensland University of Technology, 2019. https://eprints.qut.edu.au/127157/1/Alireza_Ostovar_Thesis.pdf.

Full text
Abstract:
This research contributes a set of techniques for the early detection and characterization of process drifts, i.e. statistically significant changes in the behavior of business operations, as recorded in transactional data. Early detection and subsequent characterization of process drifts allows organizations to take prompt remedial actions and avoid potential repercussions resulting from unplanned changes in the behavior of their operations.
APA, Harvard, Vancouver, ISO, and other styles
12

Almeida, Paulo Ricardo Lisboa de. "Adapting the dynamic selection of classifiers approach for concept drift scenarios." Repositório Institucional da UFPR, 2017. http://hdl.handle.net/1884/52771.

Full text
Abstract:
Advisor: Luiz Eduardo S. de Oliveira
Co-advisors: Alceu de Souza Britto Jr.; Robert Sabourin
Doctoral thesis - Universidade Federal do Paraná, Setor de Ciências Exatas, Programa de Pós-Graduação em Informática. Defended: Curitiba, 09/11/2017
Includes references: pp. 143-154
Abstract: Many environments may suffer from changes in their distributions or a posteriori probabilities over time, leading to a phenomenon known as concept drift. In these scenarios, it is crucial to implement a mechanism to adapt the classification system to the environment changes in order to minimize any accuracy loss. Under a static environment, a popular approach consists in using a Dynamic Classifier Selection (DCS)-based method to select a custom classifier/ensemble for each test instance according to its neighborhood in a validation set, where the selection can be considered region-dependent. In order to handle concept drift, in this work the general idea of DCS methods is extended to be also time-dependent. Through this time-dependency, it is demonstrated that most neighborhood-based DCS methods can be adapted to handle concept drift scenarios and take advantage of the region-dependency, since classifiers trained under previous concepts may still be competent in some regions of the feature space. The time-dependency for the DCS methods is defined according to the nature of the concept drift, which determines whether the changes affect the a posteriori probabilities or only the distributions. With the necessary modifications, the Dynse framework is proposed in this work as a modular tool capable of adapting the DCS approach to concept drift scenarios. A default configuration for the Dynse framework is proposed, and an experimental protocol containing seven well-known DCS methods and 12 concept drift problems with different properties shows that the DCS approach can adapt to different concept drift scenarios. When compared to state-of-the-art concept drift methods, the DCS-based approach comes out ahead in terms of stability, i.e., it performs well in most cases and requires almost no parameter tuning. Keywords: Pattern Recognition. Concept Drift. Virtual Concept Drift. Real Concept Drift. Ensemble. Dynamic Classifier Selection. Local Accuracy.
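The region-dependent selection described above can be illustrated with an OLA-style (overall local accuracy) rule applied over a recent validation window: for each test instance, the pool member with the highest accuracy among the instance's nearest validation neighbours is selected. This is a simplified stand-in for the Dynse framework, with all names and the toy data being assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def select_and_predict(pool, X_val, y_val, x, k=7):
    """Pick the pool member with the best accuracy in the neighborhood of x
    inside a recent validation window (OLA-style local accuracy)."""
    d = np.linalg.norm(X_val - x, axis=1)
    idx = np.argsort(d)[:k]                            # k nearest validation examples
    local_acc = [np.mean(m.predict(X_val[idx]) == y_val[idx]) for m in pool]
    best = pool[int(np.argmax(local_acc))]
    return best.predict(x.reshape(1, -1))[0]

# toy stream: pool members trained on successive batches (old concepts are kept)
rng = np.random.default_rng(2)
pool = []
for batch in range(3):
    Xb = rng.normal(loc=batch, size=(200, 2))          # the distribution shifts per batch
    yb = (Xb[:, 0] + Xb[:, 1] > 2 * batch).astype(int)
    pool.append(DecisionTreeClassifier(max_depth=4).fit(Xb, yb))

X_val = rng.normal(loc=2, size=(100, 2))               # validation window = current concept
y_val = (X_val[:, 0] + X_val[:, 1] > 4).astype(int)
print(select_and_predict(pool, X_val, y_val, np.array([2.5, 2.5])))
```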
APA, Harvard, Vancouver, ISO, and other styles
13

Alzogbi, Anas, and Georg Lausen (academic supervisor). "Recommending scientific publications: addressing the one-class problem and concept drift." Freiburg : Universität, 2019. http://d-nb.info/1185391312/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
14

Pinagé, Felipe Azevedo. "Handling Concept Drift Based on Data Similarity and Dynamic Classifier Selection." Universidade Federal do Amazonas, 2017. http://tede.ufam.edu.br/handle/tede/5956.

Full text
Abstract:
FAPEAM - Fundação de Amparo à Pesquisa do Estado do Amazonas
In real-world applications, machine learning algorithms can be employed to perform spam detection, environmental monitoring, fraud detection, web click-stream analysis, among others. Most of these problems present an environment that changes over time due to the dynamic generation process of the data and/or due to streaming data. The problem of classifying continuous data streams has become one of the major challenges of the machine learning domain in recent decades because, since the data are not known in advance, they must be learned as they become available. In addition, fast predictions about the data should be performed to support decisions that are often made in real time. Currently in the literature, methods based on accuracy monitoring are commonly used to detect changes explicitly. However, these methods may become infeasible in some real-world applications, especially due to two aspects: they may need human operator feedback, and they may depend on a significant decrease in accuracy to be able to detect changes. In addition, most of these methods are also incremental-learning-based, since they update the decision model for every incoming example; this may lead the system to unnecessary updates. In order to overcome these problems, this thesis proposes two semi-supervised methods that detect changes explicitly by estimating and monitoring a pseudo-error. The decision model is updated only after change detection. In the first method, the pseudo-error is calculated using similarity measures, by monitoring the dissimilarity between past and current data distributions. The second proposed method employs dynamic classifier selection in order to improve the pseudo-error measurement. As a consequence, this second method allows online self-training of classifier ensembles. The experiments conducted show that the proposed methods achieve competitive results, even when compared to fully supervised incremental learning methods. The achievement of these methods, especially the second, is relevant since they make change detection and reaction applicable to several practical problems while reaching high accuracy rates, in settings where it is usually not possible to obtain the true labels of instances fully and immediately after classification.
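A loose sketch of the label-free monitoring idea in the first method above (flagging drift when the current data window becomes dissimilar to a reference window) is given below; the summary statistics, the distance, and the threshold are assumptions for illustration and are not the thesis' pseudo-error measure.

```python
import numpy as np

def window_stats(X):
    """Per-feature mean and standard deviation summarizing a window of data."""
    return np.concatenate([X.mean(axis=0), X.std(axis=0)])

def dissimilarity_drift(stream, window=200, threshold=0.5):
    """Flag drift when the summary of the current window moves away from the
    summary of the reference window (no labels are used at any point)."""
    ref = window_stats(stream[:window])
    drift_points = []
    for start in range(window, len(stream) - window + 1, window):
        cur = window_stats(stream[start:start + window])
        if np.linalg.norm(cur - ref) > threshold:
            drift_points.append(start)
            ref = cur                          # the new window becomes the reference
    return drift_points

# toy stream: the second half comes from a shifted distribution
rng = np.random.default_rng(5)
stream = np.vstack([rng.normal(0, 1, size=(1000, 3)),
                    rng.normal(1.5, 1, size=(1000, 3))])
print(dissimilarity_drift(stream))             # expected to flag a point near index 1000
```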
APA, Harvard, Vancouver, ISO, and other styles
15

Conca, Piero. "An adaptive framework for classification of concept drift with limited supervision." Thesis, University of York, 2012. http://etheses.whiterose.ac.uk/5587/.

Full text
Abstract:
This thesis deals with the problem of classification of data affected by concept drift. In particular, it investigates the area of unsupervised model updating, in which a classification model is updated without using information about the changing distributions of the classes. An adaptive framework that contains an ensemble of classifiers is developed. These can be mature or naive. In particular, only mature classifiers generate decisions, through majority voting, while naive classifiers are candidates to become mature. The first novelty of the proposed framework is a technique of feedback that combines concepts from ensemble learning with concepts from self-training. In particular, naive classifiers are trained using unlabelled data and labels generated by mature classifiers over that data, by means of voting. This technique allows updates of the framework's model in the absence of supervision, namely without using the true classes of the data. The second novelty is a technique that infers the presence of concept drift by measuring the similarity between the decisions of mature classifiers and the decisions of naive classifiers. When concept drift is inferred, a naive classifier is selected to become mature, and a mature classifier is deleted. A series of experiments is performed. They show that the framework can classify data with Gaussian distributions, and that this capability holds across different classification techniques. The experiments also reveal that the framework cannot deal with the concept drift of a uniformly distributed dataset. Moreover, further experiments show that the inference of drift combines quick adaptation with few false detections, thus leading to higher classification performance than the comparison methods. However, this technique is not able to detect concept drift if the classes are separable.
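The mature/naive feedback loop described above can be loosely sketched as follows: mature members vote to produce pseudo-labels, a naive candidate is self-trained on them, and high disagreement between the candidate and the mature vote is taken as a drift signal that triggers promotion. This is an illustrative simplification, not the thesis' exact framework; the class name, the single candidate, and the threshold are assumptions.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

class MatureNaiveSketch:
    """Loose illustration of the mature/naive feedback loop: mature members vote
    to produce pseudo-labels, a naive candidate is self-trained on them, and
    mature/naive disagreement is used as a drift signal."""
    def __init__(self, initial_mature, disagreement_threshold=0.3):
        self.mature = list(initial_mature)     # already-fitted voting classifiers
        self.naive = None                      # a single candidate, for simplicity
        self.threshold = disagreement_threshold

    def vote(self, X):
        votes = np.array([m.predict(X) for m in self.mature])
        return (votes.mean(axis=0) >= 0.5).astype(int)    # majority vote, binary labels

    def process_batch(self, X):
        pseudo = self.vote(X)                  # labels generated without supervision
        if self.naive is not None:
            disagreement = np.mean(self.naive.predict(X) != pseudo)
            if disagreement > self.threshold and len(self.mature) > 1:
                # drift inferred: promote the candidate and retire an old mature member
                self.mature.pop(0)
                self.mature.append(self.naive)
        # self-training: the next candidate learns from unlabelled data + pseudo-labels
        self.naive = GaussianNB().fit(X, pseudo)
        return pseudo

# toy usage: two mature classifiers trained on an initial labelled batch
rng = np.random.default_rng(6)
X0 = rng.normal(size=(300, 2)); y0 = (X0[:, 0] > 0).astype(int)
ens = MatureNaiveSketch([GaussianNB().fit(X0, y0), GaussianNB().fit(X0, y0)])
print(ens.process_batch(rng.normal(size=(100, 2)))[:10])
```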
APA, Harvard, Vancouver, ISO, and other styles
16

Roded, Keren. "The concept of drift and operationalization of its detection in simulated data." Thesis, University of British Columbia, 2017. http://hdl.handle.net/2429/63135.

Full text
Abstract:
In this paper, the phenomenon of changes in item characteristics over time (often referred to as drift) is discussed from several theoretical perspectives, and a new procedure for the detection of Item Parameter Drift (IPD) is proposed. An initial evaluation of the utility of the proposed procedure is conducted using simulated data modeled by the 2-Parameter Logistic (2PL) Item Response Theory (IRT) model. In addition to the proposed procedure, an IPD analysis of the simulated data is conducted using two known methods: Kim, Cohen, and Park's (1995) extension of Lord's (1980) Chi-square test of Differential Item Functioning (DIF) to multiple groups, and logistic regression. The results indicate high agreement and accuracy in the detection of true IPD using the two known methods, but poor performance of the proposed procedure. Possible explanations of the findings and future directions are discussed.
Faculty of Education; Department of Educational and Counselling Psychology, and Special Education (ECPS); Graduate.
APA, Harvard, Vancouver, ISO, and other styles
17

SANTOS, Silas Garrido Teixeira de Carvalho. "Avaliação criteriosa dos algoritmos de detecção de concept drifts." Universidade Federal de Pernambuco, 2015. https://repositorio.ufpe.br/handle/123456789/17310.

Full text
Abstract:
FACEPE
Knowledge extraction from data streams is an activity for which demand has been growing progressively. Examples of such applications include monitoring the purchase history of customers, movement data from sensors, or water temperatures. Thus, algorithms used for this purpose must be constantly updated, trying to adapt to new instances while taking into account computational constraints. When working in environments with a continuous flow of data, there is no guarantee that the distribution of the data will remain stationary. On the contrary, several changes may occur over time, triggering situations commonly known as concept drift. In this work we present a comparative study of some of the main drift detection methods: ADWIN, DDM, DOF, ECDD, EDDM, PL and STEPD. For the execution of the experiments, artificial datasets were used – simulating abrupt, fast gradual, and slow gradual changes – as well as datasets with real problems. The results were analyzed based on accuracy, runtime, memory usage, average time to change detection, and the number of false positives and negatives. The parameters of the methods were defined using an adapted version of a genetic algorithm. According to the results of the Friedman test with the Nemenyi post-hoc test, in terms of accuracy, DDM was the most efficient method on the datasets used, being statistically superior to DOF and ECDD. EDDM was the fastest method and also the most economical in memory usage, being statistically superior to DOF, ECDD, PL and STEPD in both respects. It was concluded that change detection methods that are more sensitive, and therefore more prone to false alarms, achieve better results when compared to methods that are less sensitive and less susceptible to false alarms.
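DDM, one of the detectors compared above, is compact enough to sketch. The implementation below follows the published warning/drift rule of Gama et al. (2004), where the online error rate p and its standard deviation s are compared against the minimum p_min + s_min observed so far; it is illustrative and not the code used in this dissertation's experiments.

```python
import math

class DDM:
    """Illustrative Drift Detection Method (Gama et al., 2004)."""
    def __init__(self, min_samples=30):
        self.min_samples = min_samples
        self.reset()

    def reset(self):
        self.n = 0
        self.p = 1.0
        self.s = 0.0
        self.p_min, self.s_min = float("inf"), float("inf")

    def update(self, error):
        """error: 1 if the classifier misclassified the latest example, else 0."""
        self.n += 1
        self.p += (error - self.p) / self.n                  # incremental error rate
        self.s = math.sqrt(self.p * (1 - self.p) / self.n)
        if self.n < self.min_samples:
            return "stable"
        if self.p + self.s < self.p_min + self.s_min:        # track the best level seen
            self.p_min, self.s_min = self.p, self.s
        if self.p + self.s > self.p_min + 3 * self.s_min:
            self.reset()
            return "drift"
        if self.p + self.s > self.p_min + 2 * self.s_min:
            return "warning"
        return "stable"

# toy usage: a 10% error rate that jumps to 50%, e.g. after a concept change
detector = DDM()
errors = ([0] * 9 + [1]) * 40 + [0, 1] * 30
for i, e in enumerate(errors):
    if detector.update(e) == "drift":
        print("drift signalled at example", i)
        break
```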
APA, Harvard, Vancouver, ISO, and other styles
18

D'Ettorre, Sarah. "Fine-Grained, Unsupervised, Context-based Change Detection and Adaptation for Evolving Categorical Data." Thesis, Université d'Ottawa / University of Ottawa, 2016. http://hdl.handle.net/10393/35518.

Full text
Abstract:
Concept drift detection, the identification of changes in data distributions in streams, is critical to understanding the mechanics of data generating processes and to ensuring that data models remain representative through time [2]. Many change detection methods utilize statistical techniques that take numerical data as input. However, many applications produce data streams containing categorical attributes. In this context, numerical statistical methods are unavailable, and different approaches are required. Common solutions use error monitoring, assuming that fluctuations in the error measures of a learning system correspond to concept drift [4]. There has been very little research, though, on context-based concept drift detection in categorical streams. This approach observes changes in the actual data distribution and is less popular due to the challenges associated with categorical data analysis. However, context-based change detection is arguably more informative as it is data-driven, and more widely applicable in that it can function in an unsupervised setting [4]. This study offers a contribution to this gap in the research by proposing a novel context-based change detection and adaptation algorithm for categorical data, namely Fine-Grained Change Detection in Categorical Data Streams (FG-CDCStream). This unsupervised method exploits elements of ensemble learning, a technique whereby decisions are made according to the majority vote of a set of models representing different random subspaces of the data [5]. These ideas are applied to a set of concept drift detector objects and merged with concepts from a recent, state-of-the-art, context-based change detection algorithm, the so-called Change Detection in Categorical Data Streams (CDCStream) [4]. FG-CDCStream is proposed as an extension of the batch-based CDCStream, providing instance-by-instance analysis and improving its change detection capabilities, especially in data streams containing abrupt changes or a combination of abrupt and gradual changes. FG-CDCStream also enhances the adaptation strategy of CDCStream, producing more representative post-change models.
APA, Harvard, Vancouver, ISO, and other styles
19

Henke, Márcia. "Deteção de Spam baseada na evolução das características com presença de Concept Drift." Universidade Federal do Amazonas, 2015. http://tede.ufam.edu.br/handle/tede/4708.

Full text
Abstract:
CAPES - Coordenação de Aperfeiçoamento de Pessoal de Nível Superior
Electronic messages (e-mails) are still considered the most significant tools in business and personal applications due to their low cost and easy access. However, e-mails have become a major problem owing to the high volume of junk mail, named spam, which fills users' mailboxes. Among the many problems caused by spam messages, we may highlight the fact that spam is currently the main vector for the spread of malicious activities such as viruses, worms, trojans, phishing, and botnets, among others. Such activities allow the attacker to gain illegal access to confidential data and trade secrets, or to invade the privacy of victims to obtain some advantage. Several approaches have been proposed to prevent the sending of unsolicited e-mail messages, such as filters implemented in e-mail servers, spam classification mechanisms that let users define when a particular subject or sender is a source of spam, and even filters implemented in network hardware. In general, e-mail filtering approaches are based on the analysis of message content to determine whether or not a message is spam. A major problem with this approach is spam detection in the presence of concept drift. The literature defines concept drift as changes occurring in the concept of the data over time, such as changes in the features that describe an attack or the emergence of new features. Numerous Intrusion Detection Systems (IDS) use machine learning techniques to monitor the classification error rate in order to detect change. However, when detection occurs, some damage has already been caused to the system, a fact that requires updating the classification process and intervention by the system operator. To overcome the problems mentioned above, this work proposes a new change detection method, named Method oriented to the Analysis of the Evolution of Attack Characteristics (MECA). The proposed method consists of three steps: 1) classification model training; 2) concept drift detection; and 3) transfer learning. The first step generates classification models as is commonly done in machine learning. The second step introduces two new strategies to handle concept drift: HFS (Historical-based Features Selection), which analyzes the evolution of the features based on their history over time; and SFS (Similarity-based Features Selection), which analyzes the evolution of the features from the level of similarity obtained between the feature vectors of the source and target domains. Finally, the third step focuses on the following questions: what, how and when to transfer acquired knowledge. The answer to the first question is provided by the concept drift detection strategies, which identify the new features and store them to be transferred. To answer the second question, the feature representation transfer approach is employed. Finally, the transfer of new knowledge is executed as soon as changes that compromise the performance of the classification task are identified. The proposed method was developed and validated using two public databases, one of which was built during this thesis. The results of the experiments show that it is possible to infer a threshold to detect changes in order to keep the classification model updated through knowledge transfer. In addition, the MECA architecture is able to perform the classification task and concept drift detection as two parallel and independent tasks. Finally, MECA uses the SVM (Support Vector Machines) machine learning algorithm, which adheres less closely to the training samples. The results obtained with MECA showed that it is possible to detect changes by monitoring feature evolution before a significant degradation of the classification model occurs.
APA, Harvard, Vancouver, ISO, and other styles
20

Black, Michaela. "Learning to classify from temporal data in the presence of concept drift and noise." Thesis, University of Ulster, 2002. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.232851.

Full text
APA, Harvard, Vancouver, ISO, and other styles
21

CAVALCANTE, Rodolfo Carneiro. "An adaptive learning system for time series forecasting in the presence of concept drift." Universidade Federal de Pernambuco, 2017. https://repositorio.ufpe.br/handle/123456789/25349.

Full text
Abstract:
FACEPE
A time series is a collection of observations measured sequentially in time. Several real-world dynamic processes can be modeled as time series. One of the main problems of time series analysis is the forecasting of future values. As a special kind of data stream, a time series may present concept drifts, which are changes in the underlying data generation process from time to time. The concept drift phenomenon negatively affects forecasting methods that rely on observing past behaviors of the time series to forecast future values. Despite the fact that concept drift is not a new research area, the effects of concept drift in time series have not been widely studied. Some approaches proposed in the literature to handle concept drift in time series are passive methods that successively update the learned model with the observations that arrive from the data stream. These methods offer no transparency to the user and potentially waste computational resources. Other approaches are active methods that implement a detect-and-adapt scheme, in which the learned model is adapted only after the explicit detection of a concept drift. By using explicit detection, the learned model is updated or retrained only in the presence of drifts, which can reduce the space and computational complexity of the learning system. These methods are generally based on monitoring the residuals of a fitted model or on monitoring the raw time series observations directly. However, these two sources of information (residuals and raw observations) may not be so reliable for a concept drift detection method applied to time series: residuals of a fitted model may be influenced by problems in training, and raw observations may present variations that do not represent significant changes in the time series data stream. The main contribution of this work is an active adaptive learning system able to handle concept drift in time series. The proposed method, called Feature Extraction and Weighting for Explicit Concept Drift Detection (FW-FEDD), considers a set of time series features to detect concept drifts in a more reliable way, being trustworthy and transparent to users. The features considered are weighted according to their importance in defining concept drifts at each instant. A concept drift test is then used to detect drifts more reliably. FW-FEDD also implements a forecasting module composed of a pool of forecasting models in which each model is specialized in a different time series concept. Several computational experiments on both artificial and real-world time series showed that the proposed method improves concept drift detection accuracy compared to methods based on monitoring raw time series observations and to residual-based methods. Results also showed the superiority of FW-FEDD compared to other passive and active adaptive learning systems in terms of forecasting performance.
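The feature-monitoring idea above (detecting drift from changes in descriptive features of the series rather than from raw observations or residuals) can be sketched with a handful of simple features and a z-score test. The feature set and threshold below are assumptions for illustration; they are not FW-FEDD's feature weighting or its statistical drift test.

```python
import numpy as np

def ts_features(window):
    """A few simple time series features (an illustrative subset, not FW-FEDD's set)."""
    w = np.asarray(window, dtype=float)
    centered = w - w.mean()
    acf1 = np.corrcoef(centered[:-1], centered[1:])[0, 1] if w.std() > 0 else 0.0
    return np.array([w.mean(), w.std(), acf1])

def detect_feature_drift(series, window=50, n_ref=8, threshold=8.0):
    """Flag windows whose feature vector deviates strongly (in z-scores) from the
    features observed over the first n_ref reference windows."""
    ref = np.array([ts_features(series[i * window:(i + 1) * window]) for i in range(n_ref)])
    ref_mean, ref_std = ref.mean(axis=0), ref.std(axis=0) + 1e-8
    flagged = []
    for t in range(n_ref * window, len(series) - window + 1, window):
        z = np.abs(ts_features(series[t:t + window]) - ref_mean) / ref_std
        if z.max() > threshold:
            flagged.append(t)
    return flagged

# toy series: the mean level shifts at index 500
rng = np.random.default_rng(3)
series = np.concatenate([rng.normal(0, 1, 500), rng.normal(3, 1, 500)])
print(detect_feature_drift(series))    # windows from the change point onward get flagged
```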
Uma série temporal é uma coleção de observações medidas sequencialmente no tempo. Diversos processos dinâmicos reais podem ser modelados como uma série temporal. Um dos principais problemas no contexto de séries temporais é a previsão de valores futuros. Sendo um tipo especial de fluxo de dados, uma série temporal pode apresentar mudança de conceito, que é a mudança no processo gerador dos dados. O fenômeno da mudança de conceito afeta negativamente os métodos de previsão baseados na observação do comportamento passado da série para prever valores futuros. Apesar de que mudança de conceito não é uma nova área, os efeitos da mudança de conceito em séries temporais ainda não foram amplamente estudados. Algumas abordagens propostas na literatura para tratar esse problema em séries temporais são métodos passivos que atualizam sucessivamente o modelo aprendido com novas observações que chegam do fluxo de dados. Estes métodos não são transparentes para o usuário e apresentam um potencial consumo de recursos computacionais. Outras abordagens são métodos ativos que implementam um esquema de detectar-e-adaptar, no qual o modelo aprendido é adaptado somente após a detecção explícita de uma mudança. Utilizando detecção explícita, o modelo aprendido é atualizado ou retreinado somente na presença de mudanças, reduzindo a complexidade computacional e de espaço do sistema de aprendizado. Estes método são geralmente baseados na monitoração dos resíduos de um modelo ajustado ou na monitoração dos dados da série diretamente. No entanto, estas duas fontes de informação (resíduos e dados crus) podem não ser tão confiáveis para um método de detecção de mudanças. Resíduos de um modelo ajustado podem ser influenciados por problemas no treinamento. Observações cruas podem apresentar variações que não representam mudanças significativas no fluxo de dados. A principal contribuição deste trabalho é um sistema de aprendizado adaptativo ativo capaz de tratar mudanças de conceito em séries temporais. O método proposto, chamado de Feature Extraction and Weighting for Explicit Concept Drift Detection (FW-FEDD) considera um conjunto de características da série temporal para detectar mudança de conceito de uma forma mais confiável, sendo transparente ao usuário. As características consideradas são ponderadas de acordo com sua importância para a definição das mudanças em cada instante. Um teste de mudança de conceito é utilizado para detectar as mudanças de forma mais confiável. FW-FEDD também implementa um módulo de previsão composto por um conjunto de modelos de previsão onde cada modelo é especializado em um conceito diferente. Diversos experimentos computacionais usando séries reais e artificiais mostram que o método proposto é capaz de melhorar a detecção de mudança de conceito comparado com métodos baseados na monitoração de dados crus da série e métodos baseados em resíduos. Resultados também mostraram a superioridade do FW-FEDD comparado com outros métodos de aprendizado adaptativo ativos e passivos em termos de acurácia de predição.
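To make the monitoring idea in this abstract concrete, the sketch below tracks a few simple window features of a time series (level, dispersion, lag-1 autocorrelation) and flags a drift when the weighted feature distance from a reference window exceeds a threshold. It is only a loose illustration of monitoring features instead of residuals or raw values; the window size, feature set, weighting and threshold are hypothetical and do not reproduce FW-FEDD itself.

```python
import numpy as np

def window_features(w):
    """Summarize one window of the series with a few simple features."""
    return np.array([
        np.mean(w),                        # level
        np.std(w),                         # dispersion
        np.corrcoef(w[:-1], w[1:])[0, 1],  # lag-1 autocorrelation
    ])

def detect_feature_drift(series, win=50, weights=(1.0, 1.0, 1.0), threshold=1.5):
    """Flag window starts whose feature vector deviates strongly from a
    reference window; the weights loosely mimic giving more importance to
    the features that best characterize the current concept."""
    ref = window_features(series[:win])
    drifts = []
    for t in range(win, len(series) - win + 1, win):
        cur = window_features(series[t:t + win])
        score = float(np.sum(np.array(weights) * np.abs(cur - ref)))
        if score > threshold:
            drifts.append(t)
            ref = cur  # re-anchor on the new concept after a detection
    return drifts

# Toy stream whose generating process changes at t = 500.
rng = np.random.default_rng(0)
s = np.concatenate([rng.normal(0, 1, 500), rng.normal(3, 2, 500)])
print(detect_feature_drift(s))  # expected to report a window around t = 500
```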
APA, Harvard, Vancouver, ISO, and other styles
22

Costa, Fausto Guzzo da. "Employing nonlinear time series analysis tools with stable clustering algorithms for detecting concept drift on data streams." Universidade de São Paulo, 2017. http://www.teses.usp.br/teses/disponiveis/55/55134/tde-13112017-105506/.

Full text
Abstract:
Several industrial, scientific and commercial processes produce open-ended sequences of observations which are referred to as data streams. We can understand the phenomena responsible for such streams by analyzing data in terms of their inherent recurrences and behavior changes. Recurrences support the inference of more stable models, though such models are invalidated by behavior changes. External influences are regarded as the main agents acting on the underlying phenomena to produce such modifications over time, such as new investments and market policies impacting stocks, human intervention in the climate, etc. In the context of Machine Learning, there is a vast research branch interested in investigating the detection of such behavior changes, which are also referred to as concept drifts. By detecting drifts, one can indicate the best moments to update modeling, therefore improving prediction results and the understanding, and eventually the control, of the influences governing the data stream. There are two main concept drift detection paradigms: the first based on supervised, and the second on unsupervised learning algorithms. The former faces great issues due to the labeling infeasibility when streams are produced at high frequencies and large volumes. The latter lacks theoretical foundations to provide detection guarantees. In addition, both paradigms do not adequately represent temporal dependencies among data observations. In this context, we introduce a novel approach to detect concept drifts by tackling two deficiencies of both paradigms: i) the instability involved in data modeling, and ii) the lack of time dependency representation. Our unsupervised approach is motivated by Carlsson and Memoli's theoretical framework, which ensures a stability property for hierarchical clustering algorithms with respect to data permutation. To take full advantage of such framework, we employed Takens' embedding theorem to make data statistically independent after being mapped to phase spaces. Independent data were then grouped using the Permutation-Invariant Single-Linkage Clustering Algorithm (PISL), an adapted version of the agglomerative algorithm Single-Linkage, respecting the stability property proposed by Carlsson and Memoli. Our algorithm outputs dendrograms (seen as data models), which are proven to be equivalent to ultrametric spaces; therefore, the detection of concept drifts is possible by comparing consecutive ultrametric spaces using the Gromov-Hausdorff (GH) distance. As a result, model divergences are indeed associated to data changes. We performed two main experiments to compare our approach to others from the literature, one considering abrupt and another with gradual changes. Results confirm our approach is capable of detecting concept drifts, both abrupt and gradual ones; however, it is more adequate for complicated scenarios. The main contributions of this thesis are: i) the usage of Takens' embedding theorem as a tool to provide statistical independence to data streams; ii) the implementation of PISL in conjunction with GH (called PISLGH); iii) a comparison of detection algorithms in different scenarios; and, finally, iv) an R package (called streamChaos) that provides tools for processing nonlinear data streams as well as other algorithms to detect concept drifts.
Diversos processos industriais, científicos e comerciais produzem sequências de observações continuamente, teoricamente infinitas, denominadas fluxos de dados. Pela análise das recorrências e das mudanças de comportamento desses fluxos, é possível obter informações sobre o fenômeno que os produziu. A inferência de modelos estáveis para tais fluxos é suportada pelo estudo das recorrências dos dados, enquanto é prejudicada pelas mudanças de comportamento. Essas mudanças são produzidas principalmente por influências externas ainda desconhecidas pelos modelos vigentes, tal como ocorre quando novas estratégias de investimento surgem na bolsa de valores, ou quando há intervenções humanas no clima, etc. No contexto de Aprendizado de Máquina (AM), várias pesquisas têm sido realizadas para investigar essas variações nos fluxos de dados, referidas como mudanças de conceito. Sua detecção permite que os modelos possam ser atualizados a fim de apurar a predição, a compreensão e, eventualmente, controlar as influências que governam o fluxo de dados em estudo. Nesse cenário, algoritmos supervisionados sofrem com a limitação para rotular os dados quando esses são gerados em alta frequência e grandes volumes, e algoritmos não supervisionados carecem de fundamentação teórica para prover garantias na detecção de mudanças. Além disso, algoritmos de ambos paradigmas não representam adequadamente as dependências temporais entre observações dos fluxos. Nesse contexto, esta tese de doutorado introduz uma nova metodologia para detectar mudanças de conceito, na qual duas deficiências de ambos paradigmas de AM são confrontados: i) a instabilidade envolvida na modelagem dos dados, e ii) a representação das dependências temporais. Essa metodologia é motivada pelo arcabouço teórico de Carlsson e Memoli, que provê uma propriedade de estabilidade para algoritmos de agrupamento hierárquico com relação à permutação dos dados. Para usufruir desse arcabouço, as observações são embutidas pelo teorema de imersão de Takens, transformando-as em independentes. Esses dados são então agrupados pelo algoritmo Single-Linkage Invariante à Permutação (PISL), o qual respeita a propriedade de estabilidade de Carlsson e Memoli. A partir dos dados de entrada, esse algoritmo gera dendrogramas (ou modelos), que são equivalentes a espaços ultramétricos. Modelos sucessivos são comparados pela distância de Gromov-Hausdorff a fim de detectar mudanças de conceito no fluxo. Como resultado, as divergências dos modelos são de fato associadas a mudanças nos dados. Experimentos foram realizados, um considerando mudanças abruptas e o outro mudanças graduais. Os resultados confirmam que a metodologia proposta é capaz de detectar mudanças de conceito, tanto abruptas quanto graduais, no entanto ela é mais adequada para cenários mais complicados. As contribuições principais desta tese são: i) o uso do teorema de imersão de Takens para transformar os dados de entrada em independentes; ii) a implementação do algoritmo PISL em combinação com a distância de Gromov-Hausdorff (chamado PISLGH); iii) a comparação da metodologia proposta com outras da literatura em diferentes cenários; e, finalmente, iv) a disponibilização de um pacote em R (chamado streamChaos) que provê tanto ferramentas para processar fluxos de dados não lineares quanto diversos algoritmos para detectar mudanças de conceito.
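The first step of the pipeline described above, mapping the raw stream into a phase space with Takens' embedding theorem, can be sketched in a few lines. The dimension and delay below are arbitrary illustrative choices; the subsequent PISL clustering and Gromov-Hausdorff comparison of dendrograms are not reproduced here.

```python
import numpy as np

def takens_embedding(series, dim=3, delay=2):
    """Map a univariate series into a phase space of dimension `dim` using
    time-delay coordinates: x_t = (s_t, s_{t+delay}, ..., s_{t+(dim-1)*delay})."""
    n = len(series) - (dim - 1) * delay
    if n <= 0:
        raise ValueError("series too short for this embedding")
    return np.column_stack([series[i * delay: i * delay + n] for i in range(dim)])

# Example: embed a noisy sine wave; each row is one phase-space point.
t = np.linspace(0, 8 * np.pi, 400)
s = np.sin(t) + 0.05 * np.random.default_rng(0).normal(size=400)
points = takens_embedding(s, dim=3, delay=5)
print(points.shape)  # (390, 3)
```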
APA, Harvard, Vancouver, ISO, and other styles
23

Schnackenberg, Sarah Anna [Verfasser], Uwe [Akademischer Betreuer] Ligges, and Claus [Gutachter] Weihs. "Online Diskriminanzanalyse für Datensituationen mit Concept Drift / Sarah Anna Schnackenberg ; Gutachter: Claus Weihs ; Betreuer: Uwe Ligges." Dortmund : Universitätsbibliothek Dortmund, 2020. http://d-nb.info/1228214336/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
24

Bridle, Robert Angus, and robert bridle@gmail com. "Adaptive User Interfaces for Mobile Computing Devices." The Australian National University. College of Engineering and Computer Sciences, 2008. http://thesis.anu.edu.au./public/adt-ANU20081117.184430.

Full text
Abstract:
This thesis examines the use of adaptive user interface elements on a mobile phone and presents two adaptive user interface approaches. The approaches attempt to increase the efficiency with which a user interacts with a mobile phone, while ensuring the interface remains predictable to a user.

An adaptive user interface approach is presented that predicts the menu item a user will select. When a menu is opened, the predicted menu item is highlighted instead of the top-most menu item. The aim is to maintain the layout of the menu and to save the user from performing scrolling key presses. A machine learning approach is used to accomplish the prediction task. However, learning in the mobile phone environment produces several difficulties. These are limited availability of training examples, concept drift and limited computational resources. A novel learning approach is presented that addresses these difficulties. This learning approach addresses limited training examples and limited computational resources by employing a highly restricted hypothesis space. Furthermore, the approach addresses concept drift by determining the hypothesis that has been consistent for the longest run of training examples into the past. Under certain concept drift restrictions, an analysis of this approach shows it to be superior to approaches that use a fixed window of training examples. An experimental evaluation on data collected from several users interacting with a mobile phone was used to assess this learning approach in practice. The results of this evaluation are reported in terms of the average number of key presses saved. The benefit of menu-item prediction can clearly be seen, with savings of up to three key presses on every menu interaction.

An extension of the menu-item prediction approach is presented that removes the need to manually specify a restricted hypothesis space. The approach uses a decision-tree learner to generate hypotheses online and uses the minimum description length principle to identify the occurrence of concept shifts. The identification of concept shifts is used to guide the hypothesis generation process. The approach is compared with the original menu-item prediction approach in which hypotheses are manually specified. Experimental results using the same datasets are reported.

Another adaptive user interface approach is presented that induces shortcuts on a mobile phone interface. The approach is based on identifying shortcuts in the form of macros, which can automate a sequence of actions. A means of specifying relevant action sequences is presented, together with several learning approaches for predicting which shortcut to present to a user. A small subset of the possible shortcuts on a mobile phone was considered. This subset consisted of shortcuts that automated the actions of making a phone call or sending a text message. The results of an experimental evaluation of the shortcut prediction approaches are presented. The shortcut prediction process was evaluated in terms of predictive accuracy and stability, where stability was defined as the rate at which predicted shortcuts changed over time. The importance of stability is discussed, and is used to question the advantages of using sophisticated learning approaches for achieving adaptive user interfaces on mobile phones. Finally, several methods for combining accuracy and stability measures are presented, and the learning approaches are compared with these methods.
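The learning rule sketched in the abstract, choosing the hypothesis from a small restricted space that has been consistent with the longest run of recent training examples, can be illustrated as follows. The hypothesis space of constant menu-item predictors and the interaction history are hypothetical, and the decision-tree and MDL extensions mentioned later in the abstract are not covered.

```python
def longest_run_hypothesis(history, hypotheses):
    """Pick the hypothesis that has been consistent with the longest unbroken
    run of the most recent (context, selected_item) examples.
    `hypotheses` maps a name to a prediction function."""
    best_name, best_run = None, -1
    for name, h in hypotheses.items():
        run = 0
        for context, selected in reversed(history):  # walk back from the newest example
            if h(context) == selected:
                run += 1
            else:
                break
        if run > best_run:
            best_name, best_run = name, run
    return best_name, best_run

# Hypothetical restricted hypothesis space: always highlight one fixed menu item.
hypotheses = {f"always_{item}": (lambda c, i=item: i) for item in ["call", "sms", "camera"]}

# Recent interactions: the user recently switched from "call" to "sms".
history = [(None, "call")] * 5 + [(None, "sms")] * 3
print(longest_run_hypothesis(history, hypotheses))  # ('always_sms', 3)
```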
APA, Harvard, Vancouver, ISO, and other styles
25

Joe-Yen, Stefan. "Performance Envelopes of Adaptive Ensemble Data Stream Classifiers." NSUWorks, 2017. http://nsuworks.nova.edu/gscis_etd/1014.

Full text
Abstract:
This dissertation documents a study of the performance characteristics of algorithms designed to mitigate the effects of concept drift on online machine learning. Several supervised binary classifiers were evaluated on their performance when applied to an input data stream with a non-stationary class distribution. The selected classifiers included ensembles that combine the contributions of their member algorithms to improve overall performance. These ensembles adapt to changing class definitions, known as “concept drift,” often present in real-world situations, by adjusting the relative contributions of their members. Three stream classification algorithms and three adaptive ensemble algorithms were compared to determine the capabilities of each in terms of accuracy and throughput. For each run of the experiment, the percentage of correct classifications was measured using prequential analysis, a well-established methodology in the evaluation of streaming classifiers. Throughput was measured in classifications performed per second as timed by the CPU clock. Two main experimental variables were manipulated to investigate and compare the range of accuracy and throughput exhibited by each algorithm under various conditions. The number of attributes in the instances to be classified and the speed at which the definitions of labeled data drifted were varied across six total combinations of drift-speed and dimensionality. The implications of the results are used to recommend improved methods for working with stream-based data sources. The typical approach to counteract concept drift is to update the classification models with new data. In the stream paradigm, classifiers are continuously exposed to new data that may serve as representative examples of the current situation. However, updating the ensemble classifier in order to maintain or improve accuracy can be computationally costly and will negatively impact throughput. In a real-time system, this could lead to an unacceptable slow-down. The results of this research showed that, among several algorithms for reducing the effect of concept drift, adaptive decision trees maintained the highest accuracy without slowing down with respect to the no-drift condition. Adaptive ensemble techniques were also able to maintain reasonable accuracy in the presence of drift without much change in the throughput. However, the overall throughput of the adaptive methods is low and may be unacceptable for extremely time-sensitive applications. The performance visualization methodology utilized in this study gives a clear and intuitive visual summary that allows system designers to evaluate candidate algorithms with respect to their performance needs.
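Prequential (test-then-train) evaluation, the accuracy measurement used in this study, is easy to state in code: every arriving instance is first used to test the current model and only then used to train it. The majority-class learner below is a deliberately trivial stand-in for the stream classifiers actually compared in the dissertation.

```python
from collections import Counter

class MajorityClass:
    """A deliberately simple incremental learner used only to illustrate the
    evaluation loop; any stream classifier could take its place."""
    def __init__(self):
        self.counts = Counter()

    def predict(self, x):
        return self.counts.most_common(1)[0][0] if self.counts else None

    def learn_one(self, x, y):
        self.counts[y] += 1

def prequential_accuracy(stream, model):
    """Test-then-train: each instance is first used for testing, then
    immediately used for training the model."""
    correct, seen = 0, 0
    for x, y in stream:
        if model.predict(x) == y:
            correct += 1
        seen += 1
        model.learn_one(x, y)
    return correct / seen

# Toy stream whose label distribution changes halfway through.
stream = [({"f": 0}, "a")] * 100 + [({"f": 0}, "b")] * 100
print(prequential_accuracy(stream, MajorityClass()))
```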
APA, Harvard, Vancouver, ISO, and other styles
26

Pesaranghader, Ali. "A Reservoir of Adaptive Algorithms for Online Learning from Evolving Data Streams." Thesis, Université d'Ottawa / University of Ottawa, 2018. http://hdl.handle.net/10393/38190.

Full text
Abstract:
Continuous change and development are essential aspects of evolving environments and applications, including, but not limited to, smart cities, military, medicine, nuclear reactors, self-driving cars, aviation, and aerospace. That is, the fundamental characteristics of such environments may evolve, and so cause dangerous consequences, e.g., putting people's lives at stake, if no reaction is adopted. Therefore, learning systems need to apply intelligent algorithms to monitor evolvement in their environments and update themselves effectively. Further, we may experience fluctuations regarding the performance of learning algorithms due to the nature of incoming data as it continuously evolves. That is, the current efficient learning approach may become deprecated after a change in data or environment. Hence, the question 'how to have an efficient learning algorithm over time against evolving data?' has to be addressed. In this thesis, we have made two contributions to address the challenges described above. In the machine learning literature, the phenomenon of (distributional) change in data is known as concept drift. Concept drift may shift decision boundaries and cause a decline in accuracy. Learning algorithms, indeed, have to detect concept drift in evolving data streams and replace their predictive models accordingly. To address this challenge, adaptive learners have been devised which may utilize drift detection methods to locate the drift points in dynamic and changing data streams. A drift detection method able to discover the drift points quickly, with the lowest false positive and false negative rates, is preferred. A false positive refers to incorrectly alarming for concept drift, and a false negative refers to not alarming for concept drift. In this thesis, we introduce three algorithms, called the Fast Hoeffding Drift Detection Method (FHDDM), the Stacking Fast Hoeffding Drift Detection Method (FHDDMS), and the McDiarmid Drift Detection Methods (MDDMs), for detecting drift points with the minimum delay, false positive, and false negative rates. FHDDM is a sliding window-based algorithm and applies Hoeffding's inequality (Hoeffding, 1963) to detect concept drift. FHDDM slides its window over the prediction results, which are either 1 (for a correct prediction) or 0 (for a wrong prediction). Meanwhile, it compares the mean of elements inside the window with the maximum mean observed so far; subsequently, a significant difference between the two means, upper-bounded by the Hoeffding inequality, indicates the occurrence of concept drift. FHDDMS extends the FHDDM algorithm by sliding multiple windows over its entries for better drift detection in terms of detection delay and false negative rate. In contrast to FHDDM/S, the MDDM variants assign weights to their entries, i.e., higher weights are associated with the most recent entries in the sliding window, for faster detection of concept drift. The rationale is that recent examples reflect the ongoing situation adequately. Then, by putting higher weights on the latest entries, we may detect concept drift quickly. An MDDM algorithm bounds the difference between the weighted mean of elements in the sliding window and the maximum weighted mean seen so far, using McDiarmid's inequality (McDiarmid, 1989). Eventually, it alarms for concept drift once a significant difference is experienced.

We experimentally show that FHDDM/S and the MDDMs outperform the state of the art, showing promising results in terms of the adaptation and classification measures. Due to the evolving nature of data streams, the performance of an adaptive learner, which is defined by the classification, adaptation, and resource consumption measures, may fluctuate over time. In fact, a learning algorithm, in the form of a (classifier, detector) pair, may perform well before a concept drift point, but not after. We frame this problem with the question 'how can we ensure that an efficient classifier-detector pair is present at any time in an evolving environment?' To answer this, we have developed the Tornado framework, which runs various kinds of learning algorithms simultaneously against evolving data streams. Each algorithm incrementally and independently trains a predictive model and updates the statistics of its drift detector. Meanwhile, our framework monitors the (classifier, detector) pairs, and recommends the most efficient one, concerning the classification, adaptation, and resource consumption performance, to the user. We further define the holistic CAR measure that integrates the classification, adaptation, and resource consumption measures for evaluating the performance of adaptive learning algorithms. Our experiments confirm that the most efficient algorithm may differ over time because of the developing and evolving nature of data streams.
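A compact reading of the FHDDM mechanism described above can be sketched as follows: slide a window over the binary prediction results, remember the best windowed accuracy seen so far, and signal a drift when the current windowed accuracy falls below that maximum by more than a Hoeffding-based margin. The window size and delta below are illustrative choices, not necessarily the defaults used in the thesis.

```python
import math
import random
from collections import deque

class FHDDMSketch:
    """Sketch of the FHDDM idea: track windowed accuracy over binary
    prediction results (1 = correct, 0 = wrong) and compare it with the
    best windowed accuracy seen so far, using a Hoeffding-based margin."""
    def __init__(self, n=100, delta=1e-6):
        self.n = n
        self.eps = math.sqrt(math.log(1.0 / delta) / (2.0 * n))
        self.window = deque(maxlen=n)
        self.mu_max = 0.0

    def add(self, correct):
        self.window.append(1 if correct else 0)
        if len(self.window) < self.n:
            return False                      # not enough evidence yet
        mu = sum(self.window) / self.n
        self.mu_max = max(self.mu_max, mu)
        return (self.mu_max - mu) >= self.eps  # True signals a drift

# Toy run: the classifier is right 95% of the time, then only 60%.
random.seed(1)
det = FHDDMSketch()
for t in range(2000):
    p = 0.95 if t < 1000 else 0.60
    if det.add(random.random() < p):
        print("drift signalled at", t)
        break
```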
APA, Harvard, Vancouver, ISO, and other styles
27

Baier, Lucas [Verfasser], and G. [Akademischer Betreuer] Satzger. "Concept Drift Handling in Information Systems: Preserving the Validity of Deployed Machine Learning Models / Lucas Baier ; Betreuer: G. Satzger." Karlsruhe : KIT-Bibliothek, 2021. http://d-nb.info/1241189250/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
28

Rakitianskaia, A. S. (Anastassia Sergeevna). "Using particle swarm optimisation to train feedforward neural networks in dynamic environments." Diss., University of Pretoria, 2011. http://hdl.handle.net/2263/28618.

Full text
Abstract:
The feedforward neural network (NN) is a mathematical model capable of representing any non-linear relationship between input and output data. It has been successfully applied to a wide variety of classification and function approximation problems. Various neural network training algorithms have been developed, including the particle swarm optimiser (PSO), which was shown to outperform the standard back propagation training algorithm on a selection of problems. However, it was usually assumed that the environment in which an NN operates is static. Such an assumption is often not valid for real life problems, and the training algorithms have to be adapted accordingly. Various dynamic versions of the PSO have already been developed. This work investigates the applicability of dynamic PSO algorithms to NN training in dynamic environments, and compares the performance of dynamic PSO algorithms to the performance of back propagation. Three popular dynamic PSO variants are considered. The extent of adaptive properties of back propagation and dynamic PSO under different kinds of dynamic environments is determined. Dynamic PSO is shown to be a viable alternative to back propagation, especially in environments exhibiting infrequent gradual changes.
Dissertation (MSc)--University of Pretoria, 2011.
Computer Science
Unrestricted
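For reference, the standard global-best PSO update that underlies the training algorithms compared in this dissertation is shown below, with a flattened weight vector playing the role of a particle position. The inertia and acceleration coefficients are common textbook values, and the re-diversification mechanisms of the dynamic PSO variants studied in the work are not included.

```python
import numpy as np

def pso_step(positions, velocities, pbest, gbest, w=0.729, c1=1.494, c2=1.494, rng=None):
    """One standard (gbest) PSO update. Each row of `positions` is a candidate
    weight vector; `pbest` holds personal bests, `gbest` the swarm best."""
    rng = rng or np.random.default_rng()
    r1 = rng.random(positions.shape)
    r2 = rng.random(positions.shape)
    velocities = (w * velocities
                  + c1 * r1 * (pbest - positions)
                  + c2 * r2 * (gbest - positions))
    return positions + velocities, velocities

# Minimal usage: 20 particles optimising a 10-dimensional weight vector against
# a stand-in loss (sum of squares); in NN training this loss would be the
# network error on the current (possibly drifting) data.
rng = np.random.default_rng(0)
pos = rng.normal(size=(20, 10))
vel = np.zeros_like(pos)
pbest, pbest_f = pos.copy(), (pos ** 2).sum(axis=1)
for _ in range(200):
    gbest = pbest[pbest_f.argmin()]
    pos, vel = pso_step(pos, vel, pbest, gbest, rng=rng)
    f = (pos ** 2).sum(axis=1)
    improved = f < pbest_f
    pbest[improved], pbest_f[improved] = pos[improved], f[improved]
print(pbest_f.min())
```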
APA, Harvard, Vancouver, ISO, and other styles
29

Belcin, Andrei. "Smart Cube Predictions for Online Analytic Query Processing in Data Warehouses." Thesis, Université d'Ottawa / University of Ottawa, 2021. http://hdl.handle.net/10393/41956.

Full text
Abstract:
A data warehouse (DW) is a transformation of many sources of transactional data integrated into a single, non-volatile, time-variant collection that can provide decision support to managerial roles within an organization. For this application, the database server needs to process multiple users' queries by joining various datasets and loading the result into main memory to begin calculations. In current systems, this process reacts to users' input and can be undesirably slow. In previous studies, it was shown that personalizing to a single user's query patterns and loading the resulting smaller subset into main memory significantly shortened the query response time. The LPCDA framework developed in this research handles multiple users' query demands, where the query patterns are subject to change (so-called concept drift) and noise. To this end, the LPCDA framework detects changes in user behaviour and dynamically adapts the personalized smart cube definition for the group of users. Numerous data marts (DMs), as components of the DW, are subject to intense aggregations to assist analytics at the request of automated systems and human users' queries. Subsequently, there is a growing need to properly manage the supply of data into the main memory closest to the CPU that computes the query, in order to reduce the response time from the moment a query arrives at the DW server. As a result, this thesis proposes an end-to-end adaptive learning ensemble for resource allocation of cuboids within a DM to achieve a relevant and timely constructed smart cube before the time of need, as a way of adopting the just-in-time inventory management strategy applied in other real-world scenarios. The algorithms comprising the ensemble involve predictive methodologies from Bayesian statistics, data mining, and machine learning that reflect the changes in the data-generating process using a number of change detection algorithms. Therefore, given different operational constraints and data-specific considerations, the ensemble can, to an effective degree, determine the cuboids in the lattice of a DM to pre-construct into a smart cube ahead of users submitting their queries, thereby benefiting from a quicker response than static schema views or no action at all.
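The final step suggested by this abstract, deciding which cuboids of a data mart to pre-aggregate under a memory budget, can be illustrated with a simple greedy selection. The demand scores here are given directly; in the thesis they would come from the change-aware predictive ensemble, and all names and numbers below are hypothetical.

```python
def select_cuboids(predicted_demand, size_mb, budget_mb):
    """Greedily pick the cuboids with the best predicted-demand-per-MB ratio
    until the main-memory budget is exhausted."""
    ranked = sorted(predicted_demand,
                    key=lambda c: predicted_demand[c] / size_mb[c],
                    reverse=True)
    chosen, used = [], 0.0
    for cuboid in ranked:
        if used + size_mb[cuboid] <= budget_mb:
            chosen.append(cuboid)
            used += size_mb[cuboid]
    return chosen

# Hypothetical lattice fragments of a sales data mart.
demand = {"(product)": 0.40, "(region)": 0.15, "(product,region)": 0.30, "(date)": 0.15}
sizes = {"(product)": 120, "(region)": 40, "(product,region)": 600, "(date)": 200}
print(select_cuboids(demand, sizes, budget_mb=400))
```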
APA, Harvard, Vancouver, ISO, and other styles
30

Floyd, Sean Louis Alan. "Semi-Supervised Hybrid Windowing Ensembles for Learning from Evolving Streams." Thesis, Université d'Ottawa / University of Ottawa, 2019. http://hdl.handle.net/10393/39273.

Full text
Abstract:
In this thesis, learning refers to the intelligent computational extraction of knowledge from data. Supervised learning tasks require data to be annotated with labels, whereas for unsupervised learning, data is not labelled. Semi-supervised learning deals with data sets that are partially labelled. A major issue with supervised and semi-supervised learning of data streams is late-arriving or missing class labels. Assuming that correctly labelled data will always be available and timely is often infeasible, and, as such, supervised methods are not directly applicable in the real world. Therefore, real-world problems usually require the use of semi-supervised or unsupervised learning techniques. For instance, when considering a spam detection task, it is not reasonable to assume that all spam will be identified (correctly labelled) prior to learning. Additionally, in semi-supervised learning, "the instances having the highest [predictive] confidence are not necessarily the most useful ones" [41]. We investigate how self-training performs without its selective heuristic in a streaming setting. This leads us to our contributions. We extend an existing concept drift detector to operate without any labelled data, by using a sliding window of our ensemble's prediction confidence, instead of a boolean indicating whether the ensemble's predictions are correct. We also extend selective self-training, a semi-supervised learning method, by using all predictions, and not only those with high predictive confidence. Finally, we introduce a novel windowing type for ensembles, as sliding windows are very time consuming and regular tumbling windows are not a suitable replacement. Our windowing technique can be considered a hybrid of the two: we train each sub-classifier in the ensemble with tumbling windows, but delay training in such a way that only one sub-classifier can update its model per iteration. We found, through statistical significance tests, that our framework is (roughly 160 times) faster than current state-of-the-art techniques, and achieves comparable predictive accuracy. That being said, more research is needed to further reduce the quantity of labelled data used for training, while also increasing predictive accuracy.
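The hybrid windowing idea described above (each member trains on tumbling windows, but the retraining turn rotates so that only one member updates per window) can be sketched schematically as follows. The base learner is a placeholder, and the confidence-based drift detector from the thesis is not reproduced.

```python
class HybridWindowEnsemble:
    """Schematic only: members retrain on tumbling windows, and the retraining
    turn rotates so that a single member updates per window."""
    def __init__(self, make_member, n_members=5, window_size=200):
        self.members = [make_member() for _ in range(n_members)]
        self.window_size = window_size
        self.buffer = []
        self.turn = 0  # which member retrains next

    def predict(self, x):
        votes = [m.predict(x) for m in self.members]
        return max(set(votes), key=votes.count)  # simple majority vote

    def learn_one(self, x, y):
        self.buffer.append((x, y))
        if len(self.buffer) >= self.window_size:
            self.members[self.turn].fit(self.buffer)   # only one member updates
            self.turn = (self.turn + 1) % len(self.members)
            self.buffer = []                            # tumbling window: discard

class ConstantGuess:
    """Placeholder base learner: predicts the majority label of its last window."""
    def __init__(self):
        self.label = None

    def fit(self, window):
        labels = [y for _, y in window]
        self.label = max(set(labels), key=labels.count)

    def predict(self, x):
        return self.label

ens = HybridWindowEnsemble(ConstantGuess, n_members=3, window_size=4)
for x, y in [(i, "a") for i in range(8)] + [(i, "b") for i in range(8)]:
    ens.learn_one(x, y)
print(ens.predict(0))  # members gradually flip to "b" as their turns come up
```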
APA, Harvard, Vancouver, ISO, and other styles
31

Jaber, Ghazal. "An approach for online learning in the presence of concept changes." Phd thesis, Université Paris Sud - Paris XI, 2013. http://tel.archives-ouvertes.fr/tel-00907486.

Full text
Abstract:
Learning from data streams is emerging as an important application area. When the environment changes, it is necessary to rely on on-line learning with the capability to adapt to changing conditions, a.k.a. concept drifts. Adapting to concept drifts entails forgetting some or all of the old acquired knowledge when the concept changes while accumulating knowledge regarding the supposedly stationary underlying concept. This tradeoff is called the stability-plasticity dilemma. Ensemble methods have been among the most successful approaches. However, the management of the ensemble, which ultimately controls how past data is forgotten, has not been thoroughly investigated so far. Our work shows the importance of the forgetting strategy by comparing several approaches. The results thus obtained lead us to propose a new ensemble method with an enhanced forgetting strategy to adapt to concept drifts. Experimental comparisons show that our method compares favorably with the well-known state-of-the-art systems. The majority of previous works focused only on means to detect changes and to adapt to them. In our work, we go one step further by introducing a meta-learning mechanism that is able to detect relevant states of the environment, to recognize recurring contexts and to anticipate likely concept changes. Hence, the method we suggest deals both with the challenge of optimizing the stability-plasticity dilemma and with the anticipation and recognition of incoming concepts. This is accomplished through an ensemble method that controls an ensemble of incremental learners. The management of the ensemble of learners enables one to naturally adapt to the dynamics of the concept changes with very few parameters to set, while a learning mechanism managing the changes in the ensemble provides means for the anticipation of, and the quick adaptation to, the underlying modification of the context.
APA, Harvard, Vancouver, ISO, and other styles
32

Mohammad, Rami Mustafa A. "An ensemble self-structuring neural network approach to solving classification problems with virtual concept drift and its application to phishing websites." Thesis, University of Huddersfield, 2016. http://eprints.hud.ac.uk/id/eprint/30188/.

Full text
Abstract:
Classification in data mining is one of the well-known tasks that aim to construct a classification model from a labelled input data set. Most classification models are devoted to a static environment where the complete training data set is presented to the classification algorithm. This data set is assumed to cover all information needed to learn the pertinent concepts (rules and patterns) related to how to classify unseen examples into predefined classes. However, in dynamic (non-stationary) domains, the set of features (input data attributes) may change over time. For instance, some features that are considered significant at time Ti might become useless or irrelevant at time Ti+j. This situation results in a phenomenon called Virtual Concept Drift. Yet, the set of features that are dropped at time Ti+j might become significant again in the future. Such a situation results in the so-called Cyclical Concept Drift, which is a direct result of the frequently cited catastrophic forgetting dilemma. Catastrophic forgetting happens when the learning of new knowledge completely removes the previously learned knowledge. Phishing is a dynamic classification problem where a virtual concept drift might occur. Yet, the virtual concept drift that occurs in phishing might be guided by some malevolent intelligent agent rather than occurring naturally. One reason why phishers keep changing the feature combination when creating phishing websites might be that they have the ability to interpret the anti-phishing tool and thus they pick a new set of features that can circumvent it. However, besides the generalisation capability, fault tolerance, and strong ability to learn, a Neural Network (NN) classification model is considered a black box. Hence, even if someone has the skills to hack into the NN-based classification model, they might find it difficult to interpret and understand how the NN processes the input data in order to produce the final decision (assign a class value). In this thesis, we investigate the problem of virtual concept drift by proposing a framework that can keep pace with the continuous changes in the input features. The proposed framework has been applied to the phishing websites classification problem and it shows competitive results with respect to various evaluation measures (Harmonic Mean (F1-score), precision, accuracy, etc.) when compared to several other data mining techniques. The framework creates an ensemble of classifiers (a group of classifiers) and offers a balance between stability (maintaining previously learned knowledge) and plasticity (learning knowledge from the newly offered training data set). Hence, the framework can also handle the cyclical concept drift. The classifiers that constitute the ensemble are created using an improved Self-Structuring Neural Networks algorithm (SSNN). Traditionally, NN modelling techniques rely on trial and error, which is a tedious and time-consuming process. The SSNN simplifies structuring NN classifiers with minimum intervention from the user. The framework evaluates the ensemble whenever a new data set chunk is collected. If the overall accuracy of the combined results from the ensemble drops significantly, a new classifier is created using the SSNN and added to the ensemble. Overall, the experimental results show that the proposed framework affords a balance between stability and plasticity and can effectively handle the virtual concept drift when applied to the phishing websites classification problem.
Most of the chapters of this thesis have been subject to publication.
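The framework's outer loop, evaluating the ensemble on each newly collected chunk and adding a new member only when accuracy drops noticeably, can be sketched as below. A nearest-centroid classifier stands in for the self-structuring neural network (SSNN), and the drop tolerance is an illustrative parameter rather than the thesis's criterion.

```python
import numpy as np

class Centroid:
    """Stand-in for the SSNN: a tiny nearest-centroid classifier trained on one chunk."""
    def fit(self, X, y):
        self.classes = np.unique(y)
        self.centroids = np.array([X[y == c].mean(axis=0) for c in self.classes])
        return self

    def predict(self, X):
        d = ((X[:, None, :] - self.centroids[None, :, :]) ** 2).sum(axis=2)
        return self.classes[d.argmin(axis=1)]

def ensemble_predict(ensemble, X):
    votes = np.stack([m.predict(X) for m in ensemble])
    return np.array([np.bincount(col).argmax() for col in votes.T.astype(int)])

def process_chunk(ensemble, best_acc, X, y, drop_tolerance=0.10):
    """Evaluate the ensemble on the new chunk; grow it only when accuracy drops."""
    if ensemble:
        acc = (ensemble_predict(ensemble, X) == y).mean()
        if best_acc - acc <= drop_tolerance:
            return ensemble, max(best_acc, acc)   # still good enough, keep as is
    ensemble.append(Centroid().fit(X, y))         # accuracy dropped: add a member
    return ensemble, best_acc

rng = np.random.default_rng(0)
ensemble, best = [], 0.0
for chunk in range(4):
    shift = 0.0 if chunk < 2 else 3.0             # feature drift in later chunks
    X = rng.normal(size=(200, 2)) + shift
    y = (X[:, 0] + X[:, 1] > 2 * shift).astype(int)
    ensemble, best = process_chunk(ensemble, best, X, y)
print("ensemble size:", len(ensemble))
```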
APA, Harvard, Vancouver, ISO, and other styles
33

Malik, Muhammad Hamza. "Information extraction and mapping for KG construction with learned concepts from scientific documents : Experimentation with relations data for development of concept learner." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-285572.

Full text
Abstract:
Systematic review of research manuscripts is a common procedure in which research studies pertaining to a particular field or domain are classified and structured in a methodological way. This process involves, among other steps, an extensive review and consolidation of scientific metrics and attributes of the manuscripts, such as citations, type or venue of publication. The extraction and mapping of relevant publication data is, evidently, a very laborious task if performed manually. Automation of such systematic mapping steps intends to reduce the human effort required and therefore can potentially reduce the time required for this process. The objective of this thesis is to automate the data extraction and mapping steps when systematically reviewing studies. The manual process is replaced by novel graph modelling techniques for effective knowledge representation, as well as novel machine learning techniques that aim to learn these representations. This eventually automates the process by characterising the publications on the basis of certain sub-properties and qualities that give the reviewer a quick high-level overview of each research study. The final model is a concept learner that predicts these sub-properties and, in addition, addresses the inherent concept drift of novel manuscripts over time. Different models were developed and explored in this research study for the development of the concept learner. Results show that: (1) Graph reasoning techniques which leverage the expressive power of modern graph databases are very effective in capturing the extracted knowledge in a so-called knowledge graph, which allows us to form concepts that can be learned using standard machine learning techniques like logistic regression, decision trees and neural networks. (2) Neural network models and ensemble models outperformed other standard machine learning techniques like logistic regression and decision trees based on the evaluation metrics. (3) The concept learner is able to detect and avoid concept drift by retraining the model.
Systematisk granskning av forskningsmanuskript är en vanlig procedur där forskningsstudier inom ett visst område klassificeras och struktureras på ett metodologiskt sätt. Denna process innefattar en omfattande granskning och sammanförande av vetenskapliga mätvärden och attribut för manuskriptet, såsom citat, typ av manuskript eller publiceringsplats. Framställning och kartläggning av relevant publikationsdata är uppenbarligen en mycket mödosam uppgift om den utförs manuellt. Avsikten med automatiseringen av processen för denna typ av systematisk kartläggning är att minska den mänskliga ansträngningen, och den tid som krävs kan på så sätt minskas. Syftet med denna avhandling är att automatisera datautvinning och stegen för kartläggning vid systematisk granskning av studier. Den manuella processen ersätts av avancerade grafmodelleringstekniker för effektiv kunskapsrepresentation, liksom avancerade maskininlärningstekniker som syftar till att lära maskinen dessa representationer. Detta automatiserar så småningom denna process genom att karakterisera publikationerna beserat på vissa subjektiva egenskaper och kvaliter som ger granskaren en snabb god översikt över varje forskningsstudie. Den slutliga modellen är ett inlärningskoncept som förutsäger dessa subjektiva egenskaper och dessutom behandlar den inneboende konceptuella driften i manuskriptet över tiden. Olika modeller utvecklades och undersöktes i denna forskningsstudie för utvecklingen av inlärningskonceptet. Resultaten visar att: (1) Diagrammatiskt resonerande som uttnytjar moderna grafdatabaser är mycket effektiva för att fånga den framställda kunskapen i en så kallad kunskapsgraf, och gör det möjligt att vidareutveckla koncept som kan läras med hjälp av standard tekniker för maskininlärning. (2) Neurala nätverksmodeller och ensemblemodeller överträffade andra standard maskininlärningstekniker baserat på utvärderingsvärdena. (3) Inlärningskonceptet kan detektera och undvika konceptuell drift baserat på F1-poäng och omlärning av algoritmen.
APA, Harvard, Vancouver, ISO, and other styles
34

Diaz, Jorge Cristhian Chamby. "An incremental gaussian mixture network for data stream classification in non-stationary environments." reponame:Biblioteca Digital de Teses e Dissertações da UFRGS, 2018. http://hdl.handle.net/10183/174484.

Full text
Abstract:
Classificação de fluxos contínuos de dados possui muitos desafios para a comunidade de mineração de dados quando o ambiente não é estacionário. Um dos maiores desafios para a aprendizagem em fluxos contínuos de dados está relacionado com a adaptação às mudanças de conceito, as quais ocorrem como resultado da evolução dos dados ao longo do tempo. Duas formas principais de desenvolver abordagens adaptativas são os métodos baseados em conjunto de classificadores e os algoritmos incrementais. Métodos baseados em conjunto de classificadores desempenham um papel importante devido à sua modularidade, o que proporciona uma maneira natural de se adaptar a mudanças de conceito. Os algoritmos incrementais são mais rápidos e possuem uma melhor capacidade anti-ruído do que os conjuntos de classificadores, mas têm mais restrições sobre os fluxos de dados. Assim, é um desafio combinar a flexibilidade e a adaptação de um conjunto de classificadores na presença de mudança de conceito, com a simplicidade de uso encontrada em um único classificador com aprendizado incremental. Com essa motivação, nesta dissertação, propomos um algoritmo incremental, online e probabilístico para a classificação em problemas que envolvem mudança de conceito. O algoritmo é chamado IGMN-NSE e é uma adaptação do algoritmo IGMN. As duas principais contribuições da IGMN-NSE em relação à IGMN são: melhoria de poder preditivo para tarefas de classificação e a adaptação para alcançar um bom desempenho em cenários não estacionários. Estudos extensivos em bases de dados sintéticas e do mundo real demonstram que o algoritmo proposto pode rastrear os ambientes em mudança de forma muito próxima, independentemente do tipo de mudança de conceito.
Data stream classification poses many challenges for the data mining community when the environment is non-stationary. The greatest challenge in learning classifiers from data streams relates to adapting to concept drifts, which occur as a result of changes in the underlying concepts. Two main ways to develop adaptive approaches are ensemble methods and incremental algorithms. Ensemble methods play an important role due to their modularity, which provides a natural way of adapting to change. Incremental algorithms are faster and have better anti-noise capacity than ensemble algorithms, but have more restrictions on concept-drifting data streams. Thus, it is a challenge to combine the flexibility and adaptation of an ensemble classifier in the presence of concept drift with the simplicity of use found in a single classifier with incremental learning. With this motivation, in this dissertation we propose an incremental, online and probabilistic algorithm for classification as an effort to tackle concept drift. The algorithm is called IGMN-NSE and is an adaptation of the IGMN algorithm. The two main contributions of IGMN-NSE in relation to the IGMN are: improved predictive power for classification tasks and adaptation to achieve good performance in non-stationary environments. Extensive studies on both synthetic and real-world data demonstrate that the proposed algorithm can track changing environments very closely, regardless of the type of concept drift.
APA, Harvard, Vancouver, ISO, and other styles
35

Dong, Yue. "Higher Order Neural Networks and Neural Networks for Stream Learning." Thesis, Université d'Ottawa / University of Ottawa, 2017. http://hdl.handle.net/10393/35731.

Full text
Abstract:
The goal of this thesis is to explore some variations of neural networks. The thesis is mainly split into two parts: a variation of the shaping functions in neural networks and a variation of learning rules in neural networks. In the first part, we mainly investigate the polynomial perceptron - a perceptron with a polynomial shaping function instead of a linear one. We prove the polynomial perceptron convergence theorem and illustrate the notion by showing, through empirical experiments with an implementation, that a higher order perceptron can learn the XOR function. In the second part, we propose three models (SMLP, SA, SA2) for stream learning and anomaly detection in streams. The main technique allowing these models to perform at a level comparable to the state-of-the-art algorithms in stream learning is the learning rule used. We employ the mini-batch gradient descent and stochastic gradient descent algorithms to speed up the models. In addition, the use of parallel processing with multiple threads makes the proposed methods highly efficient in dealing with streaming data. Our analysis shows that all models have linear runtime and constant memory requirements. We also demonstrate empirically that the proposed methods feature a high detection rate, a low false alarm rate, and fast response. The paper on the first two models (SMLP, SA) was published at the 29th Canadian AI Conference and won the best paper award. The invited journal paper on the third model (SA2) for Computational Intelligence is under peer review.
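The claim that a higher-order perceptron can learn XOR is easy to verify with a quadratic feature map and the classic perceptron update; the exact shaping function used in the thesis may differ from this sketch.

```python
import numpy as np

def phi(x):
    """Second-order feature map: bias, raw inputs, and the pairwise product."""
    x1, x2 = x
    return np.array([1.0, x1, x2, x1 * x2])

# XOR with +/-1 labels: not linearly separable in the raw inputs,
# but separable after the quadratic feature map.
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [-1, 1, 1, -1]

w = np.zeros(4)
for epoch in range(20):                 # perceptron rule on the mapped features
    errors = 0
    for xi, yi in zip(X, y):
        if yi * np.dot(w, phi(xi)) <= 0:
            w += yi * phi(xi)
            errors += 1
    if errors == 0:                     # converged: every example classified correctly
        break

print("weights:", w)
print("predictions:", [int(np.sign(np.dot(w, phi(xi)))) for xi in X])  # matches y
```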
APA, Harvard, Vancouver, ISO, and other styles
36

Olorunnimbe, Muhammed. "Intelligent Adaptation of Ensemble Size in Data Streams Using Online Bagging." Thesis, Université d'Ottawa / University of Ottawa, 2015. http://hdl.handle.net/10393/32340.

Full text
Abstract:
In this era of the Internet of Things and Big Data, a proliferation of connected devices continuously produces massive amounts of fast-evolving streaming data. There is a need to study the relationships in such streams for analytic applications, such as network intrusion detection, fraud detection and financial forecasting, amongst others. In this setting, it is crucial to create data mining algorithms that are able to seamlessly adapt to temporal changes in data characteristics that occur in data streams. These changes are called concept drifts. The resultant models produced by such algorithms should not only be highly accurate and able to swiftly adapt to changes; the data mining techniques should also be fast, scalable, and efficient in terms of resource allocation. It then becomes important to consider issues such as storage space needs and memory utilization. This is especially relevant when we aim to build personalized, near-instant models in a Big Data setting. This research work focuses on mining in a data stream with concept drift, using an online bagging method, with consideration of memory utilization. Our aim is to take an adaptive approach to resource allocation during the mining process. Specifically, we consider metalearning, in which the models of multiple classifiers are combined into an ensemble; this approach has been very successful in building accurate models against data streams. However, little work has been done to explore the interplay between accuracy, efficiency and utility. This research focuses on this issue. We introduce an adaptive metalearning algorithm that takes advantage of the memory utilization cost of concept drift, in order to vary the ensemble size during the data mining process. We aim to minimize the memory usage, while maintaining highly accurate models with a high utility. We evaluated our method against a number of benchmarking datasets and compared our results against the state of the art. Return on Investment (ROI) was used to evaluate the gain in performance in terms of accuracy, in contrast to the time and memory invested. We aimed to achieve high ROI without compromising on the accuracy of the result. Our experimental results indicate that we achieved this goal.
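The underlying online bagging scheme (in the Oza and Russell style, where each arriving example is learned k ~ Poisson(1) times by each member) can be sketched as follows. The adaptive resizing policy driven by memory cost and ROI is the dissertation's contribution and is only gestured at here through a hypothetical resize hook.

```python
import numpy as np
from collections import Counter

class CountModel:
    """Placeholder incremental learner: per-class counts keyed on the feature value."""
    def __init__(self):
        self.tables = {}

    def learn_one(self, x, y):
        self.tables.setdefault(x, Counter())[y] += 1

    def predict(self, x):
        t = self.tables.get(x)
        return t.most_common(1)[0][0] if t else None

class OnlineBagging:
    """Oza-style online bagging: each example is learned k ~ Poisson(1) times
    by each ensemble member, approximating bootstrap sampling on a stream."""
    def __init__(self, n_members=10, rng=None):
        self.members = [CountModel() for _ in range(n_members)]
        self.rng = rng or np.random.default_rng()

    def learn_one(self, x, y):
        for m in self.members:
            for _ in range(self.rng.poisson(1.0)):
                m.learn_one(x, y)

    def predict(self, x):
        votes = Counter(m.predict(x) for m in self.members)
        return votes.most_common(1)[0][0]

    def resize(self, n_members):
        """Hypothetical hook for the adaptive part: grow or shrink the ensemble
        when the memory/accuracy trade-off (e.g. an ROI estimate) changes."""
        while len(self.members) < n_members:
            self.members.append(CountModel())
        del self.members[n_members:]

bag = OnlineBagging(n_members=10, rng=np.random.default_rng(42))
for x, y in [(0, "a"), (1, "b")] * 200:
    bag.learn_one(x, y)
bag.resize(4)  # e.g. shrink when memory pressure outweighs the accuracy gain
print(bag.predict(0), bag.predict(1), len(bag.members))
```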
APA, Harvard, Vancouver, ISO, and other styles
37

Oliveira, Luan Soares. "Classificação de fluxos de dados não estacionários com algoritmos incrementais baseados no modelo de misturas gaussianas." Universidade de São Paulo, 2015. http://www.teses.usp.br/teses/disponiveis/55/55134/tde-06042016-143503/.

Full text
Abstract:
Aprender conceitos provenientes de fluxos de dados é uma tarefa significamente diferente do aprendizado tradicional em lote. No aprendizado em lote, existe uma premissa implicita que os conceitos a serem aprendidos são estáticos e não evoluem significamente com o tempo. Por outro lado, em fluxos de dados os conceitos a serem aprendidos podem evoluir ao longo do tempo. Esta evolução é chamada de mudança de conceito, e torna a criação de um conjunto fixo de treinamento inaplicável neste cenário. O aprendizado incremental é uma abordagem promissora para trabalhar com fluxos de dados. Contudo, na presença de mudanças de conceito, conceitos desatualizados podem causar erros na classificação de eventos. Apesar de alguns métodos incrementais baseados no modelo de misturas gaussianas terem sido propostos na literatura, nota-se que tais algoritmos não possuem uma política explicita de descarte de conceitos obsoletos. Nesse trabalho um novo algoritmo incremental para fluxos de dados com mudanças de conceito baseado no modelo de misturas gaussianas é proposto. O método proposto é comparado com vários algoritmos amplamente utilizados na literatura, e os resultados mostram que o algoritmo proposto é competitivo com os demais em vários cenários, superando-os em alguns casos.
Learning concepts from data streams differs significantly from traditional batch learning. In batch learning there is an implicit assumption that the concept to be learned is static and does not evolve significantly over time. On the other hand, in data stream learning the concepts to be learned may evolve over time. This evolution is called concept drift, and it makes a fixed training set no longer applicable. The incremental learning paradigm is a promising approach for learning in a data stream setting. However, in the presence of concept drifts, outdated concepts can cause misclassifications. Several incremental Gaussian mixture model methods have been proposed in the literature, but these algorithms lack an explicit policy to discard outdated concepts. In this work, a new incremental algorithm for data streams with concept drift, based on Gaussian Mixture Models, is proposed. The proposed method is compared to various algorithms widely used in the literature, and the results show that it is competitive with them in various scenarios, outperforming them in some cases.
APA, Harvard, Vancouver, ISO, and other styles
38

Dal, Pozzolo Andrea. "Adaptive Machine Learning for Credit Card Fraud Detection." Doctoral thesis, Universite Libre de Bruxelles, 2015. http://hdl.handle.net/2013/ULB-DIPOT:oai:dipot.ulb.ac.be:2013/221654.

Full text
Abstract:
Billions of dollars of loss are caused every year by fraudulent credit card transactions. The design of efficient fraud detection algorithms is key for reducing these losses, and more and more algorithms rely on advanced machine learning techniques to assist fraud investigators. The design of fraud detection algorithms is however particularly challenging due to the non-stationary distribution of the data, the highly unbalanced class distributions and the availability of few transactions labeled by fraud investigators. At the same time, public data are scarcely available due to confidentiality issues, leaving many questions about the best strategy unanswered. In this thesis we aim to provide some answers by focusing on crucial issues such as: i) why and how undersampling is useful in the presence of class imbalance (i.e. frauds are a small percentage of the transactions), ii) how to deal with unbalanced and evolving data streams (non-stationarity due to fraud evolution and change of spending behavior), iii) how to assess performance in a way that is relevant for detection, and iv) how to use the feedback provided by investigators on the generated fraud alerts. Finally, we design and assess a prototype of a Fraud Detection System that is able to meet real-world working conditions and to integrate investigators' feedback to generate accurate alerts.
Doctorat en Sciences
info:eu-repo/semantics/nonPublished
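Issue i) of the abstract refers to undersampling; in its plainest form it simply discards random majority-class (genuine) transactions until the classes are roughly balanced, as in the sketch below. The thesis's analysis of why and how this helps is not reproduced, and the data and ratio here are purely illustrative.

```python
import numpy as np

def undersample(X, y, ratio=1.0, rng=None):
    """Randomly drop majority-class (genuine) transactions so that
    #majority ~ ratio * #minority (frauds). Labels: 1 = fraud, 0 = genuine."""
    rng = rng or np.random.default_rng()
    fraud_idx = np.flatnonzero(y == 1)
    genuine_idx = np.flatnonzero(y == 0)
    keep = rng.choice(genuine_idx, size=int(ratio * len(fraud_idx)), replace=False)
    idx = rng.permutation(np.concatenate([fraud_idx, keep]))
    return X[idx], y[idx]

rng = np.random.default_rng(0)
X = rng.normal(size=(10000, 5))
y = (rng.random(10000) < 0.002).astype(int)   # ~0.2% frauds, highly imbalanced
Xb, yb = undersample(X, y, ratio=1.0, rng=rng)
print(len(yb), yb.mean())                     # small balanced subsample, ~50% frauds
```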
APA, Harvard, Vancouver, ISO, and other styles
39

Žliobaitė, Indrė. "Adaptive Training Set Formation." Doctoral thesis, Lithuanian Academic Libraries Network (LABT), 2010. http://vddb.laba.lt/obj/LT-eLABa-0001:E.02~2010~D_20100416_094953-42662.

Full text
Abstract:
Nowadays, when the environment is changing rapidly and dynamically, there is a particular need for adaptive data mining methods. `Spam' filters, personalized recommender and marketing systems, network intrusion detection systems, business prediction and decision support systems need to be regularly retrained to take into account the changing nature of the data. In stationary settings, the more data is at hand, the more accurate a model can be trained. In a changing environment, old data decreases accuracy. In such a case only a subset of the historical data might be selected to form a training set. For instance, the training window strategy uses only the newest historical instances. The thesis addresses adaptive data mining methods that are based on selective training set formation. The thesis improves the training strategies under sudden, gradual and recurring concept drifts. Four adaptive training set formation algorithms are developed and experimentally validated, which increase the generalization performance of the base models under each of the three concept drift types. Experimental evaluation using generated and real data confirms an improvement in classification and prediction accuracy compared to using all the historical data, as well as to selected existing adaptive learning algorithms from the recent literature. A tailored method for an industrial boiler application, which unifies several drift types, is developed.
Šiandieninėje, dinamiškai besikeičiančioje aplinkoje reikalingi adaptyvūs duomenų gavybos metodai. Nepageidaujamų laiškų klasifikatoriai, asmeninio rekomendavimo ir rinkodaros, įsilaužimų į kompiuterinius tinklus aptikimo, verslo rodiklių prognozavimo bei sprendimų priėmimo sistemos turi nuolat “persimokyti”, reaguoti į besikeičiančius duomenis. Stacionarioje aplinkoje kuo daugiau mokymo duomenų - tuo tikslesnis modelis. Besikeičiančioje aplinkoje seni duomenys blogina tikslumą. Tokiu atveju, vietoje visų turimų istorinių duomenų panaudojimo, gali būti tikslingai išrenkama tik tam tikra jų dalis, pvz. naudojamas mokymo langas (tik naujausi duomenys). Tiriamojo darbo objektas yra adaptyvūs mokymo metodai, kurie remiasi kryptingu mokymo imties formavimu. Darbe patobulintos mokymo strategijos esant staigiems, palaipsniams ir pasikartojantiems pokyčiams. Sukurti ir eksperimentiškai aprobuoti keturi adaptyvaus mokymo imties formavimo algoritmai, kurie leidžia pagerinti klasifikavimo bei prognozavimo tikslumą besikeičiančiose aplinkose, esant atitinkamai kiekvienam iš trijų pokyčių tipų. Naudojant generuotus bei realius duomenis eksperimentiškai parodytas klasifikavimo bei prognozavimo tikslumo pagerėjimas, lyginant su visų istorinių duomenų naudojimu mokymui, bei žinomais šioje srityje naudojamais adaptyviais mokymo algoritmais. Sukurta metodika pritaikyta pramoninio katilo atvejui, jungiančiam kelis aplinkos pokyčių tipus.
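The training window strategy mentioned in the abstract, the simplest form of selective training set formation, keeps only the newest historical instances, as in the sketch below; the four adaptive formation algorithms developed in the thesis go well beyond this fixed-window baseline.

```python
def training_window(history, window_size):
    """Fixed training-window strategy: train only on the newest instances.
    `history` is a chronologically ordered list of (x, y) pairs."""
    return history[-window_size:]

# Toy usage: after a change, a small window forgets the outdated examples quickly.
history = ([(t, "old concept") for t in range(100)]
           + [(t, "new concept") for t in range(100, 130)])
train_set = training_window(history, window_size=25)
print(len(train_set), {label for _, label in train_set})  # 25 {'new concept'}
```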
APA, Harvard, Vancouver, ISO, and other styles
40

Žliobaitė, Indrė. "Adaptyvus mokymo imties formavimas." Doctoral thesis, Lithuanian Academic Libraries Network (LABT), 2010. http://vddb.laba.lt/obj/LT-eLABa-0001:E.02~2010~D_20100416_095003-09795.

Full text
Abstract:
Šiandieninėje, dinamiškai besikeičiančioje aplinkoje reikalingi adaptyvūs duomenų gavybos metodai. Nepageidaujamų laiškų klasifikatoriai, asmeninio rekomendavimo ir rinkodaros, įsilaužimų į kompiuterinius tinklus aptikimo, verslo rodiklių prognozavimo bei sprendimų priėmimo sistemos turi nuolat “persimokyti”, reaguoti į besikeičiančius duomenis. Stacionarioje aplinkoje kuo daugiau mokymo duomenų - tuo tikslesnis modelis. Besikeičiančioje aplinkoje seni duomenys blogina tikslumą. Tokiu atveju, vietoje visų turimų istorinių duomenų panaudojimo, gali būti tikslingai išrenkama tik tam tikra jų dalis, pvz. naudojamas mokymo langas (tik naujausi duomenys). Tiriamojo darbo objektas yra adaptyvūs mokymo metodai, kurie remiasi kryptingu mokymo imties formavimu. Darbe patobulintos mokymo strategijos esant staigiems, palaipsniams ir pasikartojantiems pokyčiams. Sukurti ir eksperimentiškai aprobuoti keturi adaptyvaus mokymo imties formavimo algoritmai, kurie leidžia pagerinti klasifikavimo bei prognozavimo tikslumą besikeičiančiose aplinkose, esant atitinkamai kiekvienam iš trijų pokyčių tipų. Naudojant generuotus bei realius duomenis eksperimentiškai parodytas klasifikavimo bei prognozavimo tikslumo pagerėjimas, lyginant su visų istorinių duomenų naudojimu mokymui, bei žinomais šioje srityje naudojamais adaptyviais mokymo algoritmais. Sukurta metodika pritaikyta pramoninio katilo atvejui, jungiančiam kelis aplinkos pokyčių tipus.
Nowadays, when the environment is changing rapidly and dynamically, there is a particular need for adaptive data mining methods. Spam filters, personalized recommender and marketing systems, network intrusion detection systems, and business prediction and decision support systems need to be regularly retrained to take into account the changing nature of the data. In stationary settings, the more data is at hand, the more accurate a model can be trained. In a changing environment, old data decreases accuracy. In such a case only a subset of the historical data might be selected to form a training set; for instance, the training window strategy uses only the newest historical instances. The thesis addresses adaptive data mining methods based on selective training set formation and improves training strategies under sudden, gradual and recurring concept drifts. Four adaptive training set formation algorithms are developed and experimentally validated, which increase the generalization performance of the base models under each of the three concept drift types. Experimental evaluation on generated and real data confirms improvements in classification and prediction accuracy compared to using all the historical data, as well as compared to selected existing adaptive learning algorithms from the recent literature. A tailored method, which unifies several drift types, is developed for an industrial boiler application.
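The training window strategy mentioned in this abstract is easy to illustrate. The sketch below is a minimal, hypothetical example of selective training set formation (not one of the thesis's four algorithms), assuming scikit-learn's DecisionTreeClassifier as a stand-in base learner; the class name and window size are illustrative only.

```python
from collections import deque
from sklearn.tree import DecisionTreeClassifier

class WindowedClassifier:
    """Keep only the newest `window_size` labelled instances for training.

    Illustrates the training-window strategy: old data is dropped so the
    model reflects the most recent concept.
    """
    def __init__(self, window_size=500):
        self.window = deque(maxlen=window_size)    # oldest instances fall out automatically
        self.model = DecisionTreeClassifier()

    def partial_update(self, X_batch, y_batch):
        self.window.extend(zip(X_batch, y_batch))  # add the newest labelled data
        X, y = zip(*self.window)                   # rebuild the training set from the window
        self.model.fit(list(X), list(y))           # retrain on recent data only

    def predict(self, X):
        return self.model.predict(X)
```

Dropping the oldest instances is the simplest way to forget an outdated concept; the thesis's algorithms select the training set more deliberately, depending on the drift type.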
APA, Harvard, Vancouver, ISO, and other styles
41

Reis, Denis Moreira dos. "Classificação de fluxos de dados com mudança de conceito e latência de verificação." Universidade de São Paulo, 2016. http://www.teses.usp.br/teses/disponiveis/55/55134/tde-13012017-095800/.

Full text
Abstract:
Apesar do grau relativamente alto de maturidade existente na área de pesquisa de aprendizado supervisionado em lote, na qual são utilizados dados originários de problemas estacionários, muitas aplicações reais lidam com fluxos de dados cujas distribuições de probabilidade se alteram com o tempo, ocasionando mudanças de conceito. Diversas pesquisas vêm sendo realizadas nos últimos anos com o objetivo de criar modelos precisos mesmo na presença de mudanças de conceito. A maioria delas, no entanto, assume que tão logo um evento seja classificado pelo algoritmo de aprendizado, seu rótulo verdadeiro se torna conhecido. Este trabalho explora as situações complementares, com revisão dos trabalhos mais importantes publicados e análise do impacto de atraso na disponibilidade dos rótulos verdadeiros ou sua não disponibilização. Ainda, propõe um novo algoritmo que reduz drasticamente a complexidade de aplicação do teste de hipótese não-paramétrico Kolmogorov-Smirnov, tornado eficiente seu uso em algoritmos que analisem fluxos de dados. A exemplo, mostramos sua potencial aplicação em um método de detecção de mudança de conceito não-supervisionado que, em conjunto com técnicas de Aprendizado Ativo e Aprendizado por Transferência, reduz a necessidade de rótulos verdadeiros para manter boa performance de um classificador ao longo do tempo, mesmo com a ocorrência de mudanças de conceito.
Despite the relative maturity of batch-mode supervised learning research, in which the data typifies stationary problems, many real-world applications deal with data streams whose statistical distribution changes over time, causing what is known as concept drift. A large body of research has been produced in recent years with the objective of creating models that remain accurate even in the presence of concept drift. Most of it, however, assumes that once the classification algorithm labels an event, its actual label becomes readily available. This work explores the complementary situations, with a review of the most important published works and an analysis of the impact of delayed true labeling, including the case in which true labels never become available. Furthermore, this work proposes a new algorithm that heavily reduces the complexity of applying the Kolmogorov-Smirnov non-parametric hypothesis test, turning it into a useful tool for analysis of data streams. As an instance of its usefulness, we present an unsupervised drift-detection method that, together with Active Learning and Transfer Learning approaches, decreases the number of true labels required to maintain good classification performance over time, even in the presence of concept drift.
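The windowed use of the Kolmogorov-Smirnov test that this thesis accelerates can be sketched as follows. This is a from-scratch recomputation with scipy.stats.ks_2samp, not the incremental algorithm proposed in the work; the window sizes and significance level are illustrative assumptions.

```python
import numpy as np
from scipy.stats import ks_2samp

def ks_drift_check(reference, current, alpha=0.001):
    """Flag drift when the two-sample KS test rejects that both windows come
    from the same distribution. The thesis contributes an incremental version
    of this test; here it is simply recomputed on each pair of windows."""
    stat, p_value = ks_2samp(reference, current)
    return p_value < alpha

rng = np.random.default_rng(42)
reference = rng.normal(0.0, 1.0, size=1000)        # unlabelled data from the old concept
shifted = rng.normal(0.8, 1.0, size=1000)           # data after a mean shift
print(ks_drift_check(reference, reference[:500]))    # expected: False (same concept)
print(ks_drift_check(reference, shifted))            # expected: True (drift)
```

Because the test works on raw feature values, no true labels are needed, which is why it pairs naturally with the Active Learning and Transfer Learning components described in the abstract.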
APA, Harvard, Vancouver, ISO, and other styles
42

Montiel, López Jacob. "Fast and slow machine learning." Thesis, Université Paris-Saclay (ComUE), 2019. http://www.theses.fr/2019SACLT014/document.

Full text
Abstract:
L'ère du Big Data a révolutionné la manière dont les données sont créées et traitées. Dans ce contexte, de nombreux défis se posent, compte tenu de la quantité énorme de données disponibles qui doivent être efficacement gérées et traitées afin d’extraire des connaissances. Cette thèse explore la symbiose de l'apprentissage en mode batch et en flux, traditionnellement considérés dans la littérature comme antagonistes, sur le problème de la classification à partir de flux de données en évolution. L'apprentissage en mode batch est une approche bien établie basée sur une séquence finie: d'abord les données sont collectées, puis les modèles prédictifs sont créés, finalement le modèle est appliqué. Par contre, l’apprentissage par flux considère les données comme infinies, rendant le problème d’apprentissage comme une tâche continue (sans fin). De plus, les flux de données peuvent évoluer dans le temps, ce qui signifie que la relation entre les caractéristiques et la réponse correspondante peut changer. Nous proposons un cadre systématique pour prévoir le surendettement, un problème du monde réel ayant des implications importantes dans la société moderne. Les deux versions du mécanisme d'alerte précoce (batch et flux) surpassent les performances de base de la solution mise en œuvre par le Groupe BPCE, la deuxième institution bancaire en France. De plus, nous introduisons une méthode d'imputation évolutive basée sur un modèle pour les données manquantes dans la classification. Cette méthode présente le problème d'imputation sous la forme d'un ensemble de tâches de classification / régression résolues progressivement.Nous présentons un cadre unifié qui sert de plate-forme d'apprentissage commune où les méthodes de traitement par batch et par flux peuvent interagir de manière positive. Nous montrons que les méthodes batch peuvent être efficacement formées sur le réglage du flux dans des conditions spécifiques. Nous proposons également une adaptation de l'Extreme Gradient Boosting algorithme aux flux de données en évolution. La méthode adaptative proposée génère et met à jour l'ensemble de manière incrémentielle à l'aide de mini-lots de données. Enfin, nous présentons scikit-multiflow, un framework open source en Python qui comble le vide en Python pour une plate-forme de développement/recherche pour l'apprentissage à partir de flux de données en évolution
The Big Data era has revolutionized the way in which data is created and processed. In this context, multiple challenges arise given the massive amount of data that needs to be efficiently handled and processed in order to extract knowledge. This thesis explores the symbiosis of batch and stream learning, which are traditionally considered in the literature as antagonists. We focus on the problem of classification from evolving data streams. Batch learning is a well-established approach based on a finite sequence: first data is collected, then predictive models are created, then the model is applied. On the other hand, stream learning considers data as infinite, rendering the learning problem a continuous (never-ending) task. Furthermore, data streams can evolve over time, meaning that the relationship between features and the corresponding response (the class, in classification) can change. We propose a systematic framework to predict over-indebtedness, a real-world problem with significant implications in modern society. The two versions of the early warning mechanism (batch and stream) outperform the baseline performance of the solution implemented by Groupe BPCE, the second largest banking institution in France. Additionally, we introduce a scalable model-based imputation method for missing data in classification; this method casts the imputation problem as a set of classification/regression tasks which are solved incrementally. We present a unified framework that serves as a common learning platform where batch and stream methods can positively interact, and show that batch methods can be efficiently trained in the stream setting under specific conditions. The proposed hybrid solution builds on the positive interactions between batch and stream methods. We also propose an adaptation of the Extreme Gradient Boosting (XGBoost) algorithm for evolving data streams; the proposed adaptive method generates and updates the ensemble incrementally using mini-batches of data. Finally, we introduce scikit-multiflow, an open source framework that fills the gap in Python for a development/research platform for learning from evolving data streams.
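As a rough, hypothetical illustration of updating an ensemble incrementally with mini-batches (not the Adaptive XGBoost method itself), the sketch below trains one small scikit-learn tree per mini-batch and drops the oldest member once the ensemble is full; integer class labels and the parameter values are assumptions.

```python
from collections import deque
import numpy as np
from sklearn.tree import DecisionTreeClassifier

class MiniBatchEnsemble:
    """Grow and refresh an ensemble one mini-batch at a time: each batch trains
    a new member and, once the ensemble is full, the oldest member is dropped."""
    def __init__(self, max_members=10, max_depth=3):
        self.members = deque(maxlen=max_members)
        self.max_depth = max_depth

    def update(self, X_batch, y_batch):
        member = DecisionTreeClassifier(max_depth=self.max_depth)
        self.members.append(member.fit(X_batch, y_batch))

    def predict(self, X):
        # majority vote across members, per instance (assumes integer class labels)
        votes = np.array([m.predict(X) for m in self.members])
        return np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, votes)
```

Replacing the oldest member is only one possible forgetting policy; the thesis's adaptive method updates the boosted ensemble itself as new mini-batches arrive.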
APA, Harvard, Vancouver, ISO, and other styles
43

Montiel, López Jacob. "Fast and slow machine learning." Electronic Thesis or Diss., Université Paris-Saclay (ComUE), 2019. http://www.theses.fr/2019SACLT014.

Full text
Abstract:
L'ère du Big Data a révolutionné la manière dont les données sont créées et traitées. Dans ce contexte, de nombreux défis se posent, compte tenu de la quantité énorme de données disponibles qui doivent être efficacement gérées et traitées afin d’extraire des connaissances. Cette thèse explore la symbiose de l'apprentissage en mode batch et en flux, traditionnellement considérés dans la littérature comme antagonistes, sur le problème de la classification à partir de flux de données en évolution. L'apprentissage en mode batch est une approche bien établie basée sur une séquence finie: d'abord les données sont collectées, puis les modèles prédictifs sont créés, finalement le modèle est appliqué. Par contre, l’apprentissage par flux considère les données comme infinies, rendant le problème d’apprentissage comme une tâche continue (sans fin). De plus, les flux de données peuvent évoluer dans le temps, ce qui signifie que la relation entre les caractéristiques et la réponse correspondante peut changer. Nous proposons un cadre systématique pour prévoir le surendettement, un problème du monde réel ayant des implications importantes dans la société moderne. Les deux versions du mécanisme d'alerte précoce (batch et flux) surpassent les performances de base de la solution mise en œuvre par le Groupe BPCE, la deuxième institution bancaire en France. De plus, nous introduisons une méthode d'imputation évolutive basée sur un modèle pour les données manquantes dans la classification. Cette méthode présente le problème d'imputation sous la forme d'un ensemble de tâches de classification / régression résolues progressivement.Nous présentons un cadre unifié qui sert de plate-forme d'apprentissage commune où les méthodes de traitement par batch et par flux peuvent interagir de manière positive. Nous montrons que les méthodes batch peuvent être efficacement formées sur le réglage du flux dans des conditions spécifiques. Nous proposons également une adaptation de l'Extreme Gradient Boosting algorithme aux flux de données en évolution. La méthode adaptative proposée génère et met à jour l'ensemble de manière incrémentielle à l'aide de mini-lots de données. Enfin, nous présentons scikit-multiflow, un framework open source en Python qui comble le vide en Python pour une plate-forme de développement/recherche pour l'apprentissage à partir de flux de données en évolution
The Big Data era has revolutionized the way in which data is created and processed. In this context, multiple challenges arise given the massive amount of data that needs to be efficiently handled and processed in order to extract knowledge. This thesis explores the symbiosis of batch and stream learning, which are traditionally considered in the literature as antagonists. We focus on the problem of classification from evolving data streams. Batch learning is a well-established approach based on a finite sequence: first data is collected, then predictive models are created, then the model is applied. On the other hand, stream learning considers data as infinite, rendering the learning problem a continuous (never-ending) task. Furthermore, data streams can evolve over time, meaning that the relationship between features and the corresponding response (the class, in classification) can change. We propose a systematic framework to predict over-indebtedness, a real-world problem with significant implications in modern society. The two versions of the early warning mechanism (batch and stream) outperform the baseline performance of the solution implemented by Groupe BPCE, the second largest banking institution in France. Additionally, we introduce a scalable model-based imputation method for missing data in classification; this method casts the imputation problem as a set of classification/regression tasks which are solved incrementally. We present a unified framework that serves as a common learning platform where batch and stream methods can positively interact, and show that batch methods can be efficiently trained in the stream setting under specific conditions. The proposed hybrid solution builds on the positive interactions between batch and stream methods. We also propose an adaptation of the Extreme Gradient Boosting (XGBoost) algorithm for evolving data streams; the proposed adaptive method generates and updates the ensemble incrementally using mini-batches of data. Finally, we introduce scikit-multiflow, an open source framework that fills the gap in Python for a development/research platform for learning from evolving data streams.
APA, Harvard, Vancouver, ISO, and other styles
44

Loeffel, Pierre-Xavier. "Algorithmes de machine learning adaptatifs pour flux de données sujets à des changements de concept." Thesis, Paris 6, 2017. http://www.theses.fr/2017PA066496/document.

Full text
Abstract:
Dans cette thèse, nous considérons le problème de la classification supervisée sur un flux de données sujets à des changements de concepts. Afin de pouvoir apprendre dans cet environnement, nous pensons qu’un algorithme d’apprentissage doit combiner plusieurs caractéristiques. Il doit apprendre en ligne, ne pas faire d’hypothèses sur le concept ou sur la nature des changements de concepts et doit être autorisé à s’abstenir de prédire lorsque c’est nécessaire. Les algorithmes en ligne sont un choix évident pour traiter les flux de données. De par leur structure, ils sont capables de continuellement affiner le modèle appris à l’aide des dernières observations reçues. La structure instance based a des propriétés qui la rende particulièrement adaptée pour traiter le problème des flux de données sujet à des changements de concept. En effet, ces algorithmes font très peu d’hypothèses sur la nature du concept qu’ils essaient d’apprendre ce qui leur donne une flexibilité qui les rend capable d’apprendre un vaste éventail de concepts. Une autre force est que stocker certaines des observations passées dans la mémoire peux amener de précieuses meta-informations qui pourront être utilisées par la suite par l’algorithme. Enfin, nous mettons en valeur l’importance de permettre à un algorithme d’apprentissage de s’abstenir de prédire lorsque c’est nécessaire. En effet, les changements de concepts peuvent être la source de beaucoup d’incertitudes et, parfois, l’algorithme peux ne pas avoir suffisamment d’informations pour donner une prédiction fiable
In this thesis, we investigate the problem of supervised classification on a data stream subject to concept drifts. In order to learn in this environment, we claim that a successful learning algorithm must combine several characteristics: it must be able to learn and adapt continuously, it should make no assumptions about the nature of the concept or the expected type of drift, and it should be allowed to abstain from prediction when necessary. On-line learning algorithms are the obvious choice to handle data streams, since their update mechanism allows them to continuously refine the learned model using the latest data. The instance-based (IB) structure also has properties which make it extremely well suited to data streams with drifting concepts. IB algorithms make very few assumptions about the nature of the concept they are trying to learn, which grants them a flexibility that makes them able to learn a wide range of concepts. Another strength is that storing some of the past observations in memory can provide valuable meta-information for the algorithm. Furthermore, the IB structure allows the adaptation process to rely on hard evidence of obsolescence and, by doing so, adaptation to concept changes can happen without the need to explicitly detect the drifts. Finally, in this thesis we stress the importance of allowing the learning algorithm to abstain from prediction in this framework, because drifts can generate a lot of uncertainty and, at times, an algorithm might lack the information needed to predict accurately.
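A minimal sketch of an instance-based stream learner that abstains under uncertainty is given below; it is an illustrative nearest-neighbour variant with assumed parameters (k, memory size, agreement threshold), not the algorithm developed in the thesis.

```python
from collections import deque, Counter
import numpy as np

ABSTAIN = None  # returned when the evidence is too uncertain to predict

class AbstainingKNN:
    """Instance-based stream classifier that may abstain: it keeps a bounded
    memory of recent labelled instances and refuses to predict when the
    nearest neighbours do not agree strongly enough."""
    def __init__(self, k=5, memory=1000, min_agreement=0.8):
        self.k, self.min_agreement = k, min_agreement
        self.memory = deque(maxlen=memory)     # bounded store of past observations

    def learn_one(self, x, y):
        self.memory.append((np.asarray(x, dtype=float), y))

    def predict_one(self, x):
        if len(self.memory) < self.k:
            return ABSTAIN
        x = np.asarray(x, dtype=float)
        nearest = sorted(self.memory, key=lambda item: np.linalg.norm(item[0] - x))
        labels = [y for _, y in nearest[:self.k]]
        label, count = Counter(labels).most_common(1)[0]
        # abstain when the top label does not dominate the neighbourhood
        return label if count / self.k >= self.min_agreement else ABSTAIN
```

The bounded memory plays the role of the stored observations discussed in the abstract, and the agreement threshold is one simple way to turn neighbourhood disagreement into an abstention decision.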
APA, Harvard, Vancouver, ISO, and other styles
45

Albuquerque, Regis Antonio Saraiva, and 68999536833. "Seleção dinâmica de comitês de classificadores baseada em diversidade e acurácia para detecção de mudança de conceitos." Universidade Federal do Amazonas, 2018. https://tede.ufam.edu.br/handle/tede/6480.

Full text
Abstract:
FAPEAM - Fundação de Amparo à Pesquisa do Estado do Amazonas
Many machine learning applications have to deal with classification problems in dynamic environments. This type of environment may be affected by concept drift, which can significantly reduce the accuracy of classification systems. In this context, methods using ensembles of classifiers are interesting because they allow the design of drift detection and reaction strategies that are more accurate and robust to changes. A classification system based on an ensemble of classifiers may be divided into three main phases: classifier generation; selection of a single classifier or a subset of classifiers; and classifier fusion. The selection phase may be performed as a dynamic process: for each unknown sample, the individual classifier or classifier ensemble most likely to be correct is chosen to assign a label to that sample. In this work, a method for concept drift detection and reaction based on dynamic classifier ensemble selection is proposed. The proposed method chooses the expert classifier ensemble according to diversity and accuracy values. To evaluate the impact of dynamic ensemble selection guided by diversity and accuracy on concept drift detection and reaction, four series of experiments were carried out using both synthetic and real datasets. In addition, since the proposed method is broken down into four phases (generation of a pool of classifier ensembles; dynamic ensemble selection; drift detection; and drift reaction), different versions of the method were investigated by varying the parameters of each phase. The results show that, in general, these versions attain very similar accuracy values. Moreover, when compared to two baselines, (1) DDM, which is based on a single classifier, and (2) Leveraging Bagging, which is based on a classifier ensemble, our method outperforms both: it achieved higher accuracy, lower detection delay and lower false detection rates, and presented no missed detections, although both baselines have lower time complexity. Therefore, this work shows that dynamic classifier ensemble selection guided by diversity and accuracy helps to improve detection precision and the overall accuracy of classification systems employed in problems with concept drift.
Muitas aplicações de aprendizado de máquina estão relacionadas com problemas de classificação em ambientes dinâmicos. Mudança de conceito figura nesse tipo de ambiente e pode prejudicar muito a acurácia de sistemas de classificação. Nesse contexto, a utilização de comitês de classificadores é interessante porque possibilita a implementação de processos de detecção e de reação à mudança mais acurados e robustos. Sistemas de classificação que utilizam comitês podem possuir três grandes fases: geração; seleção; e integração de classificadores. A etapa de seleção pode ser feita de forma dinâmica, isto é, para cada instância desconhecida, o classificador ou comitê de classificadores com maior probabilidade de acerto é escolhido para atribuir uma classe à essa instância. Neste trabalho, é proposto um método para detecção e reação à mudança de conceito que utiliza seleção dinâmica de comitês de classificadores. O método proposto escolhe o comitê especialista com base nos valores de diversidade e de acurácia de cada comitê candidato. A fim de avaliar o impacto do uso de seleção dinâmica guiada por diversidade e acurácia nas tarefas de detecção e reação a mudança de conceito, foram realizadas quatro séries de experimentos com bases sintéticas e reais. Além disso, como o método proposto é dividido em quatro fases: geração da população de comitês; seleção dinâmica do comitê especialista; detecção de mudanças; e reação à mudança, diferentes versões desse método foram investigadas em função da definição de parâmetros de cada fase. Os resultados dos experimentos mostraram que, de maneira geral, as versões estudadas são bem equivalentes em termos de acurácia média final. Adicionalmente, quando comparado a dois baselines: (1) DDM - que utiliza um único classificador; e (2) Leveraging Bagging - que utiliza um comitê de classificadores, o método proposto alcançou melhores taxas de acurácia, menores taxas de atraso de detecção, não deixou de detectar as mudanças conhecidas nas bases e produziu reduzidas taxas de falsa detecção, apesar de apresentar maior complexidade computacional. Portanto, o trabalho mostra que o uso de seleção dinâmica guiada por diversidade e acurácia melhora a precisão de detecção, bem como a acurácia geral de sistemas de classificação utilizados em problemas que apresentam mudança de conceitos.
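The idea of choosing an ensemble by combining recent accuracy with diversity can be sketched as follows; the score weighting, the disagreement measure and the validation-window interface are illustrative assumptions, not the exact criteria of the proposed method, and integer class labels are assumed.

```python
import numpy as np

def disagreement(preds):
    """Average pairwise disagreement between members' predictions (higher = more diverse)."""
    n = len(preds)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return np.mean([np.mean(preds[i] != preds[j]) for i, j in pairs]) if pairs else 0.0

def select_ensemble(candidates, X_val, y_val, alpha=0.7):
    """Pick the candidate ensemble with the best mix of recent accuracy and diversity.

    Each candidate is a list of already fitted classifiers; (X_val, y_val) is a
    window of recent labelled data used for the dynamic selection step.
    """
    best, best_score = None, -np.inf
    for ensemble in candidates:
        preds = [clf.predict(X_val) for clf in ensemble]
        vote = np.apply_along_axis(lambda c: np.bincount(c).argmax(), 0, np.array(preds))
        acc = np.mean(vote == y_val)                       # accuracy on the recent window
        score = alpha * acc + (1 - alpha) * disagreement(preds)
        if score > best_score:
            best, best_score = ensemble, score
    return best
```

Weighting accuracy more heavily than diversity (alpha = 0.7 here) is one plausible trade-off; the dissertation studies how such choices affect detection delay and false detections.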
APA, Harvard, Vancouver, ISO, and other styles
46

Loeffel, Pierre-Xavier. "Algorithmes de machine learning adaptatifs pour flux de données sujets à des changements de concept." Electronic Thesis or Diss., Paris 6, 2017. http://www.theses.fr/2017PA066496.

Full text
Abstract:
Dans cette thèse, nous considérons le problème de la classification supervisée sur un flux de données sujets à des changements de concepts. Afin de pouvoir apprendre dans cet environnement, nous pensons qu’un algorithme d’apprentissage doit combiner plusieurs caractéristiques. Il doit apprendre en ligne, ne pas faire d’hypothèses sur le concept ou sur la nature des changements de concepts et doit être autorisé à s’abstenir de prédire lorsque c’est nécessaire. Les algorithmes en ligne sont un choix évident pour traiter les flux de données. De par leur structure, ils sont capables de continuellement affiner le modèle appris à l’aide des dernières observations reçues. La structure instance based a des propriétés qui la rende particulièrement adaptée pour traiter le problème des flux de données sujet à des changements de concept. En effet, ces algorithmes font très peu d’hypothèses sur la nature du concept qu’ils essaient d’apprendre ce qui leur donne une flexibilité qui les rend capable d’apprendre un vaste éventail de concepts. Une autre force est que stocker certaines des observations passées dans la mémoire peux amener de précieuses meta-informations qui pourront être utilisées par la suite par l’algorithme. Enfin, nous mettons en valeur l’importance de permettre à un algorithme d’apprentissage de s’abstenir de prédire lorsque c’est nécessaire. En effet, les changements de concepts peuvent être la source de beaucoup d’incertitudes et, parfois, l’algorithme peux ne pas avoir suffisamment d’informations pour donner une prédiction fiable
In this thesis, we investigate the problem of supervised classification on a data stream subject to concept drifts. In order to learn in this environment, we claim that a successful learning algorithm must combine several characteristics. It must be able to learn and adapt continuously, it shouldn’t make any assumption on the nature of the concept or the expected type of drifts and it should be allowed to abstain from prediction when necessary. On-line learning algorithms are the obvious choice to handle data streams. Indeed, their update mechanism allows them to continuously update their learned model by always making use of the latest data. The instance based (IB) structure also has some properties which make it extremely well suited to handle the issue of data streams with drifting concepts. Indeed, IB algorithms make very little assumptions about the nature of the concept they are trying to learn. This grants them a great flexibility which make them likely to be able to learn from a wide range of concepts. Another strength is that storing some of the past observations into memory can bring valuable meta-informations which can be used by an algorithm. Furthermore, the IB structure allows the adaptation process to rely on hard evidences of obsolescence and, by doing so, adaptation to concept changes can happen without the need to explicitly detect the drifts. Finally, in this thesis we stress the importance of allowing the learning algorithm to abstain from prediction in this framework. This is because the drifts can generate a lot of uncertainties and at times, an algorithm might lack the necessary information to accurately predict
APA, Harvard, Vancouver, ISO, and other styles
47

Nunes, André Luís. "Um estudo investigativo de algoritmos de regressão para data streams." Universidade do Vale do Rio dos Sinos, 2017. http://www.repositorio.jesuita.org.br/handle/UNISINOS/6345.

Full text
Abstract:
A explosão no volume de dados e a sua velocidade de expansão tornam as tarefas de descoberta do conhecimento e a análise de dados desafiantes, ainda mais quando consideradas bases não-estacionárias. Embora a predição de valores futuros exerça papel fundamental em áreas como: o clima, problemas de roteamentos e economia, entre outros, a classificação ainda parece ser a tarefa mais explorada. Recentemente, alguns algoritmos voltados à regressão de valores foram lançados, como por exemplo: FIMT-DD, AMRules, IBLStreams e SFNRegressor, entretanto seus estudos investigativos exploraram mais aspectos de inovação e análise do erro de predição, do que explorar suas capacidades mediante critérios apontados como fundamentais para data stream, como tempo de execução e memória. Dessa forma, o objetivo deste trabalho é apresentar um estudo investigativo sobre estes algoritmos que tratam regressão, considerando ambientes dinâmicos, utilizando bases de dados massivas, além de explorar a capacidade de adaptação dos algoritmos com a presença de concept drift. Para isto três bases de dados foram analisadas e estendidas para explorar os principais critérios de avaliação adotados, sendo realizada uma ampla experimentação que produziu uma comparação dos resultados obtidos frente aos algoritmos escolhidos, possibilitando gerar indicativos do comportamento de cada um mediante os diferentes cenários a que foram expostos. Assim, como principais contribuições deste trabalho são destacadas: a avaliação de critérios fundamentais: memória, tempo de execução e poder de generalização, relacionados a regressão para data stream; produção de uma análise crítica dos algoritmos investigados; e a possibilidade de reprodução e extensão dos estudos realizados pela disponibilização das parametrizações empregadas
The explosion in data volume and its speed of growth make knowledge discovery and data analysis challenging tasks, even more so when non-stationary data are considered. Although the prediction of future values plays a fundamental role in areas such as climate, routing problems and economics, among others, classification still seems to be the most explored task. Recently, some regression algorithms for data streams have been released, for example FIMT-DD, AMRules, IBLStreams and SFNRegressor; however, the studies investigating them have explored innovation aspects and prediction error analysis more than their behaviour with respect to criteria regarded as fundamental for data streams, such as execution time and memory. The objective of this work is therefore to present an investigative study of these regression algorithms in dynamic environments, using massive databases, and to explore their ability to adapt in the presence of concept drift. To this end, three databases were analyzed and extended to cover the main evaluation criteria adopted, and a broad set of experiments was carried out comparing the results obtained by the chosen algorithms, making it possible to characterize the behaviour of each one across the different scenarios to which they were exposed. The main contributions of this work are: the evaluation of fundamental criteria (memory, execution time and generalization ability) for data stream regression; a critical analysis of the investigated algorithms; and the possibility of reproducing and extending the studies by making the employed parametrizations available.
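A test-then-train (prequential) loop that tracks the three criteria highlighted above (error, execution time, memory) might look like the sketch below; scikit-learn's SGDRegressor stands in for stream regressors such as FIMT-DD or AMRules, and the memory figure comes from Python's tracemalloc rather than the tooling used in the study.

```python
import time
import tracemalloc
import numpy as np
from sklearn.linear_model import SGDRegressor

def prequential_regression(stream, model):
    """Test-then-train evaluation for a stream regressor, returning mean absolute
    error, elapsed seconds and peak traced memory in KiB.

    `stream` yields (X_batch, y_batch) mini-batches in arrival order.
    """
    errors, start = [], time.perf_counter()
    tracemalloc.start()
    for X_batch, y_batch in stream:
        if hasattr(model, "coef_"):                          # test first, once the model exists
            errors.append(np.mean(np.abs(model.predict(X_batch) - y_batch)))
        model.partial_fit(X_batch, y_batch)                  # then train on the same batch
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return float(np.mean(errors)), time.perf_counter() - start, peak / 1024

# Example call (hypothetical stream generator):
# mae, seconds, kib = prequential_regression(my_stream(), SGDRegressor())
```

Reporting memory and time alongside the error is exactly what distinguishes a data-stream evaluation from a conventional batch benchmark.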
APA, Harvard, Vancouver, ISO, and other styles
48

Ellis, Mathys. "Regularised feed forward neural networks for streamed data classification problems." Diss., University of Pretoria, 2020. http://hdl.handle.net/2263/75804.

Full text
Abstract:
Streamed data classification problems (SDCPs) require classifiers with the ability to learn and to adjust to the underlying relationships in data streams, in real-time. This requirement poses a challenge to classifiers, because the learning task is no longer just to find the optimal decision boundaries, but also to track changes in the decision boundaries as new training data is received. The challenge is due to concept drift, i.e. the changing of decision boundaries over time. Changes include disappearing, appearing, or shifting decision boundaries. This thesis proposes an online learning approach for feed forward neural networks (FFNNs) that meets the requirements of SDCPs. The approach uses regularisation to optimise the architecture via the weights, and quantum particle swarm optimisation (QPSO) to dynamically adjust the weights. The learning approach is applied to a FFNN, which uses rectified linear activation functions, to form a novel SDCP classifier. The classifier is empirically investigated on several SDCPs. Both weight decay (WD) and weight elimination (WE) are investigated as regularisers. Empirical results show that using QPSO with no regularisation, causes the classifier to completely saturate. However, using QPSO with regularisation enables the classifier to dynamically adapt both its implicit architecture and weights as decision boundaries change. Furthermore, the results favour WE over WD as a regulariser for QPSO.
Dissertation (MSc)--University of Pretoria, 2020.
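The two regularisers compared in this dissertation correspond to standard penalty terms. The sketch below shows the usual weight decay and weight elimination formulas applied to a weight vector; the regularisation strength and the scale parameter w0 are illustrative values, not the settings used in the study.

```python
import numpy as np

def weight_decay_penalty(weights, lam=1e-3):
    """Weight decay (WD): penalises all weights quadratically."""
    return lam * np.sum(weights ** 2)

def weight_elimination_penalty(weights, lam=1e-3, w0=1.0):
    """Weight elimination (WE): the penalty saturates for large weights, so it
    mainly pushes small weights towards zero, pruning the implicit architecture."""
    sq = (weights / w0) ** 2
    return lam * np.sum(sq / (1.0 + sq))

w = np.array([0.01, -0.5, 2.0, 0.0])
print(weight_decay_penalty(w), weight_elimination_penalty(w))
```

Because WE barely penalises already-large weights, it tends to remove superfluous connections rather than shrink useful ones, which is consistent with the empirical preference for WE reported above.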
APA, Harvard, Vancouver, ISO, and other styles
49

Jarosch, Martin. "Klasifikace v proudu dat pomocí souboru klasifikátorů." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2013. http://www.nusl.cz/ntk/nusl-235468.

Full text
Abstract:
This master's thesis deals with knowledge discovery and focuses on data stream classification. Three ensemble classification methods are described. These methods are implemented in the practical part of the thesis and included in a classification system. Extensive measurements and experiments were used for method analysis and comparison. The implemented methods were then integrated into a malware analysis system. The obtained results are presented in the conclusion.
APA, Harvard, Vancouver, ISO, and other styles
50

Togbe, Maurras Ulbricht. "Détection distribuée d'anomalies dans les flux de données." Electronic Thesis or Diss., Sorbonne université, 2022. http://www.theses.fr/2022SORUS400.

Full text
Abstract:
La détection d'anomalies est une problématique importante dans de nombreux domaines d'application comme la santé, le transport, l'industrie etc. Il s'agit d'un sujet d'actualité qui tente de répondre à la demande toujours croissante dans différents domaines tels que la détection d'intrusion, de fraude, etc. Dans cette thèse, après un état de l'art général complet, la méthode non supervisé Isolation Forest (IForest) a été étudiée en profondeur en présentant ses limites qui n'ont pas été abordées dans la littérature. Notre nouvelle version de IForest appelée Majority Voting IForest permet d'améliorer son temps d'exécution. Nos méthodes ADWIN-based IForest ASD et NDKSWIN-based IForest ASD permettent la détection d'anomalies dans les flux de données avec une meilleure gestion du concept drift. Enfin, la détection distribuée d'anomalies en utilisant IForest a été étudiée et évaluée. Toutes nos propositions ont été validées avec des expérimentations sur différents jeux de données
Anomaly detection is an important issue in many application areas such as healthcare, transportation and industry. It is a current topic that tries to meet the ever-increasing demand in fields such as intrusion detection and fraud detection. In this thesis, after a complete general state of the art, the unsupervised Isolation Forest (IForest) method is studied in depth, highlighting limitations that had not been addressed in the literature. Our new version of IForest, called Majority Voting IForest, improves its execution time. Our ADWIN-based IForest ASD and NDKSWIN-based IForest ASD methods allow the detection of anomalies in data streams with better handling of concept drift. Finally, distributed anomaly detection using IForest is studied and evaluated. All our proposals have been validated with experiments on different datasets.
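A simplified sketch of keeping an Isolation Forest up to date on a stream is shown below; it retrains scikit-learn's IsolationForest on a sliding window whenever the caller signals a change, whereas the thesis's IForest ASD variants couple the forest with drift detectors such as ADWIN or NDKSWIN. The class and parameter names are illustrative.

```python
from collections import deque
import numpy as np
from sklearn.ensemble import IsolationForest

class WindowedIForest:
    """Isolation Forest over a sliding window: the forest is refitted on the most
    recent points whenever refresh() is called, e.g. by an external drift
    detector; here the trigger is left to the caller."""
    def __init__(self, window_size=2000, n_estimators=100):
        self.window = deque(maxlen=window_size)
        self.forest = IsolationForest(n_estimators=n_estimators, random_state=0)
        self.fitted = False

    def add(self, x):
        self.window.append(np.asarray(x, dtype=float))

    def refresh(self):
        self.forest.fit(np.vstack(self.window))   # retrain on the current window only
        self.fitted = True

    def is_anomaly(self, x):
        if not self.fitted:
            return False
        # scikit-learn's IsolationForest returns -1 for outliers, 1 for inliers
        return self.forest.predict(np.asarray(x, dtype=float).reshape(1, -1))[0] == -1
```

Retraining only when a change is signalled keeps the cost bounded, which matters when the same detection has to run in a distributed setting as described in the abstract.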
APA, Harvard, Vancouver, ISO, and other styles
