Dissertations / Theses on the topic 'Unsupervised anomaly detection'

Consult the top 50 dissertations / theses for your research on the topic 'Unsupervised anomaly detection.'

1

Mazel, Johan. "Unsupervised network anomaly detection." Thesis, Toulouse, INSA, 2011. http://www.theses.fr/2011ISAT0024/document.

Abstract:
Anomaly detection has become a vital component of any network in today's Internet. Ranging from non-malicious unexpected events such as flash crowds and failures to network attacks such as denials-of-service and network scans, traffic anomalies can have serious detrimental effects on the performance and integrity of the network. The continuous emergence of new anomalies and attacks creates an ongoing challenge in coping with events that put the network's integrity at risk. Moreover, the polymorphic nature of traffic, caused among other things by a rapidly changing protocol landscape, complicates the task of anomaly detection systems. Most network anomaly detection systems proposed so far employ knowledge-dependent techniques, using either signature-based misuse detection or anomaly detection that relies on supervised learning. Both approaches present major limitations: the former fails to detect and characterize unknown anomalies (leaving the network unprotected for long periods), while the latter requires training on labeled normal traffic, a difficult and expensive stage that must be updated regularly to follow the evolution of network traffic. We introduce an unsupervised approach to detect and characterize network anomalies without relying on signatures, statistical training, or labeled traffic, which represents a significant step towards the autonomy of networks. Unsupervised detection is accomplished by means of robust data-clustering techniques, combining sub-space clustering with evidence accumulation or inter-clustering-results association, to blindly identify anomalies in traffic flows. The results of several unsupervised detections are also correlated to improve detection robustness, and the correlation results are further used, along with other anomaly characteristics, to build an anomaly hierarchy in terms of dangerousness. Characterization is then achieved by building efficient filtering rules to describe a detected anomaly. The detection and characterization performance and the sensitivity to parameters are evaluated over a substantial subset of the MAWI repository, which contains real network traffic traces. Our work shows that unsupervised learning techniques allow anomaly detection systems to isolate anomalous traffic without any previous knowledge; we believe this contribution constitutes a great step towards autonomous network anomaly detection. This PhD thesis has been funded through the ECODE project by the European Commission under Framework Programme 7. The goal of this project is to develop, implement, and experimentally validate a cognitive routing system that meets the challenges the Internet faces in terms of manageability and security, availability and accountability, and routing-system scalability and quality. The use case concerned within the ECODE project is network anomaly detection.
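The core idea, clustering aggregated flow features without labels and treating small or isolated groups as suspicious, can be sketched briefly. The sketch below is not the thesis's sub-space clustering and evidence accumulation pipeline; it uses DBSCAN from scikit-learn, and the flow features, sizes, and thresholds are illustrative assumptions.

```python
# Minimal sketch of clustering-based network anomaly detection: flows that
# end up as noise or in very small clusters are flagged. The flow features
# below are illustrative, not those used in the thesis.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
# Synthetic flow features: [packets, bytes, duration, distinct ports]
normal = rng.normal(loc=[100, 8e4, 30, 3], scale=[20, 2e4, 10, 1], size=(500, 4))
scan = rng.normal(loc=[5, 400, 1, 200], scale=[2, 100, 0.5, 30], size=(10, 4))
flows = np.vstack([normal, scan])

X = StandardScaler().fit_transform(flows)
labels = DBSCAN(eps=1.2, min_samples=8).fit_predict(X)

# Flag noise points and tiny clusters as anomalous
sizes = {lab: np.sum(labels == lab) for lab in set(labels)}
anomalous = np.array([lab == -1 or sizes[lab] < 15 for lab in labels])
print(f"flagged {anomalous.sum()} of {len(flows)} flows")
```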
2

Joshi, Vineet. "Unsupervised Anomaly Detection in Numerical Datasets." University of Cincinnati / OhioLINK, 2015. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1427799744.

3

Di, Felice Marco. "Unsupervised anomaly detection in HPC systems." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2019.

Abstract:
This study analyzes unsupervised techniques applied to the detection of anomalous states in HPC systems, complex computers capable of reaching performance on the order of PetaFLOPS. In the HPC world, an anomaly is a particular state that induces a change in performance with respect to the normal functioning of the system. Anomalies can be of different natures, such as a fault affecting a component, a wrong configuration, or an application entering an unexpected state and causing a premature termination of processes. The datasets used in this project were collected from D.A.V.I.D.E., a real HPC system located at CINECA in Casalecchio di Reno, or were generated by simulating the state of a single node of a virtual HPC system analogous to the CINECA one, modeled with specific non-linear functions but free of noise. This study proposes a novel, unsupervised approach, never before applied to anomaly detection in HPC systems, and focuses on identifying the possible advantages of using such techniques in this field. Several cases are presented that produced interesting groupings through combinations of variational autoencoders, a particular type of probabilistic autoencoder able to preserve the variance of the input set in its latent space, and clustering algorithms such as K-Means, DBSCAN, Gaussian Mixture, and others already known in the literature.
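As a rough illustration of this combination of representation learning and clustering, the sketch below trains a plain autoencoder (not a VAE, and not the thesis's code) on stand-in telemetry and clusters the latent codes with K-Means; PyTorch is assumed, and all dimensions and hyper-parameters are invented.

```python
# Sketch only: compress node telemetry with a small autoencoder, then
# cluster the latent codes with K-Means. A VAE, as in the thesis, would
# add a KL term; a plain autoencoder is used here for brevity.
import torch
import torch.nn as nn
from sklearn.cluster import KMeans

torch.manual_seed(0)
X = torch.randn(1000, 16)  # stand-in for per-node HPC metrics

model = nn.Sequential(
    nn.Linear(16, 8), nn.ReLU(), nn.Linear(8, 2),   # encoder -> 2-d latent
    nn.Linear(2, 8), nn.ReLU(), nn.Linear(8, 16),   # decoder
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(200):
    opt.zero_grad()
    nn.functional.mse_loss(model(X), X).backward()
    opt.step()

with torch.no_grad():
    latent = model[:3](X)            # encoder half only
clusters = KMeans(n_clusters=4, n_init=10).fit_predict(latent.numpy())
```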
4

Forstén, Andreas. "Unsupervised Anomaly Detection in Receipt Data." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-215161.

Abstract:
With the progress of data handling methods and computing power comes the possibility of automating tasks that do not necessarily have to be handled by humans. This study was done in cooperation with a company that digitalizes receipts for companies. We investigate the possibility of automatically finding anomalous receipt data, which could automate the work of receipt auditors. We study both anomalous user behaviour and individual receipts. The results indicate that automation is possible, which may reduce the need for human inspection of receipts.
5

Cheng, Leon. "Unsupervised topic discovery by anomaly detection." Thesis, Monterey, California: Naval Postgraduate School, 2013. http://hdl.handle.net/10945/37599.

Abstract:
Approved for public release; distribution is unlimited
With the vast amount of information and public comment available online, it is of increasing interest to understand what is being said and what topics are trending online. Government agencies, for example, want to know what policies concern the public without having to look through thousands of comments manually. Topic detection provides automatic identification of topics in documents based on their information content and enhances many natural language processing tasks, including text summarization and information retrieval. Unsupervised topic detection, however, has always been a difficult task. Methods such as Latent Dirichlet Allocation (LDA) convert documents from word space into document space (weighted sums over topic space), but do not perform any form of classification, nor do they address the relation of the generated topics to actual human-level topics. In this thesis we attempt a novel approach to unsupervised topic detection and classification by performing LDA and then clustering. We propose variations to the popular K-Means clustering algorithm to optimize the choice of centroids, and we perform experiments using Facebook data and the New York Times (NYT) corpus. Although the results were poor for the Facebook data, our method performed acceptably on the NYT data. The new clustering algorithms also performed slightly but consistently better than the standard K-Means algorithm.
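A minimal sketch of the two-step LDA-then-cluster idea, using scikit-learn and standard K-Means rather than the thesis's centroid-selection variants; the toy corpus is invented.

```python
# Sketch: LDA topic proportions, then K-Means over the topic space.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.cluster import KMeans

docs = [
    "the senate passed the budget bill",
    "the team won the championship game",
    "new budget cuts announced by congress",
    "star player injured before the game",
]
counts = CountVectorizer(stop_words="english").fit_transform(docs)
theta = LatentDirichletAllocation(n_components=2, random_state=0).fit_transform(counts)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(theta)
print(labels)  # documents grouped by dominant topic mixture
```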
6

Putina, Andrian. "Unsupervised anomaly detection : methods and applications." Electronic Thesis or Diss., Institut polytechnique de Paris, 2022. http://www.theses.fr/2022IPPAT012.

Abstract:
An anomaly (also known as an outlier) is an instance that deviates significantly from the rest of the input data, defined by Hawkins as 'an observation, which deviates so much from other observations as to arouse suspicions that it was generated by a different mechanism'. Anomaly detection (also known as outlier or novelty detection) is thus the field of machine learning and data mining whose purpose is to identify those instances whose features appear to be inconsistent with the remainder of the dataset. In many applications, correctly distinguishing the set of anomalous data points (outliers) from the set of normal ones (inliers) proves very important. A first application is data cleaning, i.e., identifying noisy and fallacious measurements in a dataset before applying learning algorithms. However, with the explosive growth of the data volume collectable from various sources, e.g., card transactions, internet connections, temperature measurements, etc., anomaly detection becomes a crucial stand-alone task for the continuous monitoring of systems. In this context, anomaly detection can be used to detect ongoing intrusion attacks, faulty sensor networks, or cancerous masses. The thesis first proposes a batch tree-based approach for unsupervised anomaly detection, called Random Histogram Forest (RHF). The algorithm addresses the curse of dimensionality by using the fourth central moment (a.k.a. kurtosis) in the model construction, while boasting linear running time. A stream-based anomaly detection engine, called ODS, which leverages the unsupervised clustering technique DenStream, is presented subsequently. Finally, an automated anomaly detection engine, which alleviates the human effort required when dealing with several algorithms and hyper-parameters, is presented as the last contribution.
7

Audibert, Julien. "Unsupervised anomaly detection in time-series." Electronic Thesis or Diss., Sorbonne université, 2021. http://www.theses.fr/2021SORUS358.

Abstract:
Anomaly detection in multivariate time series is a major issue in many fields. The increasing complexity of systems and the explosion of the amount of data have made its automation indispensable. This thesis proposes an unsupervised method for anomaly detection in multivariate time series called USAD. However, deep neural network methods suffer from a limited ability to extract features from the data, since they rely only on local information. To improve the performance of these methods, this thesis presents a feature engineering strategy that introduces non-local information. Finally, this thesis compares sixteen time series anomaly detection methods to understand whether the explosion in complexity of the neural network methods proposed in the current literature is really necessary.
8

Dani, Mohamed Cherif. "Unsupervised anomaly detection for aircraft health monitoring system." Thesis, Sorbonne Paris Cité, 2017. http://www.theses.fr/2017USPCB258.

Abstract:
The limits of technical and fundamental knowledge are a daily challenge for industry. Updating this knowledge is important for economic competitiveness, but also for the efficient reliability and maintainability of systems and machines. Thanks to these machines and systems, the growth of data in volume and in generation frequency is a real phenomenon. Within Airbus, for example, aircraft equipped with thousands of sensors generate hundreds of megabytes of data per flight. These data are exploited on the ground or in flight to monitor the state and health of the aircraft and to detect failures, incidents, and changes. Such failures, incidents, and changes are known as anomalies. An anomaly is behavior that does not conform to the normal behavior of the data; some define it as a deviation from a normal model, others as a change. Whatever the definition, detecting such anomalies is important for the correct functioning of the aircraft. Currently, anomaly detection on board aircraft is provided by several health monitoring devices. One of them, the Aircraft Condition Monitoring System (ACMS), continuously records the data generated by the sensors and also monitors the aircraft in real time using triggers and thresholds programmed by airlines and system designers according to prior (physical, mechanical, etc.) knowledge of the system. However, several constraints limit the anomaly detection potential of this equipment. One is the limitation of expert knowledge, a classic problem in many domains: a trigger detects only the anomalies and incidents it was designed for, and it does not cover all system conditions. In other words, if a new behavior appears in a sensor, for example after a maintenance action or a part replacement, the trigger cannot adapt to this new condition and may either detect nothing or label the whole new condition as anomalous, generating false alarms. Another constraint is that the triggers (predefined conditions) are static and unable to adapt their properties to each new condition; further limitations are discussed in the following chapters. The principal objective of this thesis is to detect anomalies and changes in ACMS sensor data, in order to improve the aircraft health monitoring (AHM) function. The work is based on a two-stage analysis. In the univariate stage, unsupervised learning is used to process each sensor independently, since no prior knowledge of the system, documentation, or labeled classes is available, and to detect the anomalies and changes of each sensor. The second stage is a multivariate analysis, based on density clustering, whose objective is to filter some of the anomalies detected in the first stage (false alarms) and to detect groups of suspicious behavior. The anomalies detected in both stages can serve as potential triggers or be used to update existing triggers. We also propose a generic concept of anomaly detection based on the univariate and multivariate analyses, and finally a new concept for validating anomalies within Airbus. The method is tested on real and synthetic data, and the results, together with the identification and validation of the anomalies, are discussed in this thesis.
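A compact sketch of the two-stage univariate-then-multivariate idea described above, on synthetic sensor data with illustrative thresholds; the abstract leaves the actual method at a high level, so none of the parameters below are the thesis's.

```python
# Sketch: flag points per sensor with a simple univariate 3-sigma rule,
# then run density clustering on the flagged multivariate records;
# isolated flags are treated as likely false alarms, dense groups as
# suspicious behaviour.
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(1)
data = rng.normal(size=(2000, 6))           # rows: time, cols: sensors
data[700:720] += 4.0                        # correlated multi-sensor event
data[1500, 2] += 6.0                        # isolated single-sensor spike

# Stage 1: univariate flags (per-sensor 3-sigma rule)
z = np.abs((data - data.mean(axis=0)) / data.std(axis=0))
flagged = np.where(z.max(axis=1) > 3)[0]

# Stage 2: density clustering of the flagged records
labels = DBSCAN(eps=3.0, min_samples=4).fit_predict(data[flagged])
groups = flagged[labels >= 0]               # dense groups: suspicious events
false_alarms = flagged[labels == -1]        # isolated flags
print(len(groups), len(false_alarms))
```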
9

Sarossy, George. "Anomaly detection in Network data with unsupervised learning methods." Thesis, Mälardalens högskola, Akademin för innovation, design och teknik, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:mdh:diva-55096.

Abstract:
Anomaly detection has become a crucial part of the protection of information and integrity. Due to the increase in cyber threats, the demand for anomaly detection has grown among companies. Anomaly detection on time series data aims to detect unexpected behavior in a system. Anomalies often occur online, and companies need to be able to protect themselves from these intrusions. Multiple machine learning algorithms have been researched to solve the anomaly detection problem, and finding the most suitable algorithms is an ongoing effort. This study therefore investigates whether the K-means, Mean Shift, and DBSCAN algorithms could be a solution to the problem, and whether combining the algorithms improves the results. The results reveal that the combinations of algorithms perform slightly worse than the individual algorithms regarding speed and accuracy of anomaly detection. The individual algorithms performed well, with only slight differences between them: DBSCAN had slightly better total detection than the other algorithms but a slower execution time. In conclusion, the Mean Shift algorithm had the fastest execution time and the DBSCAN algorithm the highest accuracy. Most combinations of the algorithms did not improve on their components, although the DBSCAN + Mean Shift fusion did improve the accuracy, and the K-means + Mean Shift fusion did improve the execution time.
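As an illustration of the kind of fusion the study evaluates, the sketch below averages a Mean Shift distance score with a DBSCAN noise indicator using scikit-learn; the equal weighting and all parameters are assumptions, not the thesis's scheme.

```python
# Sketch: fuse two clustering-based detectors by averaging their scores.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import MeanShift, DBSCAN

rng = np.random.default_rng(0)
X = StandardScaler().fit_transform(
    np.vstack([rng.normal(0, 1, (950, 4)), rng.normal(6, 1, (50, 4))]))

ms = MeanShift().fit(X)
dist = np.linalg.norm(X - ms.cluster_centers_[ms.labels_], axis=1)
ms_score = (dist - dist.min()) / (dist.max() - dist.min())

db_score = (DBSCAN(eps=0.7, min_samples=10).fit_predict(X) == -1).astype(float)

fused = 0.5 * ms_score + 0.5 * db_score   # simple average of the two scores
print(np.argsort(fused)[-10:])            # indices of most suspicious points
```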
10

Lindgren, Erik, and Niklas Allard. "Exploring unsupervised anomaly detection in Bill of Materials structures." Thesis, Linköpings universitet, Institutionen för datavetenskap, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-160262.

Abstract:
Siemens produces a variety of products that provide innovative solutions within areas such as electrification, automation, and digitalization, some of which are turbine machines. During the process of creating or modifying a machine, it is vital that the documentation used as reference is trustworthy and complete. If the documentation is incomplete during the process, the risk of delivering faulty machines to customers increases drastically, causing potential harm to Siemens. This thesis explores the possibility of finding anomalies in Bill of Materials structures in order to determine the completeness of a given machine structure. A prototype that determines the completeness of a given machine structure by utilizing anomaly detection was created. Three different anomaly detection algorithms were tested in the prototype: DBSCAN, LOF, and Isolation Forest. From the tests, we could see indications of DBSCAN generally performing the best, making it the algorithm of choice for the prototype. In order to achieve more accurate results, more tests need to be performed.
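For concreteness, here is how the three detectors named above can be run side by side with scikit-learn on toy feature vectors; real Bill of Materials structures would first have to be encoded numerically, which is the hard part the thesis addresses.

```python
# Sketch comparing the three detectors from the thesis on toy features.
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.neighbors import LocalOutlierFactor
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (300, 8)), rng.normal(4, 1, (6, 8))])

flags = {
    "DBSCAN": DBSCAN(eps=2.0, min_samples=5).fit_predict(X) == -1,
    "LOF": LocalOutlierFactor(n_neighbors=20).fit_predict(X) == -1,
    "IsolationForest": IsolationForest(random_state=0).fit_predict(X) == -1,
}
for name, f in flags.items():
    print(f"{name}: {f.sum()} flagged, of the 6 injected: {f[-6:].sum()}")
```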
11

Vendramin, Nicoló. "Unsupervised Anomaly Detection on Multi-Process Event Time Series." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-254885.

Abstract:
Establishing whether observed data are anomalous or not is an important task that has been widely investigated in the literature, and it becomes an even more complex problem when combined with high-dimensional representations and multiple sources independently generating the patterns to be analyzed. The work presented in this master thesis employs a data-driven pipeline for the definition of a recurrent autoencoder architecture to analyze, in an unsupervised fashion, high-dimensional event time series generated by multiple and variable processes interacting with a system. Facing the above-mentioned problem, the work investigates whether or not it is possible to use a single model to analyze patterns produced by different sources. The analysis of log files that record events of interaction between users and the radio network infrastructure is employed as a real-world case study for the given problem. The investigation aims to verify the performance of a single machine learning model applied to the learning of multiple patterns developed through time by distinct sources. The work proposes a pipeline, for dealing with the complex representation of the data sources and with the definition and tuning of the anomaly detection model, that is based on no domain-specific knowledge and can thus be adapted to different problem settings. The model has been implemented in four different variants that have been evaluated on both normal and anomalous data, gathered partially from real network cells and partially from the simulation of anomalous behaviours. The empirical results show the applicability of the model for the detection of anomalous sequences and events in the described conditions, with scores reaching above 80% in terms of F1-score, varying depending on the specific threshold setting. In addition, their deeper interpretation gives insights into the differences between the variants of the model and thus their limitations and strong points.
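A minimal recurrent autoencoder of the kind described, scoring whole event windows by reconstruction error; PyTorch is assumed, the data are random stand-ins, and the thesis's multi-process handling and threshold tuning are omitted.

```python
# Sketch: LSTM autoencoder that encodes a sequence into its final hidden
# state, decodes it back, and scores sequences by reconstruction error.
import torch
import torch.nn as nn

torch.manual_seed(0)

class SeqAE(nn.Module):
    def __init__(self, n_features=4, hidden=16):
        super().__init__()
        self.enc = nn.LSTM(n_features, hidden, batch_first=True)
        self.dec = nn.LSTM(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_features)

    def forward(self, x):
        _, (h, _) = self.enc(x)                         # sequence summary
        z = h[-1].unsqueeze(1).repeat(1, x.size(1), 1)  # repeat per step
        y, _ = self.dec(z)
        return self.out(y)

X = torch.randn(256, 20, 4)    # batch x time x event features (stand-in)
model = SeqAE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(100):
    opt.zero_grad()
    nn.functional.mse_loss(model(X), X).backward()
    opt.step()

with torch.no_grad():
    err = ((model(X) - X) ** 2).mean(dim=(1, 2))  # per-sequence score
```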
12

Granlund, Oskar. "Unsupervised anomaly detection on log-based time series data." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-265534.

Abstract:
Due to a constant increase in the number of connected devices, there is an increased demand for confidentiality, availability, and integrity in applications. This thesis focuses on unsupervised anomaly detection in data centers. It evaluates how suitable open-source state-of-the-art solutions are at finding abnormal trends and patterns in log-based data streams. The methods used in this work are principal component analysis (PCA), LogCluster, and hierarchical temporal memory (HTM). They were evaluated using F-score on a real data set from an Apache access log. The data set was carefully chosen to represent a normal state in which close to no anomalous events occurred. Afterward, 0.5% of the data points were transformed into anomalous data points, calculated based on the average frequency of log events matching a certain pattern. PCA showed the best performance, with an F-score ranging from 0.4 to 0.56. The second best method was LogCluster, while the HTM methods did not show adequate results. The results showed that PCA can find approximately 50% of the injected anomalies, which can be used to improve the confidentiality, integrity, and availability of applications.
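A sketch of the standard PCA recipe for log data: per-window event-count vectors, projection onto the residual subspace, and a squared-prediction-error score. The data, component count, and scoring details are illustrative assumptions, not taken from the thesis.

```python
# Sketch: flag log windows whose event-count vector has a large residual
# after projection onto the principal (normal) subspace.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
counts = rng.poisson(5, size=(500, 12)).astype(float)  # windows x event types
counts[42] = 0.0
counts[42, 0] = 60.0            # one event type dominates: abnormal mixture

Xc = counts - counts.mean(axis=0)
pca = PCA(n_components=3).fit(Xc)                # normal subspace
residual = Xc - pca.inverse_transform(pca.transform(Xc))
spe = (residual ** 2).sum(axis=1)                # squared prediction error
print(np.argsort(spe)[-3:])                      # most anomalous windows
```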
13

Leto, Kevin. "Anomaly detection in HPC systems." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2019.

Abstract:
In the context of supercomputers, anomaly detection is an excellent strategy for keeping system performance (availability and reliability) high, making it possible to prevent failures and to adapt maintenance activity to the health of the system itself. The supercomputer examined in this research is called MARCONI and belongs to CINECA, an Italian inter-university consortium based in Bologna. The data extracted for the analysis refer in particular to the node 'r183c12s04', but to prove the generality of the approach, further tests were also run on different nodes (albeit on a smaller scale). The approach exploits the potential of machine learning, combining unsupervised and supervised training. An autoencoder is trained in an unsupervised way to obtain a compressed representation (dimensionality reduction) of the raw data extracted from a system node. The compressed data are then fed to a three-layer neural network (input, hidden, output) to perform a supervised classification between normal and anomalous states. Our approach proved very promising, reaching accuracy, precision, recall, and F1-score levels all above 97% for the main node, while lower but still very positive levels (above 83% on average) were observed for the other nodes considered. The imperfect performance on the other nodes is certainly caused by the low number of anomaly examples present in the reference datasets.
14

Beach, David J. "Anomaly Detection with Advanced Nonlinear Dimensionality Reduction." Digital WPI, 2020. https://digitalcommons.wpi.edu/etd-theses/1378.

Abstract:
Dimensionality reduction techniques such as t-SNE and UMAP are useful both for overview of high-dimensional datasets and as part of a machine learning pipeline. These techniques create a non-parametric model of the manifold by fitting a density kernel about each data point using the distances to its k-nearest neighbors. In dense regions, this approach works well, but in sparse regions, it tends to draw unrelated points into the nearest cluster. Our work focuses on a homotopy method which imposes graph-based regularization over the manifold parameters to update the embedding. As the homotopy parameter increases, so does the cost of modeling different scales between adjacent neighborhoods. This gradually imposes a more uniform scale over the manifold, resulting in a more faithful embedding which preserves structure in dense areas while pushing sparse anomalous points outward.
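The k-NN density-kernel behaviour the abstract describes can be made concrete with a small sketch: each point's local scale is set from the distance to its k-th neighbor, so sparse regions get very wide kernels, which is the effect the thesis's homotopy regularization is meant to temper. The data and k are illustrative only.

```python
# Sketch: per-point local scale from k-nearest-neighbor distances, as in
# t-SNE/UMAP-style kernel fitting; sparse points get much larger scales.
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
dense = rng.normal(0, 0.5, (300, 2))
sparse = rng.uniform(4, 10, (15, 2))          # sparse, anomalous region
X = np.vstack([dense, sparse])

k = 10
dists, _ = NearestNeighbors(n_neighbors=k + 1).fit(X).kneighbors(X)
local_scale = dists[:, -1]                    # distance to k-th neighbor
print("median scale, dense :", np.median(local_scale[:300]).round(3))
print("median scale, sparse:", np.median(local_scale[300:]).round(3))
```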
15

Fröjdholm, Hampus. "Learning from 3D generated synthetic data for unsupervised anomaly detection." Thesis, Uppsala universitet, Avdelningen för visuell information och interaktion, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-443243.

Abstract:
Modern machine learning methods, utilising neural networks, require a lot of training data. Data gathering and preparation have thus become a major bottleneck in the machine learning pipeline, and researchers often use large public datasets to conduct their research (such as the ImageNet [1] or MNIST [2] datasets). As these methods begin to be used in industry, the challenges become apparent. In factories, the objects being produced are often unique and may even involve trade secrets and patents that need to be protected. Additionally, manufacturing may not have started yet, making real data collection impossible. In both cases a public dataset is unlikely to be applicable. One possible solution, investigated in this thesis, is synthetic data generation. Synthetic data generation using physically based rendering was tested for unsupervised anomaly detection on a 3D-printed block. A small image dataset of the block was gathered as a control, and a data generation model was created using its CAD model, a resource most often available in industrial settings. The data generation model used randomisation to reduce the domain shift between the real and synthetic data. To test the data, autoencoder models were trained on the real and synthetic data, both separately and in combination. The material of the block, a white painted surface, proved challenging to reconstruct, and no significant difference between the synthetic and real data could be observed. The model trained on real data outperformed the models trained on the synthetic and combined data. However, the synthetic data combined with the real data showed promise in reducing some of the bias intentionally introduced in the real dataset. Future research could focus on creating synthetic data for a problem where a good anomaly detection model already exists, with the goal of transferring parts of the synthetic data generation model (such as the materials) to a new problem. This would be of interest in industries that produce many different but similar objects and could reduce the time needed when starting a new machine learning project.
16

Azmoudeh, Fard Simon. "Anomaly Detection in Networks using Autoencoder and Unsupervised Learning Methods." Thesis, Mälardalens högskola, Akademin för innovation, design och teknik, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:mdh:diva-55097.

Abstract:
The increasing popularity of networking devices in workplaces leads to an exponential increase in the frequency of network attacks, which makes protected networks more and more important. Because of the increase in network activity, workplaces have started to leave anomaly detection in the hands of artificial intelligence. However, current methods of detecting anomalies cannot accurately detect all of them. In this thesis, I propose a training method for autoencoders that shows how k-Means clustering can be combined with an autoencoder for feature extraction with the use of differential evolution. The features extracted by this autoencoder are then used to classify the network activity of the KDD-99 dataset, in order to compare accuracies and false-positive rates with other anomaly detection methods. The results of this thesis show that it is possible to combine k-Means clustering with autoencoders through differential evolution. However, the proposed training method leads to a decrease in classifier accuracy: the classifiers reached around 19% accuracy when using features extracted with the proposed training method, as opposed to around 94% accuracy when using features from an autoencoder not combined with k-Means clustering. This is only preliminary research, and as such the results of this thesis should not be used for any real anomaly detection systems.
17

Wu, Xinheng. "A Deep Unsupervised Anomaly Detection Model for Automated Tumor Segmentation." Thesis, The University of Sydney, 2020. https://hdl.handle.net/2123/22502.

Abstract:
Much research has been conducted on computer-aided diagnosis (CAD) for automated tumor segmentation in various medical images, e.g., magnetic resonance (MR), computed tomography (CT), and positron emission tomography (PET). Recent advances in automated tumor segmentation have been achieved by supervised deep learning (DL) methods trained on large labeled datasets that cover tumor variations. However, such training data are scarce due to the cost of the labeling process. With insufficient training data, supervised DL methods have difficulty generating effective feature representations for tumor segmentation. This thesis aims to develop an unsupervised DL method that exploits the large amounts of unlabeled data generated during the clinical process. Our assumption, following unsupervised anomaly detection (UAD), is that normal data have constrained anatomy and variations, while anomalies, i.e., tumors, usually differ from normality with high diversity. We demonstrate our method for automated tumor segmentation on two different image modalities. Firstly, given the bilateral symmetry of normal human brains and the asymmetry of brain tumors, we propose a symmetry-driven deep UAD model that uses a GAN to model the normal symmetric variations, thus segmenting tumors by their asymmetry. We evaluated our method on two benchmark datasets; the results show that it outperformed the state-of-the-art unsupervised brain tumor segmentation methods and achieved performance competitive with the supervised segmentation methods. Secondly, we propose a multi-modal deep UAD model for PET-CT tumor segmentation. We model a manifold of normal variations shared across normal CT and PET pairs; this manifold, representing the normal pairing, can be used to segment the anomalies. We evaluated our method on two PET-CT datasets, and the results show that it outperformed the state-of-the-art unsupervised methods, supervised methods, and baseline fusion techniques.
18

Larsson, Frans. "Algorithmic trading surveillance : Identifying deviating behavior with unsupervised anomaly detection." Thesis, Uppsala universitet, Matematiska institutionen, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-389941.

Abstract:
The financial markets are no longer what they used to be, and one reason for this is the breakthrough of algorithmic trading. Although this has had several positive effects, there have been recorded incidents in which algorithms were involved. It is therefore of interest to find effective methods to monitor algorithmic trading. The purpose of this thesis was to contribute to this research area by investigating whether machine learning can be used for detecting deviating behavior. Since the real-world data set used in this study lacked labels, an unsupervised anomaly detection approach was chosen. Two models, isolation forest and a deep denoising autoencoder, were selected and evaluated. Because the data set lacked labels, artificial anomalies were injected into it to make evaluation of the models possible. These synthetic anomalies were generated by two different approaches, one based on a downsampling strategy and one based on manual construction and modification of real data. The evaluation shows that both the isolation forest and the deep denoising autoencoder outperform a trivial baseline model and have the ability to detect deviating behavior. Furthermore, the deep denoising autoencoder outperforms the isolation forest with respect to both the area under the receiver operating characteristic curve and the area under the precision-recall curve. A deep denoising autoencoder is therefore recommended for the purpose of algorithmic trading surveillance.
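The evaluation protocol itself, injecting synthetic anomalies and ranking detectors by the two areas under the curve, can be sketched briefly. IsolationForest stands in for both models below; the thesis's deep denoising autoencoder would instead be scored by reconstruction error, and the injected anomalies here are a crude stand-in for the thesis's two generation strategies.

```python
# Sketch of the evaluation protocol: inject synthetic anomalies into
# unlabeled data, then compare detectors by ROC-AUC and PR-AUC.
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.metrics import roc_auc_score, average_precision_score

rng = np.random.default_rng(0)
normal = rng.normal(0, 1, (2000, 10))
injected = rng.uniform(-6, 6, (40, 10))       # manually constructed anomalies
X = np.vstack([normal, injected])
y = np.r_[np.zeros(len(normal)), np.ones(len(injected))]

score = -IsolationForest(random_state=0).fit(X).score_samples(X)  # higher = worse
print("ROC-AUC:", roc_auc_score(y, score))
print("PR-AUC :", average_precision_score(y, score))
```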
19

Haddad, Josef, and Carl Piehl. "Unsupervised anomaly detection in time series with recurrent neural networks." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-259655.

Abstract:
Artificial neural networks (ANN) have been successfully applied to a wide range of problems. However, most ANN-based models do not attempt to model the brain in detail, although some models do. An example of a biologically constrained ANN is Hierarchical Temporal Memory (HTM). This study applies HTM and Long Short-Term Memory (LSTM) to anomaly detection problems in time series in order to compare their performance for this task. The anomalies are restricted to point anomalies and the time series are univariate. Pre-existing implementations that utilise these networks for unsupervised anomaly detection in time series are used in this study. We primarily use our own synthetic data sets in order to discover the networks' robustness to noise and how they compare to each other with regard to different characteristics of the time series. Our results show that both networks can handle noisy time series, and the difference in performance regarding noise robustness is not significant for the time series used in the study. LSTM outperforms HTM in detecting point anomalies on our synthetic time series with a sine-curve trend, but no conclusion can be drawn about which of the two networks performs best overall.
20

Sreenivasulu, Ajay. "Evaluation of cluster based Anomaly detection." Thesis, Högskolan i Skövde, Institutionen för informationsteknologi, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-18053.

Abstract:
Anomaly detection has been widely researched and used in various application domains such as network intrusion, the military, and finance. Anomalies can be defined as unusual behavior that differs from the expected normal behavior. This thesis focuses on evaluating the performance of different clustering algorithms, namely k-Means, DBSCAN, and OPTICS, as anomaly detectors. The data are generated using the MixSim package available in R. The algorithms were tested on different cluster overlaps and dimensions. Evaluation metrics such as recall, precision, and F1-score were used to analyze the performance of the clustering algorithms. The results show that DBSCAN performed better than the other algorithms when provided low-dimensional data with different cluster overlap settings, but it did not perform well on high-dimensional data with different cluster overlaps. For high-dimensional data, k-Means performed better than DBSCAN and OPTICS across the different cluster overlaps.
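The thesis generates its data with R's MixSim package; a rough Python analogue of the setup, with cluster standard deviation as a crude stand-in for MixSim's controlled overlap and uniform points as outliers, looks like this:

```python
# Sketch of the experimental setup: clusters with controllable spread,
# added uniform outliers, and precision/recall/F1 for one detector.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.cluster import DBSCAN
from sklearn.metrics import precision_recall_fscore_support

rng = np.random.default_rng(0)
X, _ = make_blobs(n_samples=600, centers=4, n_features=5,
                  cluster_std=1.5, random_state=0)   # std ~ overlap proxy
outliers = rng.uniform(X.min(), X.max(), (30, 5))
data = np.vstack([X, outliers])
y_true = np.r_[np.zeros(600), np.ones(30)]

y_pred = (DBSCAN(eps=2.5, min_samples=5).fit_predict(data) == -1).astype(int)
p, r, f1, _ = precision_recall_fscore_support(y_true, y_pred,
                                              average="binary")
print(f"precision={p:.2f} recall={r:.2f} F1={f1:.2f}")
```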
21

Fockstedt, Jonas, and Ema Krcic. "Unsupervised anomaly detection for structured data - Finding similarities between retail products." Thesis, Högskolan i Halmstad, Akademin för informationsteknologi, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:hh:diva-44756.

Abstract:
Data is one of the most important contributing factors in modern business operations, and bad data can lead to tremendous losses, both financially and in customer experience. This thesis seeks to find anomalies in real-world, complex, structured data that cause an international enterprise to miss out on income and potentially lose customers. Using graph theory and similarity analysis, the findings suggest that certain countries contribute to the discrepancies more than others. This is believed to be an effect of countries customizing their products to match the market's needs. This thesis only scratches the surface of the analysis of the data, and the opportunities for future work are therefore many.
22

Renström, Martin, and Timothy Holmsten. "Fraud Detection on Unlabeled Data with Unsupervised Machine Learning." Thesis, KTH, Hälsoinformatik och logistik, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-230592.

Abstract:
A common problem in systems handling user interaction was the risk for fraudulent behaviour. As an example, in a system with credit card transactions it could have been a person using a another user's account for purchases, or in a system with advertisment it could be bots clicking on ads. These malicious attacks were often disguised as normal interactions and could be difficult to detect. It was especially challenging when working with datasets that did not contain so called labels, which showed if the data point was fraudulent or not. This meant that there were no data that had previously been classified as fraud, which in turn made it difficult to develop an algorithm that could distinguish between normal and fraudulent behavior. In this thesis, the area of anomaly detection was explored with the intent of detecting fraudulent behavior without labeled data. Three neural network based prototypes were developed in this study. All three prototypes were some sort of variation of autoencoders. The first prototype which served as a baseline was a simple three layer autoencoder, the second prototype was a novel autoencoder which was called stacked autoencoder, the third prototype was a variational autoencoder. The prototypes were then trained and evaluated on two different datasets which both contained non fraudulent and fraudulent data. In this study it was found that the proposed stacked autoencoder architecture achieved better performance scores in recall, accuracy and NPV in the tests that were designed to simulate a real world scenario.
Ett vanligt problem med användares interaktioner i ett system var risken för bedrägeri. För ett system som hanterarade dataset med kreditkortstransaktioner så kunde ett exempel vara att en person använde en annans identitet för kortköp, eller i system som hanterade reklam så skulle det kunna ha varit en automatiserad mjukvara som simulerade interaktioner. Dessa attacker var ofta maskerade som normala interaktioner och kunde därmed vara svåra att upptäcka. Inom dataset som inte har korrekt märkt data så skulle det vara speciellt svårt att utveckla en algoritm som kan skilja på om interaktionen var avvikande eller inte. I denna avhandling så utforskas ämnet att upptäcka anomalier i dataset utan specifik data som tyder på att det var bedrägeri. Tre prototyper av neurala nätverk användes i denna studie som tränades och utvärderades på två dataset som innehöll både data som sade att det var bedrägeri och inte bedrägeri. Den första prototypen som fungerade som en bas var en simpel autoencoder med tre lager, den andra prototypen var en ny autoencoder som har fått namnet staplad autoencoder och den tredje prototypen var en variationell autoencoder. För denna studie så gav den föreslagna staplade autoencodern bäst resultat för återkallelse, noggrannhet och NPV i de test som var designade att efterlikna ett verkligt scenario.
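The baseline idea of the first prototype, reconstruction-error scoring with a simple three-layer autoencoder, can be sketched as follows. The network here is a plain MLP trained to reproduce its input; the data, layer sizes and 1% threshold are invented for illustration, and the thesis's stacked and variational variants are not reproduced.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)

# Toy transaction features: mostly normal points plus a few injected frauds.
normal = rng.normal(0, 1, size=(2000, 8))
fraud = rng.normal(4, 1, size=(20, 8))
X = StandardScaler().fit_transform(np.vstack([normal, fraud]))

# A simple three-layer autoencoder: input -> small hidden bottleneck -> input.
ae = MLPRegressor(hidden_layer_sizes=(3,), activation="tanh",
                  max_iter=2000, random_state=0)
ae.fit(X, X)  # train the network to reconstruct its own input

# Reconstruction error as anomaly score: frauds should reconstruct poorly.
err = ((X - ae.predict(X)) ** 2).mean(axis=1)
threshold = np.quantile(err, 0.99)  # flag the worst 1% as suspected fraud
print("flagged indices:", np.where(err > threshold)[0][-5:])
```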
APA, Harvard, Vancouver, ISO, and other styles
24

Merrill, Nicholas Swede. "Modified Kernel Principal Component Analysis and Autoencoder Approaches to Unsupervised Anomaly Detection." Thesis, Virginia Tech, 2020. http://hdl.handle.net/10919/98659.

Full text
Abstract:
Unsupervised anomaly detection is the task of identifying examples that differ from the normal or expected pattern without the use of labeled training data. Our research addresses shortcomings in two existing anomaly detection algorithms, Kernel Principal Component Analysis (KPCA) and Autoencoders (AE), and proposes novel solutions to improve both of their performances in the unsupervised setting. Anomaly detection has several useful applications, such as intrusion detection, fault monitoring, and vision processing. More specifically, anomaly detection can be used in autonomous driving to identify obscured signage or to monitor intersections. Kernel techniques are desirable because of their ability to model highly non-linear patterns, but they are limited in the unsupervised setting due to their sensitivity to parameter choices and the absence of a validation step. Additionally, conventional KPCA suffers from a quadratic time and memory complexity in the construction of the Gram matrix and a cubic time complexity in its eigendecomposition. The problem of tuning the Gaussian kernel parameter, σ, is solved using mini-batch stochastic gradient descent (SGD) optimization of a loss function that maximizes the dispersion of the kernel matrix entries. Secondly, the computational time is greatly reduced, while still maintaining high accuracy, by using an ensemble of small "skeleton" models and combining their scores. The performance of traditional machine learning approaches to anomaly detection plateaus as the volume and complexity of data increase. Deep anomaly detection (DAD) involves the application of multilayer artificial neural networks to identify anomalous examples. AEs are fundamental to most DAD approaches. Conventional AEs rely on the assumption that a trained network will learn to reconstruct normal examples better than anomalous ones. In practice, however, given sufficient capacity and training time, an AE will generalize to reconstruct even very rare examples. Three methods are introduced to more reliably train AEs for unsupervised anomaly detection: Cumulative Error Scoring (CES) leverages the entire history of training errors to minimize the importance of early stopping; Percentile Loss (PL) training aims to prevent anomalous examples from contributing to parameter updates; and, lastly, early stopping via knee detection aims to limit the risk of overtraining. Ultimately, the two newly proposed methods of this research, Unsupervised Ensemble KPCA (UE-KPCA) and the modified training and scoring AE (MTS-AE), demonstrate improved detection performance and reliability compared to many baseline algorithms across a number of benchmark datasets.
Master of Science
Anomaly detection is the task of identifying examples that differ from the normal or expected pattern. The challenge of unsupervised anomaly detection is distinguishing normal and anomalous data without the use of labeled examples to demonstrate their differences. This thesis addresses shortcomings in two anomaly detection algorithms, Kernel Principal Component Analysis (KPCA) and Autoencoders (AE), and proposes new solutions to apply them in the unsupervised setting. Ultimately, the two modified methods, Unsupervised Ensemble KPCA (UE-KPCA) and the Modified Training and Scoring AE (MTS-AE), demonstrate improved detection performance and reliability compared to many baseline algorithms across a number of benchmark datasets.
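A minimal sketch of the underlying KPCA idea, scoring points by their reconstruction error in input space, is given below. The median heuristic stands in for the thesis's SGD-based dispersion tuning of σ, and no ensemble of skeleton models is built, so this is not UE-KPCA itself.

```python
import numpy as np
from sklearn.decomposition import KernelPCA
from sklearn.metrics import pairwise_distances

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 1, (500, 5)), rng.normal(5, 1, (10, 5))])

# Median heuristic for the Gaussian bandwidth -- a common unsupervised
# default, standing in for the thesis's SGD-based tuning of sigma.
sigma = np.median(pairwise_distances(X))
kpca = KernelPCA(n_components=10, kernel="rbf", gamma=1.0 / (2 * sigma**2),
                 fit_inverse_transform=True)

# Score each point by how badly it reconstructs from the principal subspace.
X_hat = kpca.inverse_transform(kpca.fit_transform(X))
score = np.linalg.norm(X - X_hat, axis=1)
print("top anomalies:", np.argsort(score)[-10:])  # ideally the 10 shifted points
```

The median heuristic is a pragmatic default; the thesis's point is precisely that such fixed choices can be improved by optimizing the dispersion of the kernel matrix entries.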
APA, Harvard, Vancouver, ISO, and other styles
25

Tjhai, Gina C. "Anomaly-based correlation of IDS alarms." Thesis, University of Plymouth, 2011. http://hdl.handle.net/10026.1/308.

Full text
Abstract:
An Intrusion Detection System (IDS) is one of the major techniques for securing information systems and keeping pace with current and potential threats and vulnerabilities in computing systems. It is an indisputable fact that the art of detecting intrusions is still far from perfect, and IDSs tend to generate a large number of false alarms. Hence a human analyst inevitably has to validate those alarms before any action can be taken. As IT infrastructures become larger and more complicated, the number of alarms that need to be reviewed can escalate rapidly, making this task very difficult to manage. The need for an automated correlation and reduction system is therefore very much evident. In addition, alarm correlation is valuable in providing the operators with a more condensed view of potential security issues within the network infrastructure. The thesis embraces a comprehensive evaluation of the problem of false alarms and a proposal for an automated alarm correlation system. A critical analysis of existing alarm correlation systems is presented, along with a description of the need for an enhanced correlation system. The study concludes that whilst a large number of works have been carried out on improving correlation techniques, none of them is perfect. They either require an extensive level of domain knowledge from human experts to run the system effectively, or are unable to provide high-level information about the false alerts for future tuning. The overall objective of the research has therefore been to establish an alarm correlation framework and system which enables the administrator to effectively group alerts from the same attack instance and subsequently reduce the volume of false alarms without the need for domain knowledge. The achievement of this aim has comprised the proposal of an attribute-based approach, which is used as a foundation to systematically develop an unsupervised two-stage correlation technique. From this formation, a novel SOM K-Means Alarm Reduction Tool (SMART) architecture has been modelled as the framework from which a time- and attribute-based aggregation technique is offered. The thesis describes the design and features of the proposed architecture, focusing upon the key components forming the underlying architecture, the alert attributes, and the way they are processed and applied to correlate alerts. The architecture is strengthened by the development of a statistical tool, which offers a means to perform result and alert analysis and comparison. The main concepts of the novel architecture are validated through the implementation of a prototype system. A series of experiments was conducted to assess the effectiveness of SMART in reducing false alarms. This aimed to prove the viability of implementing the system in a practical environment and that the study has provided an appropriate contribution to knowledge in this field.
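The two-stage SOM/K-means aggregation at the heart of SMART might be sketched as below, using the third-party minisom package. The alert features, grid size and cluster count are invented, and this is only a loose reading of the idea, not the actual SMART design.

```python
import numpy as np
from minisom import MiniSom          # pip install minisom
from sklearn.cluster import KMeans

rng = np.random.default_rng(3)
# Toy IDS alerts: numeric encodings of (src, dst, port, signature id, time).
alerts = rng.random((1000, 5))

# Stage 1: train a small SOM so similar alerts map to nearby grid cells.
som = MiniSom(8, 8, alerts.shape[1], sigma=1.5, learning_rate=0.5,
              random_seed=0)
som.train_random(alerts, 5000)

# Stage 2: K-means over the SOM codebook groups cells into alert clusters,
# so alerts from the same attack instance should land in the same group.
codebook = som.get_weights().reshape(-1, alerts.shape[1])
cell_cluster = KMeans(n_clusters=6, n_init=10, random_state=0).fit_predict(codebook)

def alert_group(x):
    i, j = som.winner(x)              # best-matching SOM cell for the alert
    return cell_cluster[i * 8 + j]

print("first 10 alert groups:", [alert_group(a) for a in alerts[:10]])
```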
APA, Harvard, Vancouver, ISO, and other styles
26

Vidmark, Anton. "CONSTRUCTING AND VARYING DATA MODELS FOR UNSUPERVISED ANOMALY DETECTION ON LOG DATA: Data modelling and domain knowledge's impact on anomaly detection and explainability." Thesis, Umeå universitet, Institutionen för datavetenskap, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-163544.

Full text
Abstract:
As the complexity of today's systems increases, manual system monitoring and log file analysis are no longer applicable, giving an increasing need for automated anomaly detection systems. However, most current research in the domain tends to focus only on the technical details of the frameworks and the evaluation of the algorithms, and how these impact anomaly detection results. In contrast, this study emphasizes how one can approach understanding and modelling the data, and how this impacts anomaly detection performance. Given log data from an education platform application, the data is analysed to form a concept of what is normal with regard to educational course section behaviour. The data is then modelled to capture the dimensions of a course section, and a detection model is created, running a statically tuned k-nearest-neighbours algorithm as classifier, in order to emphasize the impact of the modelling rather than the algorithm. The results showed that single-point anomalies could successfully be detected. However, the results were hard to interpret due to a lack of reason and explainability. Thereby, this study presents a method of modifying a multidimensional data model to form a detection model with increased explainability. The original model is decomposed into smaller modules by utilizing explicit categorical domain knowledge of the available features. Each module represents a more specific aspect of the whole model, and the results show a more explicit coverage of detected point anomalies and a higher degree of explainability of the detection output, in terms of both increased interpretability and increased comprehensibility.
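The modular detection idea can be sketched as follows: instead of one k-nearest-neighbour detector over all dimensions, one detector is fitted per domain-motivated feature module, so a flagged point can be explained by the module that fired. The module names and features here are hypothetical.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(4)
X = rng.normal(0, 1, size=(500, 6))
X[0, 4:] += 6  # one course section behaves oddly in the "activity" features

# Hypothetical domain-knowledge decomposition of the 6 features into modules.
modules = {"enrolment": [0, 1], "grading": [2, 3], "activity": [4, 5]}

def knn_score(Z, k=10):
    # Distance to the k-th nearest neighbour as an anomaly score.
    nn = NearestNeighbors(n_neighbors=k + 1).fit(Z)
    dist, _ = nn.kneighbors(Z)
    return dist[:, -1]

# One detector per module: a detection now names the aspect that is abnormal.
for name, cols in modules.items():
    s = knn_score(X[:, cols])
    flagged = np.where(s > s.mean() + 3 * s.std())[0]
    print(f"{name:9s} anomalies at indices {flagged}")
```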
APA, Harvard, Vancouver, ISO, and other styles
27

Alaverdyan, Zaruhi. "Unsupervised representation learning for anomaly detection on neuroimaging. Application to epilepsy lesion detection on brain MRI." Thesis, Lyon, 2019. http://www.theses.fr/2019LYSEI005/document.

Full text
Abstract:
Cette étude vise à développer un système d’aide au diagnostic (CAD) pour la détection de lésions épileptogènes, reposant sur l’analyse de données de neuroimagerie, notamment, l’IRM T1 et FLAIR. L’approche adoptée, introduite précédemment par Azami et al., 2016, consiste à placer la tâche de détection dans le cadre de la détection de changement à l'échelle du voxel, basée sur l’apprentissage d’un modèle one-class SVM pour chaque voxel dans le cerveau. L'objectif principal de ce travail est de développer des mécanismes d’apprentissage de représentations, qui capturent les informations les plus discriminantes à partir de l’imagerie multimodale. Les caractéristiques manuelles ne sont pas forcément les plus pertinentes pour la tâche visée. Notre première contribution porte sur l'intégration de différents réseaux profonds non-supervisés, pour extraire des caractéristiques dans le cadre du problème de détection de changement. Nous introduisons une nouvelle configuration des réseaux siamois, mieux adaptée à ce contexte. Le système CAD proposé a été évalué sur l’ensemble d’images IRM T1 des patients atteints d'épilepsie. Afin d'améliorer la performance obtenue, nous avons proposé d'étendre le système pour intégrer des données multimodales qui possèdent des informations complémentaires sur la pathologie. Notre deuxième contribution consiste donc à proposer des stratégies de combinaison des différentes modalités d’imagerie dans un système pour la détection de changement. Ce système multimodal a montré une amélioration importante sur la tâche de détection de lésions épileptogènes sur les IRM T1 et FLAIR. Notre dernière contribution se focalise sur l'intégration des données TEP dans le système proposé. Etant donné le nombre limité des images TEP, nous envisageons de synthétiser les données manquantes à partir des images IRM disponibles. Nous démontrons que le système entraîné sur les données réelles et synthétiques présente une amélioration importante par rapport au système entraîné sur les images réelles uniquement
This work represents one attempt to develop a computer-aided diagnosis system for epilepsy lesion detection based on neuroimaging data, in particular T1-weighted and FLAIR MR sequences. Given the complexity of the task and the lack of a representative voxel-level labeled data set, the adopted approach, first introduced in Azami et al., 2016, consists in casting the lesion detection task as a per-voxel outlier detection problem. The system is based on training a one-class SVM model for each voxel in the brain on a set of healthy controls, so as to model the normality of the voxel. The main focus of this work is to design representation learning mechanisms capturing the most discriminant information from multimodality imaging. Manual features, designed to mimic the characteristics of certain epilepsy lesions, such as focal cortical dysplasia (FCD), on neuroimaging data, are tailored to individual pathologies and cannot discriminate a large range of epilepsy lesions. Such features reflect the known characteristics of lesion appearance; however, they might not be the most optimal ones for the task at hand. Our first contribution consists in proposing various unsupervised neural architectures as potential feature extraction mechanisms and, eventually, introducing a novel configuration of siamese networks, to be plugged into the outlier detection context. The proposed system, evaluated on a set of T1-weighted MRIs of epilepsy patients, showed a promising performance but also room for improvement. To this end, we considered extending the CAD system so as to accommodate multimodality data, which offers complementary information on the problem at hand. Our second contribution, therefore, consists in proposing strategies to combine representations of different imaging modalities into a single framework for anomaly detection. The extended system showed a significant improvement on the task of epilepsy lesion detection on T1-weighted and FLAIR MR images. Our last contribution focuses on the integration of PET data into the system. Given the small number of available PET images, we make an attempt to synthesize PET data from the corresponding MRI acquisitions. Eventually, we show an improved performance of the system when trained on a mixture of synthesized and real images.
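The per-voxel one-class SVM formulation the system builds on (following Azami et al., 2016) can be sketched as below. The voxel count, feature dimension and lesion position are toy assumptions, and the raw features used here stand in for the learned (siamese) representations of the thesis.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(5)
n_controls, n_voxels, n_feat = 40, 100, 2   # toy sizes; real brains have ~1e6 voxels

# Feature vectors per voxel for healthy controls and for one patient.
controls = rng.normal(0, 1, size=(n_controls, n_voxels, n_feat))
patient = rng.normal(0, 1, size=(n_voxels, n_feat))
patient[17] += 4.0                           # simulated lesion at voxel 17

# One one-class SVM per voxel, trained only on healthy controls:
# a low decision value means the patient's voxel looks abnormal.
scores = np.empty(n_voxels)
for v in range(n_voxels):
    ocsvm = OneClassSVM(kernel="rbf", nu=0.03, gamma="scale")
    ocsvm.fit(controls[:, v, :])
    scores[v] = ocsvm.decision_function(patient[v][None, :])[0]

print("most anomalous voxel:", scores.argmin())  # expected: 17
```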
APA, Harvard, Vancouver, ISO, and other styles
28

Lindroth, Henriksson Amelia. "Unsupervised Anomaly Detection on Time Series Data: An Implementation on Electricity Consumption Series." Thesis, KTH, Matematisk statistik, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-301731.

Full text
Abstract:
Digitization of the energy industry, the introduction of smart grids and increasing regulation of electricity consumption metering have resulted in vast amounts of electricity data. This data presents a unique opportunity to understand electricity usage and to make it more efficient, reducing electricity consumption and carbon emissions. An important initial step in analyzing the data is to identify anomalies. In this thesis the problem of anomaly detection in electricity consumption series is addressed using four machine learning methods: density-based spatial clustering of applications with noise (DBSCAN), local outlier factor (LOF), isolation forest (iForest) and one-class support vector machine (OC-SVM). In order to evaluate the methods, synthetic anomalies were introduced into the electricity consumption series, and the methods were then evaluated for two anomaly types: point anomalies and collective anomalies. In addition to electricity consumption data, features describing prior consumption, outdoor temperature and date-time properties were included in the models. The results indicate that the addition of the temperature feature and the lag features generally impaired anomaly detection performance, while the inclusion of date-time features improved it. Of the four methods, OC-SVM was found to perform best at detecting point anomalies, while LOF performed best at detecting collective anomalies. In an attempt to improve the models' detection power, the electricity consumption series were de-trended and de-seasonalized and the same experiments were carried out. The models did not perform better on the decomposed series than on the non-decomposed ones.
Digitaliseringen av elbranschen, införandet av smarta nät samt ökad reglering av elmätning har resulterat i stora mängder eldata. Denna data skapar en unik möjlighet att analysera och förstå fastigheters elförbrukning för att kunna effektivisera den. Ett viktigt inledande steg i analysen av denna data är att identifiera möjliga anomalier. I denna uppsats testas fyra olika maskininlärningsmetoder för detektering av anomalier i elförbrukningsserier: densitetsbaserad spatiell klustring för applikationer med brus (DBSCAN), lokal avvikelse-faktor (LOF), isoleringsskog (iForest) och en-klass stödvektormaskin (OC-SVM). För att kunna utvärdera metoderna infördes syntetiska anomalier i elförbrukningsserierna och de fyra metoderna utvärderades därefter för de två anomalityperna punktanomali och gruppanomali. Utöver elförbrukningsdatan inkluderades även variabler som beskriver tidigare elförbrukning, utomhustemperatur och tidsegenskaper i modellerna. Resultaten tyder på att tillägget av temperaturvariabeln och lag-variablerna i allmänhet försämrade modellernas prestanda, medan införandet av tidsvariablerna förbättrade den. Av de fyra metoderna visade sig OC-SVM vara bäst på att detektera punktanomalier medan LOF var bäst på att detektera gruppanomalier. I ett försök att förbättra modellernas detekteringsförmåga utfördes samma experiment efter att elförbrukningsserierna trend- och säsongsrensats. Modellerna presterade inte bättre på de rensade serierna än på de icke-rensade.
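The evaluation protocol, injecting synthetic point anomalies and scoring a detector against them, might look like the following sketch with isolation forest. The series shape, anomaly magnitudes and date-time features are assumptions, not the thesis's data.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.metrics import precision_score, recall_score

rng = np.random.default_rng(6)
t = np.arange(24 * 60)                                 # 60 days of hourly readings
series = 2 + np.sin(2 * np.pi * t / 24) + rng.normal(0, 0.1, t.size)

# Inject synthetic point anomalies, as in the thesis's evaluation protocol.
y_true = np.zeros(t.size, dtype=int)
idx = rng.choice(t.size, 15, replace=False)
series[idx] += rng.choice([-1.5, 1.5], 15)
y_true[idx] = 1

# Simple feature matrix: the value plus cyclic hour-of-day (date-time) features.
hour = 2 * np.pi * (t % 24) / 24
X = np.c_[series, np.sin(hour), np.cos(hour)]

pred = (IsolationForest(contamination=0.01, random_state=0).fit_predict(X) == -1)
print("precision", precision_score(y_true, pred), "recall", recall_score(y_true, pred))
```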
APA, Harvard, Vancouver, ISO, and other styles
29

Jernbäcker, Carl. "Unsupervised real-time anomaly detection on streaming data for large-scale application deployments." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-262681.

Full text
Abstract:
Anomaly detection is the classification of data points that do not adhere to the familiar pattern. Previous studies required extensive human interaction to label data or to sort normal and abnormal data from one another. In this thesis, we want to go one step further and apply machine learning techniques to time-series data in order to gain a deeper understanding of the properties of a given data point without any sorting or labelling. A method is presented that can successfully find anomalies in both real and synthetic datasets. The method uses a combination of three algorithms from different disciplines: Hierarchical Temporal Memory and Restricted Boltzmann Machines from machine learning, and the Autoregressive Integrated Moving Average from regression. Each algorithm is specialised in finding a particular type of anomaly. The combination finds all anomalies with little to no gap between the occurrence of an anomaly and its detection.
Avvikelsedetektering är klassificeringen av datapunkter som inte följer det kända mönstret; tidigare studier krävde omfattande mänskliga interaktioner med antingen märkning eller sortering av normala och onormala data från varandra. I detta examensarbete vill vi gå ett steg längre och tillämpa maskininlärningsteknik på tidsseriedata för att få en djupare förståelse för egenskaperna hos en given datapunkt utan någon sortering och märkning. I detta examensarbete presenteras en metod som framgångsrikt kan hitta anomalier i både reella och syntetiska dataset. Metoden använder en kombination av tre algoritmer från olika discipliner, Hierarchical temporal memory och Restricted Boltzmann machines från maskininlärning och Autoregressive integrated moving average från regression. Varje algoritm är specialiserad på att hitta en viss typ av anomalier. Kombinationen finner alla anomalier med liten eller inget avstånd från förekomst av en anomali till dess detektion.
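Of the three combined algorithms, the ARIMA component lends itself to a compact sketch: one-step-ahead forecasts are compared with the incoming stream, and large residuals raise alarms. The (1,1,1) order and 4-sigma rule are assumptions, and the HTM and RBM components are omitted here.

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(7)
series = np.cumsum(rng.normal(0, 1, 260))
series[230] += 12.0                          # a sudden level shift to detect

# Streaming detection: refit on history, forecast one step, test the residual.
history = list(series[:200])
alarms = []
for t in range(200, len(series)):
    fit = ARIMA(np.asarray(history), order=(1, 1, 1)).fit()
    resid = series[t] - fit.forecast(1)[0]
    if abs(resid) > 4 * np.std(fit.resid):   # assumed 4-sigma alarm rule
        alarms.append(t)
    history.append(series[t])

print("alarms at indices:", alarms)          # expected to include 230
```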
APA, Harvard, Vancouver, ISO, and other styles
30

Bracci, Lorenzo, and Amirhossein Namazi. "EVALUATION OF UNSUPERVISED MACHINE LEARNING MODELS FOR ANOMALY DETECTION IN TIME SERIES SENSOR DATA." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-299734.

Full text
Abstract:
With the advancement of the Internet of Things and the digitization of society, sensors recording time-series data can be found in an ever-increasing number of places, including proximity sensors on cars, temperature sensors in manufacturing plants and motion sensors inside smart homes. Society's ever-increasing reliance on these devices leads to a need for detecting unusual behaviour, which could be caused by a malfunctioning sensor or by the occurrence of an uncommon event. Such unusual behaviour is often referred to as an anomaly. To detect anomalous behaviours, advanced technologies combining mathematics and computer science, often grouped under the umbrella of machine learning, are frequently used. To help machines learn valuable patterns, human supervision is often needed, which in this case would correspond to using recordings that a person has already classified as anomalous or normal. It is unfortunately time-consuming to label data, especially the large datasets created from sensor recordings. Therefore, in this thesis, techniques that require no supervision are evaluated for anomaly detection. Several different machine learning models are trained on different datasets in order to gain a better understanding of which techniques perform better when different requirements matter, such as the presence of a smaller dataset or stricter requirements on inference time. Of the models evaluated, OCSVM gave the best overall performance, achieving an accuracy of 85%, and K-means was the fastest model, taking 0.04 milliseconds to run inference on one sample. Furthermore, LSTM-based models showed the greatest potential for improvement with larger datasets.
Med utvecklingen av Sakernas internet och digitaliseringen av samhället kan man registrera tidsseriedata på allt fler platser, bland annat igenom närhetssensorer på bilar, temperatursensorer i tillverkningsanläggningar och rörelsesensorer i smarta hem. Detta ständigt ökande beroende i samhället av dessa enheter leder till ett behov av att upptäcka ovanligt beteende som kan orsakas av funktionsstörning i sensorn eller genom upptäckt av en ovanlig händelse. Det ovanliga beteendet som nämns kallas ofta för en anomali. För att upptäcka avvikande beteenden används avancerad teknik som kombinerar matematik och datavetenskap, som ofta kallas maskininlärning. För att hjälpa maskiner att lära sig värdefulla mönster behövs ofta mänsklig tillsyn, vilket i detta fall skulle motsvara användningsinspelningar som en person redan har klassificerat som avvikelser eller normala punkter. Tyvärr är det tidskrävande att märka data, särskilt de stora datamängder som skapas från sensorinspelningar. Därför utvärderas tekniker som inte kräver någon handledning i denna avhandling för att utföra anomalidetektering. Flera olika maskininlärningsmodeller utbildas på olika datamängder för att få en bättre förståelse för vilka tekniker som fungerar bättre när olika krav är viktiga, t.ex. närvaro av en mindre dataset eller strängare krav på inferens tid. Av de utvärderade modellerna resulterade OCSVM i bästa totala prestanda, uppnådde en noggrannhet på 85% och K-means var den snabbaste modellen eftersom det hade en inferens tid av 0,04 millisekunder. Dessutom visade LSTM-baserade modeller de bästa möjliga förbättringarna med större datamängder.
APA, Harvard, Vancouver, ISO, and other styles
31

Mathur, Nitin O. "Application of Autoencoder Ensembles in Anomaly and Intrusion Detection using Time-Based Analysis." University of Cincinnati / OhioLINK, 2020. http://rave.ohiolink.edu/etdc/view?acc_num=ucin161374876195402.

Full text
APA, Harvard, Vancouver, ISO, and other styles
32

Hanna, Peter, and Erik Swartling. "Anomaly Detection in Time Series Data using Unsupervised Machine Learning Methods: A Clustering-Based Approach." Thesis, KTH, Matematisk statistik, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-273630.

Full text
Abstract:
For many companies in the manufacturing industry, finding damage in their products is a vital process, especially during the production phase. Since applying machine learning techniques can further aid the process of damage identification, it has become popular among companies to make use of these methods to enhance the production process even further. For some industries, damage identification can be heavily linked to anomaly detection in different measurements. In this thesis, the aim is to construct unsupervised machine learning models to identify anomalies in unlabeled measurements of pumps, using high-frequency sampled current and voltage time series data. The measurements can be split into five different phases, namely the startup phase, three duty-point phases and lastly the shutdown phase. The approach is based on clustering methods, where the main algorithms used are the density-based algorithms DBSCAN and LOF. Dimensionality reduction techniques, such as feature extraction and feature selection, are applied to the data, and after constructing the five models, one for each phase, it can be seen that the models identify anomalies in the given data set.
För flera företag i tillverkningsindustrin är felsökningar av produkter en fundamental uppgift i produktionsprocessen. Då användningen av olika maskininlärningsmetoder visar sig innehålla användbara tekniker för att hitta fel i produkter är dessa metoder ett populärt val bland företag som ytterligare vill förbättra produktionprocessen. För vissa industrier är feldetektering starkt kopplat till anomalidetektering av olika mätningar. I detta examensarbete är syftet att konstruera oövervakad maskininlärningsmodeller för att identifiera anomalier i tidsseriedata. Mer specifikt består datan av högfrekvent mätdata av pumpar via ström och spänningsmätningar. Mätningarna består av fem olika faser, nämligen uppstartsfasen, tre last-faser och fasen för avstängning. Maskinilärningsmetoderna är baserade på olika klustertekniker, och de metoderna som användes är DBSCAN och LOF algoritmerna. Dessutom tillämpades olika dimensionsreduktionstekniker och efter att ha konstruerat 5 olika modeller, alltså en för varje fas, kan det konstateras att modellerna lyckats identifiera anomalier i det givna datasetet.
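The pipeline shape, windowing a high-frequency signal, extracting features, reducing dimensionality, then applying density-based detection, can be sketched as below for a single operating phase. The window features, PCA size and LOF neighbourhood are illustrative guesses rather than the thesis's choices.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(8)

# Toy high-frequency current measurements: 200 windows of 1024 samples
# from one operating phase; window 42 carries an injected disturbance.
windows = rng.normal(0, 1, size=(200, 1024))
windows[42] += np.sin(np.linspace(0, 40 * np.pi, 1024))

# Feature extraction per window, then dimensionality reduction via PCA.
feats = np.c_[windows.mean(1), windows.std(1),
              np.abs(np.fft.rfft(windows))[:, 1:6]]   # low-frequency magnitudes
Z = PCA(n_components=3).fit_transform(feats)

# Density-based detection: LOF labels isolated windows as -1.
labels = LocalOutlierFactor(n_neighbors=20).fit_predict(Z)
print("anomalous windows:", np.where(labels == -1)[0])
```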
APA, Harvard, Vancouver, ISO, and other styles
33

Sivaramakrishnan, Jayaram. "Unsupervised probabilistic and kernel regression methods for anomaly detection and parameter margin prediction of industrial design." PhD thesis, Murdoch University, 2021. https://researchrepository.murdoch.edu.au/id/eprint/62536/.

Full text
Abstract:
One of the significant challenges facing industrial plant design is ensuring the integrity of the massive design datasets generated during project execution. This work is motivated by personal experience of data integrity issues during projects caused by insufficient automation affecting the quality of deliverables. Therefore, this project sought automated solutions for detecting anomalies in industrial design data in the form of outliers. Several novel methods are proposed, based on the Hidden Markov Model (HMM) and a modified General Regression Neural Network called the Margin-Based GRNN (MB-GRNN), along with optimisation techniques that minimise computation time. An HMM was used for validating process plant tag numbers using a self-learning approach. Results from experimental data show that HMM performance is equivalent to that of a custom-made design rule checking algorithm. The choice of components in industrial design involves setting specific design parameters that typically must reside inside permissible ranges called "design margins". The MB-GRNN has the ability to estimate these permissible margins directly from design data and to indicate potential design errors resulting from an invalid choice of design parameters, by identifying data points outside of the estimated margins as outliers. The extremal permissible margin boundaries are determined by "stretching out" the upper and lower GRNN surfaces using an iterative application of stretch factors (a second kernel weighting factor). The method creates a variable insensitive band surrounding the data cloud, interlinked with the normal regression function, providing upper and lower margin boundaries. These boundaries can then be used to determine outliers and to predict a range of permissible values of design parameters during design. This method was compared to Parzen windows and another proximity-based method. The MB-GRNN also benefits from a modified algorithm for estimating the smoothing parameter using a combination of clustering, k-nearest neighbour, and a localised covariance matrix. The computation time for this difficult task is minimised using a new derivative-based method that was tested successfully using a range of root-finding functions. The efficacy of the MB-GRNN and associated optimisation techniques was verified using three multivariate design datasets. The experimentation shows that the regression-based outlier classification approach used in this project complements the existing Parzen density-based method. These methods, used in combination, are intended for implementation as a decision support system for checking the quality of industrial design data, to help minimise design and implementation costs. It is expected that the unsupervised techniques presented in this research work will benefit from the ever-increasing automation of industrial design processes.
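The margin idea can be sketched with a plain Nadaraya-Watson (GRNN-style) regressor whose residuals are re-weighted by a stretch factor to form upper and lower bands. This is a simplification under assumed parameters, not the MB-GRNN algorithm itself.

```python
import numpy as np

rng = np.random.default_rng(9)
x = np.sort(rng.uniform(0, 10, 200))
y = 0.5 * x + np.sin(x) + rng.normal(0, 0.2, 200)

def grnn(x_train, y_train, x_query, sigma=0.4):
    # Nadaraya-Watson kernel regression, the core of a GRNN.
    d2 = (x_query[:, None] - x_train[None, :]) ** 2
    w = np.exp(-d2 / (2 * sigma**2))
    return (w @ y_train) / w.sum(axis=1)

# Upper/lower margin surfaces: smooth the positive and negative residuals
# and scale them by a second weighting (stretch) factor.
y_fit = grnn(x, y, x)
resid = y - y_fit
stretch = 1.5                                   # assumed stretch factor
upper = y_fit + stretch * grnn(x, np.clip(resid, 0, None), x)
lower = y_fit - stretch * grnn(x, -np.clip(resid, None, 0), x)

outliers = np.where((y > upper) | (y < lower))[0]
print(f"{outliers.size} design points fall outside the margin band")
```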
APA, Harvard, Vancouver, ISO, and other styles
34

Minarini, Francesco. "Anomaly detection prototype for log-based predictive maintenance at INFN-CNAF tier-1." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2019. http://amslaurea.unibo.it/19304/.

Full text
Abstract:
It is no longer possible to separate the evolution of HEP from that of the computational resources needed to perform analyses. Each year, in fact, the LHC produces dozens of petabytes of data (e.g. collision data, particle simulations, metadata, etc.) that require orchestrated computing resources for storage, computational power, and high-throughput networks to connect centres. As a consequence of the LHC upgrade, the luminosity of the experiment will increase by a factor of 10 over its originally designed value, entailing a non-negligible technical challenge at computing centres: an increase in the amount of data produced and processed by the experiment is in fact expected. With this in mind, the HEP Software Foundation took action and released a road-map document describing the actions needed to prepare the computational infrastructure to support the upgrade. As part of this collective effort, involving all computing centres of the Grid, INFN-CNAF has set up a preliminary study towards the development of an AI-driven maintenance paradigm. As a contribution to this preparatory study, this master's thesis presents an original software prototype developed to identify critical activity time windows of a specific service (StoRM). Moreover, the prototype explores the viability of content extraction via text processing techniques, applying such strategies to messages belonging to anomalous time windows.
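Window-level content extraction of this kind might be sketched with TF-IDF: each time window of log messages becomes one vector, and windows whose content diverges from the mean profile are flagged as critical. The messages and scoring rule below are invented, not the prototype's actual pipeline.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy StoRM-like log windows; the last one contains anomalous activity.
windows = [
    "transfer ok checksum verified srm put done",
    "transfer ok srm put done checksum verified",
    "transfer ok checksum verified srm get done",
    "timeout error connection refused transfer aborted retry failed",
]

# One TF-IDF vector per time window of log messages.
X = TfidfVectorizer().fit_transform(windows)

# Windows dissimilar from the mean profile are candidate critical windows.
centroid = np.asarray(X.mean(axis=0))
sim = cosine_similarity(X, centroid).ravel()
print("most critical window:", int(np.argmin(sim)))   # expected: 3
```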
APA, Harvard, Vancouver, ISO, and other styles
35

Labonne, Maxime. "Anomaly-based network intrusion detection using machine learning." Electronic Thesis or Diss., Institut polytechnique de Paris, 2020. http://www.theses.fr/2020IPPAS011.

Full text
Abstract:
Ces dernières années, le piratage est devenu une industrie à part entière, augmentant le nombre et la diversité des cyberattaques. Les menaces qui pèsent sur les réseaux informatiques vont des logiciels malveillants aux attaques par déni de service, en passant par le phishing et l'ingénierie sociale. Un plan de cybersécurité efficace ne peut plus reposer uniquement sur des antivirus et des pare-feux pour contrer ces menaces : il doit inclure plusieurs niveaux de défense. Les systèmes de détection d'intrusion (IDS) réseaux sont un moyen complémentaire de renforcer la sécurité, avec la possibilité de surveiller les paquets de la couche 2 (liaison) à la couche 7 (application) du modèle OSI. Les techniques de détection d'intrusion sont traditionnellement divisées en deux catégories : la détection par signatures et la détection par anomalies. La plupart des IDS utilisés aujourd'hui reposent sur la détection par signatures ; ils ne peuvent cependant détecter que des attaques connues. Les IDS utilisant la détection par anomalies sont capables de détecter des attaques inconnues, mais sont malheureusement moins précis, ce qui génère un grand nombre de fausses alertes. Dans ce contexte, la création d'IDS précis par anomalies est d'un intérêt majeur pour pouvoir identifier des attaques encore inconnues.Dans cette thèse, les modèles d'apprentissage automatique sont étudiés pour créer des IDS qui peuvent être déployés dans de véritables réseaux informatiques. Tout d'abord, une méthode d'optimisation en trois étapes est proposée pour améliorer la qualité de la détection : 1/ augmentation des données pour rééquilibrer les jeux de données, 2/ optimisation des paramètres pour améliorer les performances du modèle et 3/ apprentissage ensembliste pour combiner les résultats des meilleurs modèles. Les flux détectés comme des attaques peuvent être analysés pour générer des signatures afin d'alimenter les bases de données d'IDS basées par signatures. Toutefois, cette méthode présente l'inconvénient d'exiger des jeux de données étiquetés, qui sont rarement disponibles dans des situations réelles. L'apprentissage par transfert est donc étudié afin d'entraîner des modèles d'apprentissage automatique sur de grands ensembles de données étiquetés, puis de les affiner sur le trafic normal du réseau à surveiller. Cette méthode présente également des défauts puisque les modèles apprennent à partir d'attaques déjà connues, et n'effectuent donc pas réellement de détection d'anomalies. C'est pourquoi une nouvelle solution basée sur l'apprentissage non supervisé est proposée. Elle utilise l'analyse de l'en-tête des protocoles réseau pour modéliser le comportement normal du trafic. Les anomalies détectées sont ensuite regroupées en attaques ou ignorées lorsqu'elles sont isolées. Enfin, la détection la congestion réseau est étudiée. Le taux d'utilisation de la bande passante entre les différents liens est prédit afin de corriger les problèmes avant qu'ils ne se produisent
In recent years, hacking has become an industry unto itself, increasing the number and diversity of cyber attacks. Threats to computer networks range from malware to denial-of-service attacks, phishing and social engineering. An effective cyber security plan can no longer rely solely on antiviruses and firewalls to counter these threats: it must include several layers of defence. Network-based Intrusion Detection Systems (IDSs) are a complementary means of enhancing security, with the ability to monitor packets from OSI layer 2 (Data link) to layer 7 (Application). Intrusion detection techniques are traditionally divided into two categories: signature-based (or misuse) detection and anomaly detection. Most IDSs in use today rely on signature-based detection; however, they can only detect known attacks. IDSs using anomaly detection are able to detect unknown attacks, but are unfortunately less accurate, which generates a large number of false alarms. In this context, the creation of precise anomaly-based IDSs is of great value in order to be able to identify attacks that are still unknown. In this thesis, machine learning models are studied to create IDSs that can be deployed in real computer networks. Firstly, a three-step optimization method is proposed to improve the quality of detection: 1/ data augmentation to rebalance the dataset, 2/ parameter optimization to improve the model performance and 3/ ensemble learning to combine the results of the best models. Flows detected as attacks can be analyzed to generate signatures to feed signature-based IDS databases. However, this method has the disadvantage of requiring labelled datasets, which are rarely available in real-life situations. Transfer learning is therefore studied in order to train machine learning models on large labeled datasets, then fine-tune them on the benign traffic of the network to be monitored. This method also has flaws, since the models learn from already known attacks and therefore do not actually perform anomaly detection. Thus, a new solution based on unsupervised learning is proposed. It uses network protocol header analysis to model normal traffic behavior. Detected anomalies are then aggregated into attacks or ignored when isolated. Finally, the detection of network congestion is studied. The bandwidth utilization between different links is predicted in order to correct issues before they occur.
APA, Harvard, Vancouver, ISO, and other styles
36

Pierrau, Magnus. "Evaluating Unsupervised Methods for Out-of-Distribution Detection on Semantically Similar Image Data." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-302583.

Full text
Abstract:
Out-of-distribution detection considers methods used to detect data that deviates from the underlying data distribution used to train some machine learning model. This is an important topic, as artificial neural networks have previously been shown to be capable of producing arbitrarily confident predictions, even for anomalous samples that deviate from the training distribution. Previous work has developed many reportedly effective methods for out-of-distribution detection, but these are often evaluated on data that is semantically different from the training data, and therefore do not necessarily reflect the true performance that these methods would show in more challenging conditions. In this work, six unsupervised out-of-distribution detection methods are evaluated and compared under more challenging conditions, in the context of classification of semantically similar image data using deep neural networks. It is found that the performance of all methods varies significantly across the tested datasets, and that no single method is consistently superior. Encouraging results are found for a method using ensembles of deep neural networks, but overall, the observed performance of all methods is considerably lower than in many related works, where easier tasks are used to evaluate the performance of these methods.
Begreppet “out-of-distribution detection” (OOD-detektion) avser metoder vilka används för att upptäcka data som avviker från den underliggande datafördelningen som använts för att träna en maskininlärningsmodell. Detta är ett viktigt ämne, då artificiella neuronnät tidigare har visat sig benägna att generera godtyckligt säkra förutsägelser, även på data som avviker från den underliggande träningsfördelningen. Tidigare arbeten har producerat många välpresterande OOD-detektionsmetoder, men dessa har ofta utvärderats på data som är semantiskt olikt träningsdata, och reflekterar därför inte nödvändigtvis metodernas förmåga under mer utmanande förutsättningar. I detta arbete utvärderas och jämförs sex oövervakade OOD-detektionsmetoder under utmanande förhållanden, i form av klassificering av semantiskt liknande bilddata med hjälp av djupa neuronnät. Arbetet visar att resultaten för samtliga metoder varierar markant mellan olika data och att ingen enskild modell är konsekvent överlägsen de andra. Arbetet finner lovande resultat för en metod som utnyttjar djupa neuronnätsensembler, men överlag så presterar samtliga modeller sämre än vad tidigare arbeten rapporterat, där mindre utmanande data har nyttjats för att utvärdera metoderna.
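The ensemble approach that showed encouraging results can be sketched as follows: average the members' predictive distributions and treat a low maximum probability as the OOD score. Tiny MLPs on toy blobs stand in for the deep networks and image data of the thesis.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.neural_network import MLPClassifier

X, y = make_blobs(n_samples=600, centers=3, random_state=0)

# An ensemble of small classifiers differing only in their random seed.
ensemble = [MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000,
                          random_state=s).fit(X, y) for s in range(5)]

def ood_score(samples):
    # Average the members' predictive distributions; a low maximum
    # probability (high disagreement/uncertainty) suggests an OOD input.
    p = np.mean([m.predict_proba(samples) for m in ensemble], axis=0)
    return 1.0 - p.max(axis=1)

# Ensembles often, though not always, disagree more off-distribution.
print("in-distribution :", ood_score(X[:3]).round(3))
print("off-distribution:", ood_score(np.full((3, 2), 25.0)).round(3))
```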
APA, Harvard, Vancouver, ISO, and other styles
37

Kommineni, Sri Sai Manoj, and Akhila Dindi. "Automating Log Analysis." Thesis, Blekinge Tekniska Högskola, Institutionen för datavetenskap, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-21175.

Full text
Abstract:
Background: With the advent of the information age, a large number of services are emerging that run on clusters of computers. Maintaining such large, complex systems is a very difficult task. Developers rely on one tool that is common to almost all software systems: console logs. To troubleshoot problems, developers refer to these logs. Identifying anomalies in the logs leads to the cause of the problem, thereby automating the analysis of logs. This study focuses on anomaly detection in logs. Objectives: The main goal of the thesis is to identify different algorithms for anomaly detection in logs, implement the algorithms and compare them in an experiment. Methods: A literature review was conducted to identify the most suitable algorithms for anomaly detection in logs. An experiment was then conducted to compare the algorithms identified in the literature review. The experiment was performed on a dataset of logs generated by Hadoop Distributed File System (HDFS) servers, consisting of more than 11 million lines of logs. The algorithms compared are the K-means, DBSCAN, Isolation Forest and Local Outlier Factor algorithms, which are all unsupervised learning algorithms. Results: The performance of these algorithms was compared using the metrics precision, recall, accuracy, F1 score and run time. Though DBSCAN was the fastest, it resulted in poor recall; similarly, Isolation Forest also resulted in poor recall. Local Outlier Factor was the fastest to predict. K-means had the highest precision, and Local Outlier Factor had the highest recall, accuracy and F1 score. Conclusion: After comparing the metrics of the different algorithms, we conclude that Local Outlier Factor performed better than the other algorithms with respect to most of the metrics measured.
APA, Harvard, Vancouver, ISO, and other styles
38

ABUKMEIL, MOHANAD. "UNSUPERVISED GENERATIVE MODELS FOR DATA ANALYSIS AND EXPLAINABLE ARTIFICIAL INTELLIGENCE." Doctoral thesis, Università degli Studi di Milano, 2022. http://hdl.handle.net/2434/889159.

Full text
Abstract:
For more than a century, methods for learning representations and exploring the intrinsic structures of data have developed remarkably, and currently include supervised, semi-supervised, and unsupervised methods. However, recent years have witnessed the flourishing of big data, where typical dataset dimensions are high, and the data can come in messy, missing, incomplete, unlabeled, or corrupted forms. Consequently, discovering and learning the hidden structure buried inside such data becomes highly challenging. From this perspective, latent data analysis and dimensionality reduction play a substantial role in decomposing the exploratory factors and learning the hidden structures of data, which encompass the significant features that characterize the categories and trends among data samples in an ordered manner. This is done by extracting patterns, differentiating trends, and testing hypotheses to identify anomalies, learn compact knowledge, and perform many different machine learning (ML) tasks such as classification, detection, and prediction. Unsupervised generative learning (UGL) methods are a class of ML characterized by their ability to analyze and decompose latent data, reduce dimensionality, visualize the manifold of data, and learn representations with limited levels of predefined labels and prior assumptions. Furthermore, explainable artificial intelligence (XAI) is an emerging field of ML that deals with explaining the decisions and behaviors of learned models. XAI is also associated with UGL models to explain the hidden structure of data, and to explain the learned representations of ML models. However, current UGL models lack large-scale generalizability and explainability in the testing stage, which restricts their potential in ML and XAI applications. To overcome the aforementioned limitations, this thesis proposes innovative methods that integrate UGL and XAI to enable data factorization and dimensionality reduction and to improve the generalizability of the learned ML models. Moreover, the proposed methods enable visual explainability in modern applications such as anomaly detection and autonomous driving systems. The main research contributions are listed as follows:
• A novel overview of UGL models including blind source separation (BSS), manifold learning (MfL), and neural networks (NNs), which also considers open issues and challenges of each UGL method.
• An innovative method to identify the dimensions of the compact feature space via a generalized rank, in the application of image dimensionality reduction.
• An innovative method to hierarchically reduce and visualize the manifold of data, to improve generalizability in limited-data learning scenarios and in computational complexity reduction applications.
• An original method to visually explain autoencoders by reconstructing an attention map, in the application of anomaly detection and explainable autonomous driving systems.
The novel methods introduced in this thesis are benchmarked on publicly available datasets, where they outperform state-of-the-art methods across different evaluation metrics, confirming the feasibility of the proposed methodologies with respect to computational complexity, availability of learning data, model explainability, and data reconstruction accuracy.
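The attention-map contribution can be illustrated in miniature: an autoencoder is trained on normal images, and its per-pixel reconstruction error is read as an attention map over anomalous regions. The toy images and small dense network below are assumptions, not the thesis's architecture.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(10)

# Toy 8x8 "images": smooth gradients; one test image carries a bright defect.
def batch(n):
    g = np.linspace(0, 1, 8)
    base = np.tile(np.add.outer(g, g).ravel(), (n, 1))
    return base + rng.normal(0, 0.05, (n, 64))

train, test = batch(500), batch(1)
test[0, 27] += 1.5                     # anomalous pixel region

ae = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
ae.fit(train, train)                   # learn to reconstruct normal images

# The per-pixel squared error is the "attention map": it localizes what
# the model could not explain, here likely the injected defect at pixel 27.
attention = (test - ae.predict(test)) ** 2
print("most attended pixel:", attention.ravel().argmax())
```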
APA, Harvard, Vancouver, ISO, and other styles
39

Avdic, Adnan, and Albin Ekholm. "Anomaly Detection in an e-Transaction System using Data Driven Machine Learning Models : An unsupervised learning approach in time-series data." Thesis, Blekinge Tekniska Högskola, Institutionen för datavetenskap, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-18421.

Full text
Abstract:
Background: Detecting anomalies in time-series data is a task that can be done with the help of data-driven machine learning models. This thesis investigates if, and how well, different machine learning models with an unsupervised approach can detect anomalies in the e-transaction system Ericsson Wallet Platform. The anomalies in our domain context are delays in the system. Objectives: The objectives of this thesis are to compare four different machine learning models in order to find the most relevant one. The best-performing models are decided by the evaluation metric F1 score. An intersection of the best models is also evaluated in order to decrease the number of false positives and make the detection more precise. Methods: A relevant time-series data sample with 10-minute-interval data points from the Ericsson Wallet Platform was used. A number of steps were taken, such as data handling, pre-processing, normalization, training and evaluation. Two relevant features were trained separately as one-dimensional data sets. The two features relevant for finding delays in the system, used in this thesis, are the mean wait time (ms) and the feature Mean * N, where N is the number of calls to the system. The evaluation metrics used were true positives, true negatives, false positives, false negatives, accuracy, precision, recall, F1 score and the Jaccard index. The Jaccard index is a metric that reveals how similar the algorithms' detections are. Since detection is binary, each data point in the time-series data is classified. Results: The results reveal the two best-performing models with regard to the F1 score. The intersection evaluation reveals if and how well a combination of the two best-performing models can reduce the number of false positives. Conclusions: The conclusion of this work is that some algorithms perform better than others. It is a proof of concept that such classification algorithms can separate normal from non-normal behavior in the domain of the Ericsson Wallet Platform.
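The two evaluation devices named here, the Jaccard index between detectors and their intersection for cutting false positives, can be sketched as below on synthetic binary detections; the detector error rates are invented.

```python
import numpy as np

rng = np.random.default_rng(11)
n = 1000
y_true = (rng.random(n) < 0.02).astype(int)          # rare true delays

# Two imperfect detectors: each finds most delays plus some false alarms.
def noisy_detector(y, miss=0.2, false_alarm=0.03):
    pred = y.copy()
    pred[(y == 1) & (rng.random(n) < miss)] = 0
    pred[(y == 0) & (rng.random(n) < false_alarm)] = 1
    return pred

a, b = noisy_detector(y_true), noisy_detector(y_true)

jaccard = (a & b).sum() / (a | b).sum()  # similarity of the two detections
both = a & b                             # intersection: flag only joint alarms

fp = lambda p: ((p == 1) & (y_true == 0)).sum()
print(f"Jaccard index: {jaccard:.2f}")
print(f"false positives: A={fp(a)}, B={fp(b)}, intersection={fp(both)}")
```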
APA, Harvard, Vancouver, ISO, and other styles
40

Baur, Christoph [Verfasser], Nassir [Akademischer Betreuer] Navab, Nassir [Gutachter] Navab, and Ben [Gutachter] Glocker. "Anomaly Detection in Brain MRI: From Supervised to Unsupervised Deep Learning / Christoph Baur ; Gutachter: Nassir Navab, Ben Glocker ; Betreuer: Nassir Navab." München : Universitätsbibliothek der TU München, 2021. http://d-nb.info/1236343115/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
41

Manovi, Livia. "Machine Learning Unsupervised Methods in the Design of an On-board Health Monitoring System for Satellite Applications." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2021.

Find full text
Abstract:
The dissertation starts by describing the phenomena behind the recently increased importance of satellite applications. The spread of such technology has implications, such as an increase in maintenance cost, which motivate the development of advanced techniques that give spacecraft greater autonomy in health monitoring. Machine learning techniques are widely employed to lay the foundation for effective systems specialized in fault detection by examining telemetry data. Telemetry consists of a considerable amount of information; therefore, the adopted algorithms must be able to handle multivariate data while facing the limitations imposed by on-board hardware. In the framework of outlier detection, the dissertation addresses the topic of unsupervised machine learning methods. In the unsupervised scenario, a lack of prior knowledge of the data behavior is assumed. Specifically, two models are considered, namely Local Outlier Factor and One-Class Support Vector Machines. Their performance is compared in terms of both the achieved prediction accuracy and the equivalent computational cost. Both models are trained and tested on the same sets of time series data in a variety of settings, aimed at gaining insights into the effect of increasing dimensionality. The obtained results show that both models, combined with proper tuning of their characteristic parameters, successfully fulfil the role of outlier detectors in multivariate time series data. Nevertheless, in this specific context, Local Outlier Factor outperforms One-Class SVM, in that it proves to be more stable over a wider range of input parameter values. This property is especially valuable in unsupervised learning, since it suggests that the model is apt to adapt to unforeseen patterns.
APA, Harvard, Vancouver, ISO, and other styles
42

Dalvi, Aditi. "Performance of One-class Support Vector Machine (SVM) in Detection of Anomalies in the Bridge Data." University of Cincinnati / OhioLINK, 2017. http://rave.ohiolink.edu/etdc/view?acc_num=ucin150478019017791.

Full text
APA, Harvard, Vancouver, ISO, and other styles
43

Ait, Saada Mira. "Unsupervised learning from textual data with neural text representations." Electronic Thesis or Diss., Université Paris Cité, 2023. http://www.theses.fr/2023UNIP7122.

Full text
Abstract:
L'ère du numérique génère des quantités énormes de données non structurées telles que des images et des documents, nécessitant des méthodes de traitement spécifiques pour en tirer de la valeur. Les données textuelles présentent une difficulté supplémentaire car elles ne contiennent pas de valeurs numériques. Les plongements de mots sont des techniques permettant de transformer automatiquement du texte en données numériques, qui permettent aux algorithmes d'apprentissage automatique de les traiter. Les tâches non-supervisées sont un enjeu majeur dans l'industrie car elles permettent de créer de la valeur à partir de grandes quantités de données sans nécessiter une labellisation manuelle coûteuse. Cette thèse explore l'utilisation des modèles Transformeurs pour les tâches non-supervisées telles que la classification automatique, la détection d'anomalies et la visualisation de données. Elle propose également des méthodologies pour exploiter au mieux les modèles Transformeurs multicouches dans un contexte non-supervisé pour améliorer la qualité et la robustesse du clustering de documents tout en s'affranchissant du choix de la couche à utiliser et du nombre de classes. En outre, la thèse examine les méthodes de transfert d'apprentissage pour améliorer la qualité des modèles Transformeurs pré-entraînés sur une autre tâche en les utilisant pour la tâche de clustering. Par ailleurs, nous investiguons plus profondément dans cette thèse les modèles de langage "Transformers" et leur application au clustering en examinant en particulier les méthodes de transfert d'apprentissage qui consistent à réapprendre des modèles pré-entraînés sur une tâche différente afin d'améliorer leur qualité pour de futures tâches. Nous démontrons par une étude empirique que les méthodes de post-traitement basées sur la réduction de dimension sont plus avantageuses que les stratégies de réapprentissage proposées dans la littérature pour le clustering. Enfin, nous proposons un nouveau cadre de détection d'anomalies textuelles en français adapté à deux cas : celui où les données concernent une thématique précise et celui où les données ont plusieurs sous-thématiques. Dans les deux cas, nous obtenons des résultats supérieurs à l'état de l'art avec un temps de calcul nettement inférieur
The digital era generates enormous amounts of unstructured data such as images and documents, requiring specific processing methods to extract value from them. Textual data presents an additional challenge as it does not contain numerical values. Word embeddings are techniques that transform text into numerical data, enabling machine learning algorithms to process it. Unsupervised tasks are a major challenge in industry as they allow value creation from large amounts of data without requiring costly manual labeling. In this thesis we explore the use of Transformer models for unsupervised tasks such as clustering, anomaly detection, and data visualization. We also propose methodologies to better exploit multi-layer Transformer models in an unsupervised context, to improve the quality and robustness of document clustering while avoiding the choice of which layer to use and of the number of classes. Additionally, we investigate Transformer language models and their application to clustering more deeply, examining in particular transfer learning methods that involve fine-tuning pre-trained models on a different task to improve their quality for future tasks. We demonstrate through an empirical study that post-processing methods based on dimensionality reduction are more advantageous than the fine-tuning strategies proposed in the literature. Finally, we propose a framework for detecting text anomalies in French adapted to two cases: one where the data concerns a specific topic and the other where the data has multiple sub-topics. In both cases, we obtain results superior to the state of the art with significantly lower computation time.
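The post-processing finding might be illustrated as follows: documents are embedded with a pre-trained Transformer, reduced in dimension, then clustered. The public all-MiniLM-L6-v2 checkpoint is an assumed stand-in, not necessarily a model used in the thesis, and the sentence-transformers package must be installed.

```python
from sentence_transformers import SentenceTransformer  # pip install sentence-transformers
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

docs = [
    "The central bank raised interest rates again.",
    "Stock markets rallied after the announcement.",
    "The team won the championship final.",
    "The striker scored twice in the second half.",
]

# Pre-trained Transformer embeddings (a public checkpoint, as an example).
emb = SentenceTransformer("all-MiniLM-L6-v2").encode(docs)

# Dimensionality reduction as post-processing, then document clustering.
reduced = PCA(n_components=2).fit_transform(emb)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(reduced)
print("cluster labels:", labels)     # finance docs vs. sports docs
```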
APA, Harvard, Vancouver, ISO, and other styles
44

Formato, Lorenzo. "IDENTIFICAZIONE DI GUASTI TRAMITE ALGORITMI DI CLASSIFICAZIONE & CLUSTERING per applicazioni di Manutenzione Predittiva in Scenari di Industria 4.0." Bachelor's thesis, Alma Mater Studiorum - Università di Bologna, 2021. http://amslaurea.unibo.it/23028/.

Full text
Abstract:
This thesis discusses the application of machine learning models within a test bench for electric axles, a solution dedicated to the world of Industry 4.0. The thesis project involves the use of classification models (Logistic Regression, SVM: Support Vector Machine, Naive Bayes, Decision Tree and Random Forest) and clustering models (K-Means and Agglomerative) for the identification of normal and expected behaviours during the test phase. The final objective of the work is therefore to obtain a model capable of identifying fault situations from the data generated by the test bench. The thesis is divided into four chapters: "State of the Art", "Design", "Implementation" and "Conclusions and Future Developments".
APA, Harvard, Vancouver, ISO, and other styles
45

Hamid, Muhammad Raffay. "A computational framework for unsupervised analysis of everyday human activities." Diss., Atlanta, Ga. : Georgia Institute of Technology, 2008. http://hdl.handle.net/1853/24765.

Full text
Abstract:
Thesis (Ph.D.)--Computing, Georgia Institute of Technology, 2009.
Committee Chair: Aaron Bobick; Committee Member: Charles Isbell; Committee Member: David Hogg; Committee Member: Irfan Essa; Committee Member: James Rehg
APA, Harvard, Vancouver, ISO, and other styles
46

Boniol, Paul. "Detection of anomalies and identification of their precursors in large data series collections." Electronic Thesis or Diss., Université Paris Cité, 2021. http://www.theses.fr/2021UNIP5206.

Full text
Abstract:
Extensive collections of data series are becoming a reality in a large number of scientific and social domains, such as finance, environmental sciences, astrophysics, neurosciences, engineering, and web services. There is therefore a growing interest in, and need for, efficient techniques to analyze and process such data. Informally, a data series is an ordered sequence of points or values. Once these series are collected and available, users often need to query them. These queries can be simple, such as selecting a time interval, but also complex, such as searching for similarities between series or detecting anomalies, which are often synonymous with sudden, unusual, and possibly undesired evolutions, or even malfunctions of the system under study. This last type of analysis represents a crucial problem for applications in a wide range of domains, all sharing the same objective: to detect anomalies as early as possible to avoid critical events, for example to prevent degradation and thus extend the lifetime of systems. In this thesis, we therefore address the following three objectives: (i) retrospective unsupervised subsequence anomaly detection in data series; (ii) unsupervised anomaly detection in data streams; (iii) explaining the classification of known anomalies in data series in order to identify possible precursors. This manuscript first presents the industrial context that motivated the thesis, fundamental definitions, a taxonomy of data series, and state-of-the-art anomaly detection methods. We then present our contributions along the three axes mentioned above. First, we describe two original solutions for the task of unsupervised detection of anomalous subsequences in static data series: NormA, which builds a weighted set of subsequences representing the different behaviors of the data series via clustering, and Series2Graph, which transforms the data series into a directed graph. Second, we present the SAND method (inspired by NormA), developed for unsupervised detection of anomalous subsequences in data streams that evolve continuously over time. Third, we address the problem of supervised identification of precursors, which we subdivide into two generic problems: the supervised classification of time series, and the explanation of the classification's results by identifying discriminative subsequences. Finally, we illustrate the applicability and interest of our developments through an application concerning the identification of precursors of undesirable vibrations occurring in water supply pumps in EDF's French nuclear power plants.
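The clustering idea behind NormA-style subsequence anomaly detection can be sketched very simply: summarise recurrent behaviours as cluster centroids, then score each subsequence by its distance to the nearest centroid. The sketch below is a simplified stand-in, not the actual NormA algorithm; the window length, cluster count, and synthetic series are assumptions.

```python
# Simplified stand-in for clustering-based subsequence anomaly scoring:
# recurrent behaviours become centroids, distant subsequences are anomalies.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
t = np.linspace(0, 40 * np.pi, 4000)
series = np.sin(t) + 0.05 * rng.normal(size=t.size)
series[2000:2050] += 2.0  # injected anomalous subsequence

w = 100  # assumed subsequence length
subs = np.lib.stride_tricks.sliding_window_view(series, w)

km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(subs)
# Anomaly score: distance from each subsequence to its closest centroid.
scores = km.transform(subs).min(axis=1)
print("most anomalous window starts near index", int(scores.argmax()))
```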
APA, Harvard, Vancouver, ISO, and other styles
47

OLIVEIRA, Paulo César de. "Abordagem semi-supervisionada para detecção de módulos de software defeituosos." Universidade Federal de Pernambuco, 2015. https://repositorio.ufpe.br/handle/123456789/19990.

Full text
Abstract:
With ever-increasing market competition, high-quality applications are required to automate services. To ensure software quality, testing it to find failures early is essential in the development life cycle. The purpose of software testing is to find failures that can be fixed and, consequently, to increase the quality of the software under development. As software grows, a greater number of tests is needed to prevent or find defects and improve quality. However, the more tests are created and executed, the more human and infrastructure resources are needed. Moreover, the time available for testing activities is usually insufficient, allowing defects to escape. Companies increasingly look for cheaper and more effective ways to detect software defects. In recent years, many researchers have sought mechanisms to automatically predict software defects, and machine learning techniques have been a research target as a way of finding defects in software modules. Many supervised approaches have been used for this purpose; however, labeling software modules as defective or not for classifier training is a very costly activity that can make the use of machine learning impractical. In this context, this work analyzes and compares unsupervised and semi-supervised approaches to detecting defective software modules. To this end, unsupervised (anomaly detection) methods and semi-supervised methods based on the AutoMLP and Naive Bayes classifiers were used. To evaluate and compare these methods, NASA datasets available in the PROMISE Software Engineering Repository were used.
APA, Harvard, Vancouver, ISO, and other styles
48

Boussik, Amine. "Apprentissage profond non-supervisé : Application à la détection de situations anormales dans l’environnement du train autonome." Electronic Thesis or Diss., Valenciennes, Université Polytechnique Hauts-de-France, 2023. http://www.theses.fr/2023UPHF0040.

Full text
Abstract:
This thesis addresses the challenges of environment monitoring and anomaly detection, particularly obstacle detection, for an autonomous freight train. Although rail transport has traditionally operated under human supervision, autonomous trains offer potential advantages in terms of cost, time, and safety. However, their operation in complex environments raises significant safety concerns. Instead of a supervised approach requiring costly and scarce annotated data, this research adopts an unsupervised technique, using unlabeled data to detect anomalies through methods capable of identifying atypical behaviors. Two environment monitoring models are presented: the first, based on a convolutional autoencoder (CAE), is dedicated to identifying obstacles on the main track; the second, an advanced version incorporating the vision transformer (ViT), focuses on overall environment monitoring. Both employ unsupervised learning techniques for anomaly detection. The results show that the proposed method offers relevant insights for monitoring the environment of the autonomous freight train, with the potential to enhance its reliability and safety. The use of unsupervised techniques thus demonstrates the utility and relevance of their adoption in the context of the autonomous train.
APA, Harvard, Vancouver, ISO, and other styles
49

Cherdo, Yann. "Détection d'anomalie non supervisée sur les séries temporelle à faible coût énergétique utilisant les SNNs." Electronic Thesis or Diss., Université Côte d'Azur, 2024. http://www.theses.fr/2024COAZ4018.

Full text
Abstract:
In the context of predictive maintenance at the car manufacturer Renault, this thesis aims to provide low-power solutions for unsupervised anomaly detection on time series. With the recent evolution of cars, more and more data are produced and need to be processed by machine learning algorithms. This processing can be performed in the cloud or directly at the edge, inside the car; in the latter case, network bandwidth, cloud service costs, data privacy management effort, and data loss can be saved. Embedding a machine learning model inside a car is challenging, as it requires frugal models due to memory and processing constraints. To this end, we study the use of spiking neural networks (SNNs) for anomaly detection, prediction, and classification on time series. The performance and energy costs of SNN models are evaluated in an edge scenario using generic hardware models that account for all computation and memory costs. To leverage the sparse neural activity of SNNs as much as possible, we propose a model with trainable sparse connections that consumes half the energy of its dense version. This model is evaluated on public anomaly detection benchmarks, a real anomaly detection use case from Renault Alpine cars, weather forecasting, and the Google Speech Commands dataset. We also compare its performance with other existing spiking and non-spiking models. We conclude that, for some use cases, spiking models can achieve state-of-the-art performance while consuming 2 to 8 times less energy. However, further studies should be undertaken to evaluate these models once embedded in a car. Inspired by neuroscience, we argue that other bio-inspired properties such as attention, sparsity, hierarchy, or the dynamics of neural assemblies could be exploited to obtain even better energy efficiency and performance with spiking models. Finally, we end this thesis with an essay at the crossroads of cognitive neuroscience, philosophy, and artificial intelligence. Diving into the conceptual difficulties linked to consciousness and considering the deterministic mechanisms of memory, we argue that consciousness and the self could be constitutively independent of memory. The aim of this essay is to question the nature of humans in contrast with that of machines and AI.
APA, Harvard, Vancouver, ISO, and other styles
50

Lu, Wei. "Unsupervised anomaly detection framework for multiple-connection based network intrusions." Thesis, 2005. http://hdl.handle.net/1828/1949.

Full text
Abstract:
In this dissertation, we propose an effective and efficient online unsupervised anomaly detection framework. The framework consists of new anomalousness metrics, named IP Weight, and a new hybrid clustering algorithm, named I-means. IP Weight metrics provide measures of the anomalousness of IP packet flows on networks. A simple classification of network intrusions distinguishes between single-connection based attacks and multiple-connection based attacks; the IP Weight metrics proposed in this work specifically characterize multiple-connection based attacks, and the definition of specific metrics for single-connection based attacks is left for future work. The I-means algorithm combines mixture resolving, a genetic algorithm that automatically estimates the optimal number of clusters for a set of data, and the k-means algorithm for clustering. Three sets of experiments are conducted to evaluate the new unsupervised anomaly detection framework. The first experiment empirically validates that IP Weight metrics reduce the dimensionality of the feature space characterizing IP packets to a level comparable with the principal component analysis technique. The second experiment is an offline evaluation based on the 1998 DARPA intrusion detection dataset, in which we compare our framework with three other unsupervised anomaly detection approaches, namely plain k-means clustering, univariate outlier detection, and multivariate outlier detection. Evaluation results show that the detection framework based on I-means yields the highest detection rate with a low false alarm rate; specifically, it detects 18 of a total of 19 multiple-connection based attack types. The third experiment is an online evaluation in a live networking environment. The evaluation result not only confirms the detection effectiveness observed with the DARPA dataset, but also shows good runtime efficiency, with response times in the range of a few seconds.
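The hybrid idea behind I-means, automatically estimating the number of clusters and then running k-means, can be sketched as follows. The dissertation uses a genetic algorithm for the estimate; the silhouette-based search below is a simplified stand-in, and the two-feature flow data is an assumption standing in for IP Weight values.

```python
# Sketch: estimate the cluster count, then cluster flows with k-means,
# flagging small clusters as anomaly candidates.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
flows = np.vstack([
    rng.normal([0.2, 0.1], 0.05, size=(300, 2)),  # bulk of benign flows
    rng.normal([0.8, 0.9], 0.05, size=(15, 2)),   # small anomalous group
])

best_k, best_score = 2, -1.0
for k in range(2, 8):  # search over candidate cluster counts
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(flows)
    score = silhouette_score(flows, labels)
    if score > best_score:
        best_k, best_score = k, score

labels = KMeans(n_clusters=best_k, n_init=10, random_state=0).fit_predict(flows)
sizes = np.bincount(labels)
print("k =", best_k, "; smallest cluster size:", sizes.min())
```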
APA, Harvard, Vancouver, ISO, and other styles
