Dissertations: "Security of machine learning classifiers"

1

Lubenko, Ivans. "Towards robust steganalysis : binary classifiers and large, heterogeneous data." Thesis, University of Oxford, 2013. http://ora.ox.ac.uk/objects/uuid:c1ae44b8-94da-438d-b318-f038ad6aac57.

Abstract:

The security of a steganography system is defined by our ability to detect it. It is of no surprise then that steganography and steganalysis both depend heavily on the accuracy and robustness of our detectors. This is especially true when real-world data is considered, due to its heterogeneity. The difficulty of such data manifests itself in a penalty that has periodically been reported to affect the performance of detectors built on binary classifiers; this is known as cover source mismatch. It remains unclear how the performance drop that is associated with cover source mismatch is mitigated or even measured. In this thesis we aim to show a robust methodology to empirically measure its effects on the detection accuracy of steganalysis classifiers. Some basic machine-learning based methods, which take their origin in domain adaptation, are proposed to counter it. Specifically, we test two hypotheses through an empirical investigation. First, that linear classifiers are more robust than non-linear classifiers to cover source mismatch in real-world data and, second, that linear classifiers are so robust that given sufficiently large mismatched training data they can equal the performance of any classifier trained on small matched data. With the help of theory we draw several nontrivial conclusions based on our results. The penalty from cover source mismatch may, in fact, be a combination of two types of error; estimation error and adaptation error. We show that relatedness between training and test data, as well as the choice of classifier, both have an impact on adaptation error, which, as we argue, ultimately defines a detector's robustness. This provides a novel framework for reasoning about what is required to improve the robustness of steganalysis detectors. Whilst our empirical results may be viewed as the first step towards this goal, we show that our approach provides clear advantages over earlier methods. 
To our knowledge this is the first study of this scale and structure.


2

Nowroozi, Ehsan. "Machine Learning Techniques for Image Forensics in Adversarial Setting." Doctoral thesis, Università di Siena, 2020. http://hdl.handle.net/11365/1096177.

Abstract:

The use of machine learning for multimedia forensics is gaining more and more consensus, especially due to the amazing possibilities offered by modern machine learning techniques. By exploiting deep learning tools, new approaches have been proposed whose performance remarkably exceeds that achieved by state-of-the-art methods based on standard machine learning and model-based techniques. However, the inherent vulnerability and fragility of machine learning architectures pose serious new security threats, hindering the use of these tools in security-oriented applications, multimedia forensics among them. The analysis of the security of machine learning-based techniques in the presence of an adversary attempting to impede the forensic analysis, and the development of new solutions capable of improving the security of such techniques, are therefore of primary importance and have recently marked the birth of a new discipline, named Adversarial Machine Learning. By focusing on Image Forensics, and image manipulation detection in particular, this thesis contributes to the above mission by developing novel techniques for enhancing the security of binary manipulation detectors based on machine learning in several adversarial scenarios. The validity of the proposed solutions has been assessed by considering several manipulation tasks, ranging from the detection of double compression and contrast adjustment to the detection of geometric transformations and filtering operations.


3

Singh, Gurpreet. "Statistical Modeling of Dynamic Risk in Security Systems." Thesis, KTH, Matematisk statistik, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-273599.

Abstract:

Big data has been used regularly in finance and business to build forecasting models. It is, however, a relatively new concept in the security industry. This study predicts which technology-related alarm codes will sound in the coming 7 days at location $L$ by observing the past 7 days. Logistic regression and neural networks are applied to solve this problem. Because the problem is multi-label in nature, logistic regression is applied in combination with binary relevance and classifier chains. The models are trained on data that has been labeled with two separate methods: the first labels the data by observing only location $L$, and the second considers $L$ and $L$'s surroundings. As the problem is multi-label, the labels are likely to be unbalanced, so two resampling techniques, SMOTE and random over-sampling, are applied to increase the frequency of the minority labels. Recall, precision, and F1-score are calculated to evaluate the models. The results show that the second labeling method performs better for all models and that the classifier chains and binary relevance models perform similarly. Resampling the data with SMOTE increases the macro-average F1-scores of the binary relevance and classifier chains models, but decreases the performance of the neural network. SMOTE also performs better than random over-sampling. The neural network model outperforms the other two models on all methods and achieves the highest F1-score.
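
The random over-sampling step mentioned in this abstract can be illustrated with a minimal sketch. It covers a single binary label only, as one sub-problem of a binary relevance decomposition would, and it is not the thesis's code: the function name `random_oversample`, the toy rows, and the fixed seed are assumptions made for illustration. SMOTE differs in that it synthesizes new minority points by interpolating between neighbouring minority samples rather than duplicating existing ones.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate minority-class rows at random until every class
    is as frequent as the largest class (hypothetical helper)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # Candidate rows to duplicate for this class.
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Toy data: four "no alarm" rows and one "alarm" row (made-up values).
rows = [[0.1], [0.2], [0.3], [0.4], [0.9]]
labels = [0, 0, 0, 0, 1]
new_rows, new_labels = random_oversample(rows, labels)
balanced_counts = Counter(new_labels)
```

Resampling of this kind would normally be applied to the training split only, so that duplicated rows do not leak into the evaluation data.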


4

Sayin, Günel Burcu. "Towards Reliable Hybrid Human-Machine Classifiers." Doctoral thesis, Università degli studi di Trento, 2022. http://hdl.handle.net/11572/349843.

Abstract:

In this thesis, we focus on building reliable hybrid human-machine classifiers to be deployed in cost-sensitive classification tasks. The objective is to assess ML quality in hybrid classification contexts and design the appropriate metrics, thereby knowing whether we can trust the model predictions and identifying the subset of items on which the model is well-calibrated and trustworthy. We start by discussing the key concepts, research questions, challenges, and architecture to design and implement an effective hybrid classification service. We then present a deeper investigation of each service component along with our solutions and results. We mainly contribute to cost-sensitive hybrid classification, selective classification, model calibration, and active learning. We highlight the importance of model calibration in hybrid classification services and propose novel approaches to improve the calibration of human-machine classifiers. In addition, we argue that the current accuracy-based metrics are misaligned with the actual value of machine learning models and propose a novel metric, "value". We further test the performance of SOTA machine learning models in NLP tasks with a cost-sensitive hybrid classification context. We show that the performance of the SOTA models in cost-sensitive tasks significantly drops when we evaluate them according to value rather than accuracy. Finally, we investigate the quality of hybrid classifiers in the active learning scenarios. We review the existing active learning strategies, evaluate their effectiveness, and propose a novel value-aware active learning strategy to improve the performance of selective classifiers in the active learning of cost-sensitive tasks.


5

McClintick, Kyle W. "Training Data Generation Framework For Machine-Learning Based Classifiers." Digital WPI, 2018. https://digitalcommons.wpi.edu/etd-theses/1276.

Abstract:

In this thesis, we propose a new framework for the generation of training data for machine learning techniques used for classification in communications applications. Machine learning-based signal classifiers do not generalize well when training data does not describe the underlying probability distribution of real signals. The simplest way to accomplish statistical similarity between training and testing data is to synthesize training data passed through a permutation of plausible forms of noise. To accomplish this, a framework is proposed that implements arbitrary channel conditions and baseband signals. A dataset generated using the framework is considered, and is shown to be appropriately sized by having $11\%$ lower entropy than state-of-the-art datasets. Furthermore, unsupervised domain adaptation can allow for powerful generalized training via deep feature transforms on unlabeled evaluation-time signals. A novel Deep Reconstruction-Classification Network (DRCN) application is introduced, which attempts to maintain near-peak signal classification accuracy despite dataset bias, or perturbations on testing data unforeseen in training. Together, feature transforms and diverse training data generated from the proposed framework, teaching a range of plausible noise, can train a deep neural net to classify signals well in many real-world scenarios despite unforeseen perturbations.


6

Dang, Robin, and Anders Nilsson. "Evaluation of Machine Learning classifiers for Breast Cancer Classification." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-280349.

Abstract:

Breast cancer is a common and fatal disease among women globally, where early detection is vital to improve the prognosis of patients. In today's digital society, computers and complex algorithms can evaluate and diagnose diseases more efficiently and with greater certainty than experienced doctors. Several studies have been conducted to automate medical imaging techniques, by utilizing machine learning techniques, to predict and detect breast cancer. In this report, the suitability of using machine learning to classify whether breast cancer is benign or malignant is evaluated. More specifically, five different machine learning methods are examined and compared. Furthermore, we investigate how the efficiency of the methods, with regard to classification accuracy and execution time, is affected by the preprocessing method Principal component analysis and the ensemble method Bootstrap aggregating. In theory, both methods should favor certain machine learning methods and consequently increase the classification accuracy. The study is based on a well-known breast cancer dataset from Wisconsin, which is used to train the algorithms. The results were evaluated by applying statistical methods to the classification accuracy, sensitivity and execution time, and were then compared across the different classifiers. The study showed that neither Principal component analysis nor Bootstrap aggregating resulted in any significant improvement in classification accuracy. However, the results showed that the support vector machine classifiers, with linear and RBF kernels, performed best. As the study was limited in terms of the number of datasets and the choice of evaluation methods with their associated adjustments, it is uncertain whether the obtained result can be generalized to other datasets or populations.


7

Rigaki, Maria. "Adversarial Deep Learning Against Intrusion Detection Classifiers." Thesis, Luleå tekniska universitet, Datavetenskap, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:ltu:diva-64577.

Abstract:

Traditional approaches in network intrusion detection follow a signature-based approach; however, the use of anomaly detection approaches based on machine learning techniques has been studied heavily for the past twenty years. The continuous change in the way attacks appear, the volume of attacks, as well as the improvements in the big data analytics space, make machine learning approaches more alluring than ever. The intention of this thesis is to show that using machine learning in the intrusion detection domain should be accompanied by an evaluation of its robustness against adversaries. Several adversarial techniques have emerged lately from deep learning research, largely in the area of image classification. These techniques are based on the idea of introducing small changes in the original input data in order to make a machine learning model misclassify it. This thesis follows a big data analytics methodology and explores adversarial machine learning techniques that have emerged from the deep learning domain, against machine learning classifiers used for network intrusion detection. The study looks at several well-known classifiers and studies their performance under attack over several metrics, such as accuracy, F1-score and receiver operating characteristic. The approach used assumes no knowledge of the original classifier and examines both general and targeted misclassification. The results show that, using relatively simple methods for generating adversarial samples, it is possible to lower the detection accuracy of intrusion detection classifiers by 5% to 28%. Performance degradation is achieved using a methodology that is simpler than previous approaches, and it requires only a 6.25% change between the original and the adversarial sample, making it a candidate for a practical adversarial approach.


8

Ford, John M. "Pulsar Search Using Supervised Machine Learning." NSUWorks, 2017. http://nsuworks.nova.edu/gscis_etd/1001.

Abstract:

Pulsars are rapidly rotating neutron stars which emit a strong beam of energy through mechanisms that are not entirely clear to physicists. These very dense stars are used by astrophysicists to study many basic physical phenomena, such as the behavior of plasmas in extremely dense environments, behavior of pulsar-black hole pairs, and tests of general relativity. Many of these tasks require information to answer the scientific questions posed by physicists. In order to provide more pulsars to study, there are several large-scale pulsar surveys underway, which are generating a huge backlog of unprocessed data. Searching for pulsars is a very labor-intensive process, currently requiring skilled people to examine and interpret plots of data output by analysis programs. An automated system for screening the plots will speed up the search for pulsars by a very large factor. Research to date on using machine learning and pattern recognition has not yielded a completely satisfactory system, as systems with the desired near 100% recall have false positive rates that are higher than desired, causing more manual labor in the classification of pulsars. This work proposed to research, identify, propose and develop methods to overcome the barriers to building an improved classification system with a false positive rate of less than 1% and a recall of near 100% that will be useful for the current and next generation of large pulsar surveys. The results show that it is possible to generate classifiers that perform as needed from the available training data. While a false positive rate of 1% was not reached, recall of over 99% was achieved with a false positive rate of less than 2%. Methods of mitigating the imbalanced training and test data were explored and found to be highly effective in enhancing classification accuracy.


9

Burago, Igor. "Automated Attacks on Compression-Based Classifiers." Thesis, University of Oregon, 2014. http://hdl.handle.net/1794/18439.

Abstract:

Methods of compression-based text classification have proven their usefulness for various applications. However, in some classification problems, such as spam filtering, a classifier confronts one or many adversaries willing to induce errors in the classifier's judgment on certain kinds of input. In this thesis, we consider the problem of finding thrifty strategies for character-based text modification that allow an adversary to reverse a classifier's verdict on a given family of input texts. We propose three statistical statements of the problem that can be used by an attacker to obtain transformation models which are optimal in some sense. Evaluating these three techniques on a realistic spam corpus, we find that an adversary can transform a spam message (detectable as such by an entropy-based text classifier) into a legitimate one by generating and appending, in some cases, additional characters amounting to as little as 20% of the original length of the message.


10

Ishii, Shotaro, and David Ljunggren. "A Comparative Analysis of Robustness to Noise in Machine Learning Classifiers." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-302532.

Abstract:

Data that stems from real measurements often contains distortions to some degree. Such distortions are generally referred to as noise in machine learning terminology, and can lead to decreased classification accuracy and poor prediction results. In this study, three machine learning classifiers were compared by their performance and robustness in the presence of noise. More specifically, random forests, support vector machines and artificial neural networks were trained and compared on four different data sets with varying levels of noise artificially added to them. In summary, the random forest classifier performed the best and was the most robust classifier at eight out of ten noise levels, closely followed by the artificial neural network classifier. At the two remaining noise levels, the support vector machine classifier with a linear kernel performed the best and was the most robust classifier.


11

Stomeo, Carlo. "Applying Machine Learning to Cyber Security." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2018. http://amslaurea.unibo.it/17303/.

Abstract:

Intrusion Detection Systems (IDS) are nowadays a very important part of a system. In recent years many methods have been proposed to implement this kind of security measure against cyber attacks, including Machine Learning and Data Mining based ones. In this work we discuss in detail the family of anomaly-based IDSs, which are able to detect never-before-seen attacks, paying particular attention to adherence to the FAIR principles. These principles include the Accessibility and the Reusability of software. Moreover, as the purpose of this work is to assess the state of the art, we have selected three approaches according to their reproducibility and compared their performance in a common experimental setting. Lastly, a real-world use case has been analyzed, resulting in the proposal of an unsupervised ML model for pre-processing and analyzing web server logs. The proposed solution uses clustering and outlier detection techniques to detect attacks in an unsupervised way.


12

Jan, Steve T. K. "Robustifying Machine Learning based Security Applications." Diss., Virginia Tech, 2020. http://hdl.handle.net/10919/99862.

Abstract:

In recent years, machine learning (ML) has been explored and employed in many fields. However, there are growing concerns about the robustness of machine learning models. These concerns are further amplified in security-critical applications: attackers can manipulate the inputs (i.e., adversarial examples) to cause machine learning models to make a mistake, and it is very challenging to obtain a large amount of attackers' data. These issues make applying machine learning in security-critical applications difficult. In this dissertation, we present several approaches to robustifying three machine learning based security applications. First, we start from adversarial examples in image recognition. We develop a method to generate robust adversarial examples that remain effective in the physical domain. Our core idea is to use an image-to-image translation network to simulate the digital-to-physical transformation process for generating robust adversarial examples. We further show these robust adversarial examples can improve the robustness of machine learning models by adversarial retraining. The second application is bot detection. We show that existing machine learning models are not effective when only limited attacker data is available. We develop a data synthesis method to address this problem. The key novelty is that our method is distribution-aware synthesis, using two different generators in a Generative Adversarial Network to synthesize data for the clustered regions and the outlier regions in the feature space. We show the detection performance using 1% of attackers' data is close to existing methods trained with 100% of the attackers' data. The third component of this dissertation is phishing detection. By designing a novel measurement system, we search and detect phishing websites that adopt evasion techniques not only at the page content level but also at the web domain level. The key novelty is that our system is built on the observation of the evasive behaviors of phishing pages in practice. We also study how existing browsers defend against phishing websites that impersonate trusted entities at the web domain level. Our results show existing browsers are not yet effective at detecting them.
Doctor of Philosophy
Machine learning (ML) refers to computer algorithms that aim to identify hidden patterns in data. In recent years, machine learning has been widely used in many fields, ranging from natural language to autonomous driving. However, there are growing concerns about the robustness of machine learning models. These concerns are further amplified in security-critical applications: attackers can manipulate their inputs (i.e., adversarial examples) to cause machine learning models to make wrong predictions, and it is highly expensive and difficult to obtain a large amount of attackers' data because attackers are rare compared to normal users. These issues make applying machine learning in security-critical applications a concern. In this dissertation, we seek to build better defenses in three types of machine learning based security applications. The first is image recognition: by developing a method to generate realistic adversarial examples, we make machine learning models more robust against adversarial examples through adversarial retraining. The second is bot detection: we develop a data synthesis method to detect malicious bots when only limited data on malicious bots is available. For phishing websites, we implement a tool to detect domain name impersonation and detect phishing pages using dynamic and static analysis.


13

VISA, SOFIA. "FUZZY CLASSIFIERS FOR IMBALANCED DATA SETS." University of Cincinnati / OhioLINK, 2007. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1182226868.


14

Perez, Daniel Antonio. "Performance comparison of support vector machine and relevance vector machine classifiers for functional MRI data." Thesis, Georgia Institute of Technology, 2010. http://hdl.handle.net/1853/34858.

Abstract:

Multivariate pattern analysis (MVPA) of fMRI data has been growing in popularity due to its sensitivity to networks of brain activation. It is performed in a predictive modeling framework which is natural for implementing brain state prediction and real-time fMRI applications such as brain computer interfaces. Support vector machines (SVM) have been particularly popular for MVPA owing to their high prediction accuracy even with noisy datasets. Recent work has proposed the use of relevance vector machines (RVM) as an alternative to SVM. RVMs are particularly attractive in time sensitive applications such as real-time fMRI since they tend to perform classification faster than SVMs. Despite the use of both methods in fMRI research, little has been done to compare the performance of these two techniques. This study compares RVM to SVM in terms of time and accuracy to determine which is better suited to real-time applications.


15

Yan, Jie Lu. "Development and validation of deep learning classifiers for antimicrobial peptide prediction." Thesis, University of Macau, 2018. http://umaclib3.umac.mo/record=b3881886.


16

Lavesson, Niklas. "Evaluation and Analysis of Supervised Learning Algorithms and Classifiers." Licentiate thesis, Karlskrona : Blekinge Institute of Technology, 2006. http://www.bth.se/fou/Forskinfo.nsf/allfirst2/c655a0b1f9f88d16c125714c00355e5d?OpenDocument.


17

Wang, Zhuang. "Budgeted Online Kernel Classifiers for Large Scale Learning." Diss., Temple University Libraries, 2010. http://cdm16002.contentdm.oclc.org/cdm/ref/collection/p245801coll10/id/89554.

Abstract:

Computer and Information Science
Ph.D.
In an environment where new large-scale problems are emerging in various disciplines and pervasive computing applications are becoming more common, there is an urgent need for machine learning algorithms that can process increasing amounts of data using comparatively smaller computing resources in a computationally efficient way. Previous research has resulted in many successful learning algorithms that scale linearly or even sub-linearly with sample size and dimension, both in runtime and in space. However, linear or even sub-linear space scaling is often not sufficient, because it implies an unbounded growth in memory with sample size. This clearly opens another challenge: how to learn from large, or practically infinite, data sets or data streams using memory-limited resources. Online learning is an important learning scenario in which a potentially unlimited sequence of training examples is presented one example at a time and can only be seen in a single pass. This is opposed to offline learning, where the whole collection of training examples is at hand. The objective is to learn an accurate prediction model from the training stream. Upon receiving each fresh example from the stream, online learning algorithms typically attempt to update the existing model without retraining. The invention of Support Vector Machines (SVM) attracted a lot of interest in adapting kernel methods for both offline and online learning. Typical online learning for kernel classifiers consists of observing a stream of training examples and including them as prototypes when specified conditions are met. However, such a procedure could result in an unbounded growth in the number of prototypes. In addition to the danger of exceeding the physical memory, this also implies an unlimited growth in both update and prediction time.
To address this issue, in my dissertation I propose a series of kernel-based budgeted online algorithms, which have constant space and constant update and prediction time. This is achieved by maintaining a fixed number of prototypes under the memory budget. Most of the previous work on budgeted online algorithms focuses on the kernel perceptron. In the first part of the thesis, I review and discuss these existing algorithms and then propose a kernel perceptron algorithm which removes the prototype with the minimal impact on classification accuracy to maintain the budget. This is achieved by dual use of cached prototypes for both model presentation and validation. In the second part, I propose a family of budgeted online algorithms based on the Passive-Aggressive (PA) style. The budget maintenance is achieved by introducing an additional constraint into the original PA optimization problem. A closed-form solution was derived for the budget maintenance and model update. In the third part, I propose a budgeted online SVM algorithm. The proposed algorithm guarantees that the optimal SVM solution is maintained on all the prototype examples at any time. To maximize the accuracy, prototypes are constructed to approximate the data distribution near the decision boundary. In the fourth part, I propose a family of budgeted online algorithms for multi-class classification. The proposed algorithms build on the recently proposed SVM training algorithm Pegasos. I prove that the gap between the budgeted Pegasos and the optimal SVM solution directly depends on the average model degradation due to budget maintenance. Following the analysis, I studied greedy multi-class budget maintenance methods based on removal, projection and merging of SVs. In each of these four parts, the proposed algorithms were experimentally evaluated against state-of-the-art competitors.
The results show that the proposed budgeted online algorithms outperform the competitive algorithm and achieve accuracy comparable to non-budget counterparts while being extremely computationally efficient.
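The idea of a fixed prototype budget can be illustrated with a minimal sketch of a budgeted kernel perceptron. This is not the algorithm from the dissertation: the class name, the RBF kernel choice, and the oldest-first removal rule are assumptions made for brevity, whereas the abstract describes removing the prototype with the least impact on classification accuracy.

```python
import math


def rbf_kernel(a, b, gamma=1.0):
    """Gaussian (RBF) kernel between two equal-length feature vectors."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))


class BudgetedKernelPerceptron:
    """Kernel perceptron that never stores more than `budget` prototypes."""

    def __init__(self, budget, gamma=1.0):
        self.budget = budget
        self.gamma = gamma
        self.prototypes = []  # (features, label) pairs, oldest first

    def score(self, x):
        # Signed sum of kernel similarities to the stored prototypes.
        return sum(y * rbf_kernel(p, x, self.gamma) for p, y in self.prototypes)

    def predict(self, x):
        return 1 if self.score(x) >= 0 else -1

    def update(self, x, y):
        """Process one streamed example; return True if it was misclassified."""
        mistake = self.predict(x) != y
        if mistake:
            self.prototypes.append((x, y))
            if len(self.prototypes) > self.budget:
                # Assumed simplification: drop the oldest prototype. The thesis
                # instead removes the one with minimal impact on accuracy.
                self.prototypes.pop(0)
        return mistake


# Toy stream with labels in {-1, +1}; memory stays bounded by the budget.
model = BudgetedKernelPerceptron(budget=3)
stream = [((0.0, 0.0), -1), ((1.0, 1.0), 1), ((0.1, 0.0), -1),
          ((0.9, 1.0), 1), ((0.0, 0.2), -1), ((1.0, 0.8), 1)] * 5
for features, label in stream:
    model.update(features, label)
```

Because the prototype list is capped, both the memory footprint and the cost of `score` stay constant no matter how long the stream runs, which is the property the abstract emphasises.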
Temple University--Theses


18

Ayres, Dorothy Lucille. "Promises and Pitfalls of Machine Learning Classifiers for Inter-Rater Reliability Annotation." Wright State University / OhioLINK, 2021. http://rave.ohiolink.edu/etdc/view?acc_num=wright1622487666348687.


19

Augustsson, Christian, Jacobson Pontus Egeberg, and Erik Scherqvist. "Evaluating Machine Learning Intrusion Detection System classifiers : Using a transparent experiment approach." Thesis, Högskolan i Skövde, Institutionen för informationsteknologi, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-17192.

Abstract:

There have been many studies performing experiments that showcase the potential of machine learning solutions for intrusion detection, but their experimental approaches are non-transparent and vague, making it difficult to replicate their trained methods and results. In this thesis we exemplify a healthier experimental methodology. A survey was performed to investigate evaluation metrics. Three experiments implementing and benchmarking machine learning classifiers, using different optimization techniques, were performed to set up a frame of reference for future work, as well as signify the importance of using descriptive metrics and disclosing implementation. We found a set of metrics that more accurately describes the models, and we found guidelines that we would like future researchers to fulfill in order to make their work more comprehensible. For future work we would like to see more discussion regarding metrics, and a new dataset that is more generalizable.


20

Kamat, Sai Shyamsunder. "Analyzing Radial Basis Function Neural Networks for predicting anomalies in Intrusion Detection Systems." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-259187.

Abstract:

In the 21st century, information is the new currency. With the omnipresence of devices connected to the internet, humanity can instantly access any information. However, there are certain cybercrime groups which steal the information. An Intrusion Detection System (IDS) monitors a network for suspicious activities and alerts its owner about an undesired intrusion. Commercial IDSs react after detecting intrusion attempts. With cyber attacks becoming increasingly complex, it is expensive to wait for the attacks to happen and respond later. It is crucial for network owners to employ IDSs that preemptively differentiate a harmless data request from a malicious one. Machine Learning (ML) can solve this problem by recognizing patterns in internet traffic to predict the behaviour of network users. This project studies how effectively a Radial Basis Function Neural Network (RBFN) with a Deep Learning architecture can impact intrusion detection. On the basis of the existing framework, it asks how well an RBFN can predict malicious intrusion attempts, especially when compared to contemporary detection practices. Here, an RBFN is a multi-layered neural network model that uses a radial basis function to transform input traffic data. Once transformed, it is possible to separate the various traffic data points using a single straight line in a higher-dimensional space. The outcome of the project indicates that the proposed method is severely affected by limitations. For example, the model needs to be fine-tuned over several trials to achieve a desired accuracy. The results of the implementation show that the RBFN correctly predicts various cyber attacks (such as web attacks, infiltrations, brute force and SSH attacks) and normal internet behaviour on average 80% of the time. Other algorithms in an identical testbed are more than 90% accurate. Despite the lower accuracy, the RBFN model is more than 94% accurate at recording specific kinds of attacks such as Port Scans and BotNet malware. 
One possible solution is to restrict this model to predicting only malware attacks and to use different machine learning algorithms for other attacks.
In the 21st century, information is the new currency. With the omnipresence of devices connected to the internet, humanity has access to information in an instant. However, there are certain groups that use methods to steal information for personal gain via the internet. An intrusion detection system (IDS) monitors a network for suspicious activities and alerts its owner when an unwanted intrusion has occurred. Commercial IDSs react after detecting an intrusion attempt. Attacks are becoming increasingly complex, and it can be expensive to wait for attacks to happen and respond afterwards. It is crucial for network owners to use IDSs that can preemptively distinguish harmless data use from malicious use. Machine learning can solve this problem. It can analyse all existing data about internet traffic, recognise patterns and predict user behaviour. This project aims to study how effectively Radial Basis Function Neural Networks (RBFN) with a deep learning architecture can impact intrusion detection. From this perspective, the question is how well an RBFN can predict malicious intrusion attempts, especially in comparison with existing detection methods. Here, an RBFN is defined as a multi-layer neural network model that uses a radial basis function to transform data so that it becomes linearly separable. After a survey of modern literature and the identification of a named dataset, a quantitative research methodology with performance indicators was used to evaluate the RBFN's performance. A Random Forest Classifier algorithm was also used for comparison. The results were obtained after a series of fine-tuning runs over the models' parameters. The results show that the RBFN is correct in predicting anomalous internet behaviour on average 80% of the time. Other algorithms in the literature are described as more than 90% accurate. The proposed RBFN model is, however, very accurate at recording specific types of attacks such as Port Scans and BotNet malware. 
The outcome of the project shows that the proposed method is severely affected by limitations. For example, the model needs to be fine-tuned over several trials to achieve the desired accuracy. One possible solution is to restrict this model to predicting only malware attacks and to use other machine learning algorithms for other attacks.
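The claim that a radial basis transform can make data linearly separable can be seen on a toy XOR-style problem. The sketch below is only an illustration under stated assumptions: the two centers, the `gamma` value and the 0.9 threshold are hand-picked for this toy data, whereas the thesis trains its network on real traffic data.

```python
import math


def rbf_features(x, centers, gamma=1.0):
    """Map a point to its Gaussian similarities to each center."""
    return [
        math.exp(-gamma * sum((a - c) ** 2 for a, c in zip(x, center)))
        for center in centers
    ]


# XOR-style data: no single straight line separates the classes in input space.
points = [(0, 0), (1, 1), (0, 1), (1, 0)]
labels = [0, 0, 1, 1]

# Hand-picked centers (an assumption; a trained RBFN would learn these).
centers = [(0, 0), (1, 1)]


def classify(x):
    """Linear rule in the transformed space: phi_1 + phi_2 > 0.9 means class 0."""
    phi = rbf_features(x, centers)
    return 0 if phi[0] + phi[1] > 0.9 else 1


predictions = [classify(p) for p in points]
```

In the transformed space the class-0 points sum to roughly 1.14 and the class-1 points to roughly 0.74, so a single linear threshold separates them.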


21

Kola, Lokesh, and Vigneshwar Muriki. "A Comparison on Supervised and Semi-Supervised Machine Learning Classifiers for Diabetes Prediction." Thesis, Blekinge Tekniska Högskola, Institutionen för datavetenskap, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-21816.

Abstract:

Background: The main cause of diabetes is high sugar levels in the blood. There is no permanent cure for diabetes. However, it can be prevented by early diagnosis. In recent years, interest in Machine Learning for disease prediction has been increasing, especially during COVID-19 times. In the present scenario, it is difficult for patients to visit doctors. A possible framework is provided using Machine Learning which can detect diabetes at early stages. Objectives: This thesis aims to identify the critical features that impact gestational (Type-3) diabetes, and experiments are performed to identify an efficient algorithm for Type-3 diabetes prediction. The selected algorithms are Decision Trees, Random Forest, Support Vector Machine, Gaussian Naive Bayes, Bernoulli Naive Bayes, and Laplacian Support Vector Machine. The algorithms are compared based on their performance. Methods: The method consists of gathering the dataset and preprocessing the data. SelectKBest univariate feature selection was performed to select the important features which influence the Type-3 diabetes prediction. A new dataset was created by binning some of the important features from the original dataset, leading to two datasets, non-binned and binned. The original dataset was imbalanced due to the unequal distribution of class labels. The train-test split was performed on both datasets. Oversampling was then performed on both training datasets to overcome the class imbalance. The selected Machine Learning algorithms were trained. Predictions were made on the test data. Hyperparameter tuning was performed on all algorithms to improve the performance. Predictions were made again on the test data, and accuracy, precision, recall, and f1-score were measured on both binned and non-binned datasets. 
Results: Among the selected Machine Learning algorithms, Laplacian Support Vector Machine attained the highest performance, with 89.61% and 86.93% on the non-binned and binned datasets respectively. Hence, it is an efficient algorithm for Type-3 diabetes prediction. The second best algorithm is Random Forest, with 74.5% and 72.72% on the non-binned and binned datasets. The non-binned dataset performed well for the majority of the selected algorithms. Conclusions: Laplacian Support Vector Machine achieved the highest performance among the algorithms on both binned and non-binned datasets. The non-binned dataset showed the best performance for almost all Machine Learning algorithms except Bernoulli Naive Bayes. Therefore, the non-binned dataset is more suitable for Type-3 diabetes prediction.
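The ordering described in the Methods (split first, then oversample only the training portion) matters because it keeps duplicated minority rows out of the test set. A minimal standard-library sketch of that ordering follows; the function names, the 25% test fraction and the random-duplication strategy are assumptions, since the abstract does not name the oversampling technique, and feature selection is omitted.

```python
import random


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle (features, label) rows and split them into train and test lists."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def random_oversample(rows, seed=0):
    """Duplicate randomly chosen rows of smaller classes until all classes match."""
    rng = random.Random(seed)
    by_label = {}
    for features, label in rows:
        by_label.setdefault(label, []).append((features, label))
    target = max(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        balanced.extend(group)
        # Top up this class with random duplicates (none for the largest class).
        balanced.extend(rng.choice(group) for _ in range(target - len(group)))
    return balanced


# Hypothetical imbalanced data: 90 negative rows and 10 positive rows.
data = [((i,), 0) for i in range(90)] + [((i,), 1) for i in range(10)]
train, test = train_test_split(data)
balanced_train = random_oversample(train)  # the test set is left untouched
```

Only `balanced_train` is passed to the learner; metrics are still computed on the original, imbalanced `test` rows.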


22

Anyango, Stephen Omondi Otieno. "VisuNet: Visualizing Networks of feature interactions in rule-based classifiers." Thesis, Uppsala universitet, Institutionen för biologisk grundutbildning, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-296336.


23

DEMETRIO, LUCA. "Formalizing evasion attacks against machine learning security detectors." Doctoral thesis, Università degli studi di Genova, 2021. http://hdl.handle.net/11567/1035018.

Abstract:

Recent work has shown that adversarial examples can bypass machine learning-based threat detectors relying on static analysis by applying minimal perturbations. To preserve malicious functionality, previous attacks either apply trivial manipulations (e.g. padding), potentially limiting their effectiveness, or require running computationally demanding validation steps to discard adversarial variants that do not correctly execute in sandbox environments. While machine learning systems for detecting SQL injections have been proposed in the literature, no attacks have been tested against the proposed solutions to assess the effectiveness and robustness of these methods. In this thesis, we overcome these limitations by developing RAMEn, a unifying framework that (i) can express attacks for different domains, (ii) generalizes previous attacks against machine learning models, and (iii) uses functions that preserve the functionality of manipulated objects. We provide new attacks for both Windows malware and SQL injection detection scenarios by exploiting the format used for representing these objects. To show the efficacy of RAMEn, we provide experimental results of our strategies in both white-box and black-box settings. The white-box attacks against Windows malware detectors show that it takes only 2% of the input size of the target to evade detection with ease. To further speed up the black-box attacks, we overcome the issues mentioned before by presenting a novel family of black-box attacks that are both query-efficient and functionality-preserving, as they rely on the injection of benign content, which will never be executed, either at the end of the malicious file or within some newly created sections; these attacks are implemented in an algorithm called GAMMA. We also evaluate whether GAMMA transfers to other commercial antivirus solutions, and surprisingly find that it can evade many commercial antivirus engines. 
For evading SQLi detectors, we create WAF-A-MoLE, a mutational fuzzer that exploits random mutations of the input samples, keeping alive only the most promising ones. WAF-A-MoLE is capable of defeating detectors built with different architectures by using the novel practical manipulations we have proposed. To facilitate reproducibility and future work, we open-source our framework and corresponding attack implementations. We conclude by discussing the limitations of current machine learning-based malware detectors, along with potential mitigation strategies based on naturally embedding domain knowledge from subject-matter experts into the learning process.


24

Joe-Yen, Stefan. "Performance Envelopes of Adaptive Ensemble Data Stream Classifiers." NSUWorks, 2017. http://nsuworks.nova.edu/gscis_etd/1014.

Abstract:

This dissertation documents a study of the performance characteristics of algorithms designed to mitigate the effects of concept drift on online machine learning. Several supervised binary classifiers were evaluated on their performance when applied to an input data stream with a non-stationary class distribution. The selected classifiers included ensembles that combine the contributions of their member algorithms to improve overall performance. These ensembles adapt to changing class definitions, known as “concept drift,” often present in real-world situations, by adjusting the relative contributions of their members. Three stream classification algorithms and three adaptive ensemble algorithms were compared to determine the capabilities of each in terms of accuracy and throughput. For each run of the experiment, the percentage of correct classifications was measured using prequential analysis, a well-established methodology in the evaluation of streaming classifiers. Throughput was measured in classifications performed per second as timed by the CPU clock. Two main experimental variables were manipulated to investigate and compare the range of accuracy and throughput exhibited by each algorithm under various conditions. The number of attributes in the instances to be classified and the speed at which the definitions of labeled data drifted were varied across six total combinations of drift speed and dimensionality. The implications of the results are used to recommend improved methods for working with stream-based data sources. The typical approach to counteract concept drift is to update the classification models with new data. In the stream paradigm, classifiers are continuously exposed to new data that may serve as representative examples of the current situation. However, updating the ensemble classifier in order to maintain or improve accuracy can be computationally costly and will negatively impact throughput. 
In a real-time system, this could lead to an unacceptable slow-down. The results of this research showed that, among several algorithms for reducing the effect of concept drift, adaptive decision trees maintained the highest accuracy without slowing down with respect to the no-drift condition. Adaptive ensemble techniques were also able to maintain reasonable accuracy in the presence of drift without much change in the throughput. However, the overall throughput of the adaptive methods is low and may be unacceptable for extremely time-sensitive applications. The performance visualization methodology utilized in this study gives a clear and intuitive visual summary that allows system designers to evaluate candidate algorithms with respect to their performance needs.
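Prequential analysis, as mentioned in the abstract, scores each incoming example before the model is allowed to learn from it ("test-then-train"). The sketch below shows that loop with a deliberately trivial learner; the `MajorityClassLearner` class and the `predict`/`learn` method names are assumptions for illustration and are not the classifiers evaluated in the dissertation.

```python
class MajorityClassLearner:
    """Toy online learner: predicts the most frequent label seen so far."""

    def __init__(self):
        self.counts = {}

    def predict(self, x):
        if not self.counts:
            return None  # nothing seen yet, so no prediction
        return max(self.counts, key=self.counts.get)

    def learn(self, x, y):
        self.counts[y] = self.counts.get(y, 0) + 1


def prequential_accuracy(model, stream):
    """Test-then-train: score each example before the model learns from it."""
    correct = 0
    total = 0
    for x, y in stream:
        if model.predict(x) == y:  # test first ...
            correct += 1
        model.learn(x, y)          # ... then train on the same example
        total += 1
    return correct / total if total else 0.0


stream = [(None, "a"), (None, "a"), (None, "b"), (None, "a")]
accuracy = prequential_accuracy(MajorityClassLearner(), stream)
```

On this four-item stream the learner is right on the second and fourth examples only, so the prequential accuracy is 0.5.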


25

Milne, Linda, Computer Science & Engineering, Faculty of Engineering, UNSW. "Machine learning for automatic classification of remotely sensed data." Publisher: University of New South Wales. Computer Science & Engineering, 2008. http://handle.unsw.edu.au/1959.4/41322.

Abstract:

As more and more remotely sensed data becomes available, it is becoming increasingly hard to analyse it with the more traditional labour-intensive, manual methods. The commonly used techniques, which involve expert evaluation, are widely acknowledged as providing inconsistent results, at best. We need more general techniques that can adapt to a given situation and that incorporate the strengths of the traditional methods, human operators and new technologies. The difficulty in interpreting remotely sensed data is that often only a small amount of data is available for classification. It can be noisy, incomplete or contain irrelevant information. Given that the training data may be limited, we demonstrate a variety of techniques for highlighting information in the available data and show how to select the most relevant information for a given classification task. We show that more consistent results between the training data and an entire image can be obtained, and how misclassification errors can be reduced. Specifically, a new technique for attribute selection in neural networks is demonstrated. Machine learning techniques, in particular, provide us with a means of automating classification using training data from a variety of data sources, including remotely sensed data and expert knowledge. A classification framework is presented in this thesis that can be used with any classifier and any available data. While this was developed in the context of vegetation mapping from remotely sensed data using machine learning classifiers, it is a general technique that can be applied to any domain. The framework is aimed particularly at domains in which the available training data is inadequate.


26

Tian, Ke. "Learning-based Cyber Security Analysis and Binary Customization for Security." Diss., Virginia Tech, 2018. http://hdl.handle.net/10919/85013.

Abstract:

This thesis presents machine-learning-based malware detection and post-detection rewriting techniques for mobile and web security problems. In mobile malware detection, we focus on detecting repackaged mobile malware. We design and demonstrate an Android repackaged malware detection technique based on code heterogeneity analysis. In post-detection rewriting, we aim at enhancing app security with bytecode rewriting. We describe how flow- and sink-based risk prioritization improves the rewriting scalability. We build an interface prototype with natural language processing, in order to customize apps according to natural language inputs. In web malware detection for Iframe injection, we present a tag-level detection system that aims to detect the injection of malicious Iframes for both online and offline cases. Our system detects malicious Iframes by combining selective multi-execution and machine learning algorithms. We design multiple contextual features, considering Iframe style, destination and context properties.
Ph. D.


27

Gilmore, Eugene M. "Learning Interpretable Decision Tree Classifiers with Human in the Loop Learning and Parallel Coordinates." Thesis, Griffith University, 2022. http://hdl.handle.net/10072/418633.

Abstract:

The Machine Learning (ML) community has recently started to recognise the importance of model interpretability when using ML techniques. In this work, I review the literature on Explainable Artificial Intelligence (XAI) and interpretability in ML and discuss several reasons why interpretability is critical for many ML applications. Although there is now increased interest in XAI, there are significant issues with the approaches taken in a large portion of the research in XAI. In particular, the popularity of techniques that try to explain black-box models often leads to misleading explanations that are not faithful to the model being explained. The popularity of black-box models is, in large part, due to the immense size and complexity of many datasets available today. The high dimensionality of many datasets has encouraged research in ML and particular techniques such as Artificial Neural Networks (ANNs). However, I argue in this work that the high dimensionality of a dataset should not, in itself, be a reason to settle for black-box models that humans cannot understand. Instead, I argue for the need to learn inherently interpretable models, rather than black-box models with post-hoc explanations of their results. One of the most well-known ML models for supervised learning tasks that remains interpretable to humans is the Decision Tree Classifier (DTC). The DTC's interpretability is due to its simple tree structure, where a human can individually inspect the splits at each node in the tree. Although a DTC's fundamental structure is interpretable to humans, even a DTC can effectively become a black-box model. This may be due to the size of a DTC being too large for a human to comprehend. Alternatively, a DTC may use uninterpretable oblique splits at each node. These oblique splits most often use a hyperplane through the entire attribute space of a dataset to construct a split, which is impossible for a human to interpret past three dimensions. 
In this work, I propose techniques for learning and visualising DTCs and datasets to produce interpretable classifiers that do not sacrifice predictive power. Moreover, I combine such visualisation with an interactive DTC building strategy and enable productive and effective Human-In-the-Loop-Learning (HILL). Not only do classifiers learnt with human involvement have the natural requirement of being humanly interpretable, but there are also several additional advantages to be gained by involving human expertise. These advantages include the ability for a domain expert to contribute their domain knowledge to a model. We can also exploit the highly sophisticated visual pattern recognition capabilities of the human to learn models that more effectively generalise to unseen data. Despite limitations of current HILL systems, a user study conducted as part of this work provides promising results for involving the human in the construction of DTCs. However, to effectively employ this learning style, we need powerful visualisation techniques for both high-dimensional datasets and DTCs. Remarkably, despite being ideally suited for high-dimensional datasets, the use of Parallel Coordinates (||-coords) by the ML community is minimal. First proposed by Alfred Inselberg, ||-coords is a revolutionary visualisation technique that uses parallel axes to display a dataset of an arbitrary number of dimensions. Using ||-coords, I propose a HILL system for the construction of DTCs. This work also exploits the ||-coords visualisation system to facilitate human input to the splits of internal nodes in a DTC. In addition, I propose a new form of oblique split for DTCs that uses the properties of the ||-coords plane. Unlike other oblique rules, this oblique rule can be easily visualised using ||-coords. While there has recently been renewed interest in XAI and HILL, the research that evaluates systems that facilitate XAI and HILL is limited. 
I report on an online survey that gathers data from 104 participants. This survey examines participants' use of visualisation systems which I argue are ideally suited for HILL and XAI. The results support my hypothesis and the proposals for HILL. I further argue that for a HILL system to succeed, comprehensive algorithm support is critical. As such, I propose two new DTC induction algorithms. These algorithms are designed to be used in conjunction with the HILL system developed in this work to provide algorithmic assistance in the form of suggestions of splits for a DTC node. The first proposed induction algorithm uses the newly proposed form of oblique split with ||-coords to learn interpretable splits that can capture correlations between attributes. The second induction algorithm advances the nested cavities algorithm originally proposed by Inselberg for classification tasks using ||-coords. Using these induction algorithms enables learning of DTCs with oblique splits that remain interpretable to a human without sacrificing predictive performance.
Thesis (PhD Doctorate)
Doctor of Philosophy (PhD)
School of Info & Comm Tech
Science, Environment, Engineering and Technology


28

Lounici, Sofiane. "Watermarking machine learning models." Electronic Thesis or Diss., Sorbonne université, 2022. https://accesdistant.sorbonne-universite.fr/login?url=https://theses-intra.sorbonne-universite.fr/2022SORUS282.pdf.

Abstract:

The protection of the intellectual property of machine learning models appears to be increasingly necessary, given the investments and their impact on society. In this thesis, we propose to study the watermarking of machine learning models. We provide a state of the art on current watermarking techniques, and then complement it by considering watermarking beyond image classification tasks. We then define forging attacks against watermarking for model hosting platforms and present a new fairness-based watermarking technique. In addition, we propose an implementation of the presented techniques.


29

Pozdniakov, K. "A machine learning approach for smart computer security audit." Thesis, City, University of London, 2017. http://openaccess.city.ac.uk/19971/.

Abstract:

This thesis presents a novel application of machine learning technology to automate network security audit and penetration testing processes in particular. A model-free reinforcement learning approach is presented. It is characterized by the absence of an environmental model. The model is derived autonomously by the audit system while acting in the tested computer network. The penetration testing process is specified as a Markov decision process (MDP) without definition of reward and transition functions for every state/action pair. The presented approach includes application of traditional and modified Q-learning algorithms. A traditional Q-learning algorithm learns the action-value function stored in a table, which gives the expected utility of executing a particular action in a particular state of the penetration testing process. The modified Q-learning algorithm differs by incorporating a state space approximator and representing the action-value function as a linear combination of features. Two deep architectures of the approximator are presented: an autoencoder combined with an artificial neural network (ANN) and an autoencoder combined with a recurrent neural network (RNN). The autoencoder is used to derive the feature set defining audited hosts. The ANN is intended to approximate the state space of the audit process based on derived features. The RNN is a more advanced version of the approximator and differs by the existence of additional loop connections from the hidden to the input layers of the neural network. Such an architecture incorporates previously executed actions into new inputs. It allows the audit system to learn sequences of actions leading to the goal of the audit, which is defined as receiving administrator rights on the host. The model-free reinforcement learning approach based on traditional Q-learning algorithms was also applied to reveal new vulnerabilities, buffer overflow in particular. 
The penetration testing system showed the ability to discover a string exploiting a potential vulnerability by learning its formation process on the go. In order to prove the concept and to test the efficiency of the approach, an audit tool was developed. Presented results are intended to demonstrate the adaptivity of the approach and the performance of the algorithms and deep machine learning architectures. Different sets of hyperparameters are compared graphically to test the ability to converge to the optimal action policy. An action policy is a sequence of actions leading to the audit goal (getting admin rights on the remote host). The testing environment is also presented. It consists of 80+ virtual machines based on a vSphere virtualization platform. This combination of hosts represents a typical corporate network with a Users segment, a Demilitarized zone (DMZ) and an external segment (Internet). The network has typical corporate services available: web server, mail server, file server, SSH, SQL server. During the testing process, the audit system acts as an attacker from the Internet.
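The tabular Q-learning update that the abstract refers to can be sketched on an abstract toy problem. Everything concrete here is an assumption for illustration: the four-state chain, the reward of 1 at the goal, the purely random exploration and the hyperparameters are not taken from the thesis, and the environment has nothing to do with a real network.

```python
import random

N_STATES = 4      # states 0..3; state 3 is the goal
ACTIONS = (0, 1)  # 0 = step back, 1 = step forward


def step(state, action):
    """Deterministic toy transition; reward 1.0 only on reaching the goal."""
    if action == 1:
        next_state = min(state + 1, N_STATES - 1)
    else:
        next_state = max(state - 1, 0)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward


def q_learning(episodes=500, alpha=0.5, gamma=0.9, max_steps=100, seed=0):
    """Learn a state-by-action table of expected utilities from experience."""
    rng = random.Random(seed)
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            action = rng.choice(ACTIONS)  # purely random exploration
            next_state, reward = step(state, action)
            done = next_state == N_STATES - 1
            # Q-learning target: immediate reward plus discounted best next value.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


q_table = q_learning()
greedy_policy = [
    max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)
]
```

No transition or reward model is given to the learner; the table is filled in only from observed `(state, action, reward, next_state)` tuples, which is what "model-free" means in the abstract. On this chain the greedy policy read off the table is to step forward in every non-goal state.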


30

Grosse, Kathrin. "Why is Machine Learning Security so hard?" Saarbrücken : Saarländische Universitäts- und Landesbibliothek, 2020. http://d-nb.info/1237268818/34.


31

Davis, Jonathan J. "Machine learning and feature engineering for computer network security." Thesis, Queensland University of Technology, 2017. https://eprints.qut.edu.au/106914/1/Jonathan_Davis_Thesis.pdf.

Abstract:

This thesis studies the application of machine learning to the field of Cyber security. Machine learning algorithms promise to enhance Cyber security by identifying malicious activity based only on provided examples. However, a major difficulty is the unsuitability of raw Cyber security data as input. In an attempt to address this problem, this thesis presents a framework for automatically constructing relevant features suitable for machine learning directly from network traffic. We then test the effectiveness of the framework by applying it to three Cyber security problems: HTTP tunnel detection, DNS tunnel detection, and traffic classification.


32

Sorio, Enrico. "Machine Learning Techniques for Document Processing and Web Security." Doctoral thesis, Università degli studi di Trieste, 2013. http://hdl.handle.net/10077/8533.

Abstract:

2011/2012
The task of extracting structured information from documents that are unstructured or whose structure is unknown is of utmost importance in many application domains, e.g., office automation, knowledge management, machine-to-machine interactions. In practice, this information extraction task can be automated only to a very limited extent or subject to strong assumptions and constraints on the execution environment. In this thesis work I will present several novel applications of machine learning techniques aimed at extending the scope and opportunities for automation of information extraction from documents of different types, ranging from printed invoices to structured XML documents, to potentially malicious documents exposed on the web. The main results of this thesis consist in the design, development and experimental evaluation of a system for information extraction from printed documents. My approach is designed for scenarios in which the set of possible document layouts is unknown and may evolve over time. The system uses the layout information to define layout-specific extraction rules that can be used to extract information from a document. As far as I know, this is the first information extraction system that is able to detect if the document under analysis has an unseen layout and hence needs new extraction rules. In such a case, it uses a probability-based machine learning algorithm in order to build those extraction rules using just the document under analysis. Another novel contribution of our system is that it continuously exploits the feedback from human operators in order to improve its extraction ability. I investigate a method for the automatic detection and correction of OCR errors. The algorithm uses domain knowledge about possible misrecognition of characters and about the type of the extracted information to propose and validate corrections. I propose a system for the automatic generation of regular expressions for text-extraction tasks. 
The system is based on genetic programming and uses a set of user-provided labelled examples to drive the evolutionary search for a regular expression suitable for the specified task. As regards information extraction from structured documents, I present an approach, based on genetic programming, for schema synthesis starting from a set of XML sample documents. The tool takes as input one or more XML documents and automatically produces a schema, in DTD language, which describes the structure of the input documents. Finally, I turn to web security. I attempt to assess the ability of Italian public administrations to be in full control of their respective web sites. Moreover, I developed a technique for the detection of certain types of fraudulent intrusions that are becoming of practical interest on a large scale.
XXV Ciclo
1985

33

Schwalbe, Lehtihet Oliver, and Viktor Åryd. "A Comparison of Performance and Noise Resistance of Different Machine Learning Classifiers on Gaussian Clusters." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-301781.

Abstract:

Most real-world data contains some amount of noise, i.e. unwanted factors obscuring the underlying signal, making it harder to detect or categorize. This is a problem in machine learning classification. Multiple studies have shown the impact noise can have on the difficulty of different classification problems. Both attribute and class noise can have a big impact on classification accuracy, especially as the levels of noise increase. In this study we analyse how severely a number of different classification algorithms are affected by both attribute and class noise, and how the number of classes and parameters further affect their resistance to noise. Similar studies have been done before, but we aim to supplement this research by using classifiers not as broadly studied, with self-generated data. This aims to increase our control of experiment parameters to give us more easily interpretable results. Among the classification algorithms used (Support Vector Machine, Random Forest, K-Nearest Neighbours and K-Means Clustering), the Random Forest algorithm outperforms the other classifiers in most of the tests performed. However, both initial performance with noise-free data and the resistance to noise seem to be highly dependent on the nature of the data itself, and also on the type of noise introduced. Ultimately, more research is needed, especially concerning how different data distributions and classifier parameters impact noise resistance.
When collecting data, one almost always expects a certain amount of noise. Noise can degrade the quality of the data and make it harder to interpret. This is a problem in machine learning classification. Several studies have shown the difficulties that arise in this area when the data used is noisy. Both attribute noise and class noise have been shown to harm classification accuracy, especially at higher noise levels. In this study we analyse how harmful attribute and class noise are for a number of classification algorithms, and how the number of classes and parameters affects these results. Similar studies have been done before, but we aim to build on those results through our choice of classifiers and through self-generated data. Self-generated data allows controlled experiments with more easily interpretable results. Among the classification algorithms used (Support Vector Machine, Random Forest, K-Nearest Neighbours and K-Means Clustering), Random Forest is generally the best. However, both noise resistance and classification accuracy seem to depend heavily on the nature of the data and on the type of noise introduced. Ultimately, more research is needed in this area, especially on how different data distributions and classifier parameters affect the results.
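
The two kinds of noise named in this abstract can be illustrated with a small sketch. It is not the authors' experimental code: the function names, the two-dimensional clusters, the noise levels and the uniform label-flipping rule are all assumptions chosen for illustration.

```python
import random

def make_clusters(n_per_class, centers, sigma, rng):
    """Draw n_per_class 2-D points around each centre; the label is the centre index."""
    points, labels = [], []
    for label, (cx, cy) in enumerate(centers):
        for _ in range(n_per_class):
            points.append((rng.gauss(cx, sigma), rng.gauss(cy, sigma)))
            labels.append(label)
    return points, labels

def add_attribute_noise(points, noise_sigma, rng):
    """Attribute noise (assumed form): add zero-mean Gaussian noise to every feature."""
    return [(x + rng.gauss(0.0, noise_sigma), y + rng.gauss(0.0, noise_sigma))
            for x, y in points]

def add_class_noise(labels, flip_prob, n_classes, rng):
    """Class noise (assumed form): with probability flip_prob, swap in a different label."""
    noisy = []
    for label in labels:
        if rng.random() < flip_prob:
            label = rng.choice([c for c in range(n_classes) if c != label])
        noisy.append(label)
    return noisy

rng = random.Random(0)  # fixed seed so the sketch is repeatable
centers = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]  # hypothetical cluster centres
points, labels = make_clusters(50, centers, 1.0, rng)
noisy_points = add_attribute_noise(points, 0.5, rng)
noisy_labels = add_class_noise(labels, 0.2, len(centers), rng)
flipped = sum(a != b for a, b in zip(labels, noisy_labels))
print(f"{flipped} of {len(labels)} labels flipped")
```

Generating the data this way keeps the noise level an explicit parameter, which is the kind of experimental control the abstract describes.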

34

Hu, Ji. "A virtual machine architecture for IT-security laboratories." Phd thesis, [S.l.] : [s.n.], 2006. http://deposit.ddb.de/cgi-bin/dokserv?idn=980935652.

35

Ramakrishnan, Shubha. "A system design approach to neuromorphic classifiers." Diss., Georgia Institute of Technology, 2013. http://hdl.handle.net/1853/51718.

Abstract:

This work considers alternative strategies to mainstream digital approaches to signal processing - namely analog and neuromorphic solutions, for increased computing efficiency. In the context of a speech recognizer application, we use low-power analog approaches for the signal conditioning and basic auditory feature extraction, while using a neuromorphic IC for building a dendritic classifier that can be used as a low-power word spotter. In doing so, this work also aspires to posit the significance of dendrites in neural computation.

36

Löfström, Tuwe. "On Effectively Creating Ensembles of Classifiers : Studies on Creation Strategies, Diversity and Predicting with Confidence." Doctoral thesis, Stockholms universitet, Institutionen för data- och systemvetenskap, 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:su:diva-116683.

Abstract:

An ensemble is a composite model, combining the predictions from several other models. Ensembles are known to be more accurate than single models. Diversity has been identified as an important factor in explaining the success of ensembles. In the context of classification, diversity has not been well defined, and several heuristic diversity measures have been proposed. The focus of this thesis is on how to create effective ensembles in the context of classification. Even though several effective ensemble algorithms have been proposed, there are still several open questions regarding the role diversity plays when creating an effective ensemble. Open questions relating to creating effective ensembles that are addressed include: what to optimize when trying to find an ensemble using a subset of models used by the original ensemble that is more effective than the original ensemble; how effective is it to search for such a sub-ensemble; how should the neural networks used in an ensemble be trained for the ensemble to be effective? The contributions of the thesis include several studies evaluating different ways to optimize which sub-ensemble would be most effective, including a novel approach using combinations of performance and diversity measures. The contributions of the initial studies presented in the thesis eventually resulted in an investigation of the underlying assumption motivating the search for more effective sub-ensembles. The evaluation concluded that even if several more effective sub-ensembles exist, it may not be possible to identify which sub-ensembles would be the most effective using any of the evaluated optimization measures. An investigation of the most effective ways to train neural networks to be used in ensembles was also performed. 
The conclusions are that effective ensembles can be obtained by training neural networks in a number of different ways, and that either high average individual accuracy or high diversity would generate effective ensembles. Several findings regarding diversity and effective ensembles presented in the literature in recent years are also discussed and related to the results of the included studies. When creating confidence-based predictors using conformal prediction, there are several open questions regarding how data should be utilized effectively when using ensembles. Open questions related to predicting with confidence that are addressed include: how can data be utilized effectively to achieve more efficient confidence-based predictions using ensembles; how do problems with class imbalance affect the confidence-based predictions when using conformal prediction? Contributions include two studies: the first shows that the use of out-of-bag estimates with bagging ensembles results in more effective conformal predictors, and the second shows that a conformal predictor conditioned on the class labels, which avoids a strong bias towards the majority class, is more effective on problems with class imbalance. The research method used is mainly inspired by the design science paradigm, which is manifested by the development and evaluation of artifacts.
An ensemble is a composite model that combines the predictions from several different models. It is well known that ensembles are more accurate than single models. Diversity has been identified as an important factor in explaining why ensembles are so successful. Until recently, diversity had not been unambiguously defined for classification, which resulted in many heuristic diversity measures being proposed. This thesis focuses on how classification ensembles can be created effectively. The research method is mainly inspired by the design science paradigm, which is well suited to the development and evaluation of IT artifacts. Many successful ensemble algorithms already exist, but there are still open questions about the role diversity plays when creating effective ensemble models. The diversity-related questions addressed in the thesis include: what should be optimized when searching for a subset of the available models in order to create an ensemble that is better than the ensemble consisting of all models; how well does the strategy of searching for such sub-ensembles work; how should neural networks be trained to work as well as possible in an ensemble? The contributions of the thesis include several studies evaluating different ways of finding sub-ensembles that are better than using the whole ensemble, including a new approach that uses a combination of diversity and performance measures. The results of the first studies led to an examination of the underlying assumption that motivates searching for sub-ensembles. The conclusion was that, although several sub-ensembles were better than the whole ensemble, there was no way to identify from the available data which sub-ensembles were the better ones.
Furthermore, it was investigated how neural networks should be trained to cooperate as well as possible when used in an ensemble. The conclusions from that investigation are that effective ensembles can be created by having many models that are either good on average or different from one another (i.e. diverse). Insights presented in the literature in recent years are discussed and related to the results of the included studies. When creating confidence-based models using a framework called conformal prediction, there are several questions that need to be clarified about how data should best be used with ensembles. The questions relating to confidence-based prediction include: how can data best be used to achieve more efficient confidence-based predictions with ensembles; how does imbalanced data affect the confidence-based predictions when using conformal prediction? The contributions include two studies. The results of the first show that the most effective way to use data with a bagging ensemble is to use out-of-bag estimates. The results of the second show that imbalanced data needs to be handled with a class-conditional confidence-based model to avoid a strong tendency to favour the majority class.
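
The idea of conditioning a conformal predictor on the class labels can be sketched as follows. This is a generic illustration of class-conditional p-values, not the thesis's implementation: the function names, the calibration data and the nonconformity scores are hypothetical, and the scores are assumed to come from some underlying model (for example an ensemble).

```python
def class_conditional_p_values(calibration, test_scores):
    """
    calibration: list of (true_label, nonconformity_score) pairs.
    test_scores: dict mapping each candidate label to the test object's
                 nonconformity score under that label.
    Returns a dict label -> p-value, computed only against calibration
    examples that carry the same label.
    """
    p_values = {}
    for label, score in test_scores.items():
        same_class = [s for y, s in calibration if y == label]
        at_least_as_strange = sum(1 for s in same_class if s >= score)
        p_values[label] = (at_least_as_strange + 1) / (len(same_class) + 1)
    return p_values

def prediction_set(p_values, significance):
    """Keep every label whose p-value exceeds the significance level."""
    return {label for label, p in p_values.items() if p > significance}

# Hypothetical, imbalanced calibration set: four majority and two minority examples.
calibration = [("majority", 0.1), ("majority", 0.2), ("majority", 0.3),
               ("majority", 0.4), ("minority", 0.6), ("minority", 0.8)]
p = class_conditional_p_values(calibration, {"majority": 0.35, "minority": 0.7})
print(p, prediction_set(p, 0.3))
```

Because each label is compared only with calibration examples of its own class, a large majority class cannot dominate the p-value of a minority label.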

At the time of the doctoral defense, the following paper was unpublished and had a status as follows: Paper 8: In press.

Dataanalys för detektion av läkemedelseffekter (DADEL)

37

Liu, Ruidong. "Power system stability scanning and security assessment using machine learning." Thesis, The University of Sydney, 2018. http://hdl.handle.net/2123/19584.

Abstract:

Future grid planning requires a major departure from conventional power system planning, where only a handful of the most critical scenarios are analyzed. To account for a wide range of possible future evolutions, scenario analysis has been proposed in many industries. As opposed to conventional power system planning, where the aim is to find an optimal transmission and/or generation expansion plan for an existing grid, the aim in future grid scenario analysis is to analyze possible evolution pathways to inform power system planning and policy making. Therefore, future grid planning may involve a large number of scenarios, and the existing planning tools may no longer be suitable. Beyond these planning issues, operation of future grids using conventional tools is also challenged by new features of future grids such as intermittent generation, demand response and fast-responding power electronic plants, which lead to much more diverse operating conditions than in existing networks. Among all operation issues, monitoring the stability and security of a power system and acting with deliberate preventive or remedial adjustments is of vital importance. On-line Dynamic Security Assessment (DSA) can evaluate the security of a power system almost instantly when current or imminent operating conditions are supplied. The aims of this dissertation are, for future grid planning, to develop a framework using Machine Learning (ML) to effectively assess the security of future grids by analyzing a large number of scenarios and, for future grid operation, to propose approaches that use ML techniques to address the technical issues brought by the diverse operating conditions of future grids. Unsupervised learning, supervised learning and semi-supervised learning techniques are utilized in a set of proposed planning and operation security assessment tools.

38

Šrndić, Nedim [Verfasser]. "Machine Learning and Security of Non-Executable Files / Nedim Šrndić." München : Verlag Dr. Hut, 2017. http://d-nb.info/1149580364/34.

39

MARCELLI, ANDREA. "Machine Learning and other Computational-Intelligence Techniques for Security Applications." Doctoral thesis, Politecnico di Torino, 2019. http://hdl.handle.net/11583/2751497.

40

Caley, Jeffrey Allan. "A Survey of Systems for Predicting Stock Market Movements, Combining Market Indicators and Machine Learning Classifiers." PDXScholar, 2013. https://pdxscholar.library.pdx.edu/open_access_etds/2001.

Abstract:

In this work, we propose and investigate a series of methods to predict stock market movements. These methods use stock market technical and macroeconomic indicators as inputs into different machine learning classifiers. The objective is to survey existing domain knowledge, and combine multiple techniques into one method to predict daily market movements for stocks. Approaches using nearest neighbor classification, support vector machine classification, K-means classification, principal component analysis and genetic algorithms for feature reduction and redefining the classification rule were explored. Ten stocks, 9 companies and 1 index, were used to evaluate each iteration of the trading method. The classification rate, modified Sharpe ratio and profit gained over the test period are used to evaluate each strategy. The findings showed that nearest neighbor classification using genetic algorithm input feature reduction produced the best results, achieving higher profits than buy-and-hold for a majority of the companies.

41

Gapper, Justin J. "Bias Reduction in Machine Learning Classifiers for Spatiotemporal Analysis of Coral Reefs using Remote Sensing Images." Chapman University Digital Commons, 2019. https://digitalcommons.chapman.edu/cads_dissertations/2.

Abstract:

This dissertation is an evaluation of the generalization characteristics of machine learning classifiers as applied to the detection of coral reefs using remote sensing images. Three scientific studies have been conducted as part of this research: 1) Evaluation of Spatial Generalization Characteristics of a Robust Classifier as Applied to Coral Reef Habitats in Remote Islands of the Pacific Ocean 2) Coral Reef Change Detection in Remote Pacific Islands using Support Vector Machine Classifiers 3) A Generalized Machine Learning Classifier for Spatiotemporal Analysis of Coral Reefs in the Red Sea. The aim of this dissertation is to propose and evaluate a methodology for developing a robust machine learning classifier that can effectively be deployed to accurately detect coral reefs at scale. The hypothesis is that Landsat data can be used to train a classifier to detect coral reefs in remote sensing imagery and that this classifier can be trained to generalize across multiple sites. Another objective is to identify how well different classifiers perform under the generalized conditions and how unique the spectral signature of coral is as environmental conditions vary across observation sites. A methodology for validating the generalization performance of a classifier to unseen locations is proposed and implemented (Controlled Parameter Cross-Validation). Analysis is performed using satellite imagery from nine different locations with known coral reefs (six Pacific Ocean sites and three Red Sea sites). Ground truth observations for four of the Pacific Ocean sites and two of the Red Sea sites were used to validate the proposed methodology. Within the Pacific Ocean sites, the consolidated classifier (trained on data from all sites) yielded an accuracy of 75.5% (0.778 AUC). Within the Red Sea sites, the consolidated classifier yielded an accuracy of 71.0% (0.7754 AUC). Finally, long-term change detection analysis is conducted for each of the sites evaluated.
In total, over 16,700 km² was analyzed for benthic cover type and cover change detection analysis. Within the Pacific Ocean sites, decreases in coral cover ranged from a 25.3% reduction (Kingman Reef) to a 42.7% reduction (Kiritimati Island). Within the Red Sea sites, decreases in coral cover ranged from 3.4% (Umluj) to 13.6% (Al Wajh).

42

Chavez, Wesley. "An Exploration of Linear Classifiers for Unsupervised Spiking Neural Networks with Event-Driven Data." PDXScholar, 2018. https://pdxscholar.library.pdx.edu/open_access_etds/4439.

Abstract:

Object recognition in video has seen giant strides in accuracy improvements in the last few years, a testament to the computational capacity of deep convolutional neural networks. However, this computational capacity of software-based neural networks coincides with high power consumption compared to that of some spiking neural networks (SNNs), up to 300,000 times more energy per synaptic event in IBM's TrueNorth chip, for example. SNNs are also well-suited to exploit the precise timing of event-driven image sensors, which transmit asynchronous "events" only when the luminance of a pixel changes above or below a threshold value. The combination of event-based imagers and SNNs becomes a straightforward way to achieve low power consumption in object recognition tasks. This thesis compares different linear classifiers for two low-power, hardware-friendly, spiking, unsupervised neural network architectures, SSLCA and HFirst, in response to asynchronous event-based data, and explores their ability to learn and recognize patterns from two event-based image datasets, N-MNIST and CIFAR10-DVS. By performing a grid search of important SNN and classifier hyperparameters, we also explore how to improve classification performance of these architectures. Results show that a softmax regression classifier exhibits modest accuracy gains (0.73%) over the next-best performing linear support vector machine (SVM), and considerably outperforms a single layer perceptron (by 5.28%) when classification performance is averaged over all datasets and spiking neural network architectures with varied hyperparameters. Min-max normalization of the inputs to the linear classifiers aids classification accuracy, except in the case of the single layer perceptron classifier. We also see the highest reported classification accuracy for spiking convolutional networks on N-MNIST and CIFAR10-DVS, increasing this accuracy from 97.77% to 97.82%, and 29.67% to 31.76%, respectively.
These findings are relevant for any system employing unsupervised SNNs to extract redundant features from event-driven data for recognition.
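
Min-max normalization followed by a softmax regression forward pass, the two steps this abstract reports as working best together, can be sketched as below. This is an illustrative sketch only: the function names are hypothetical, per-feature scaling to [0, 1] is an assumption about how the normalization was applied, and training of the weights is omitted.

```python
import math

def min_max_normalize(rows):
    """Rescale each feature (column) to [0, 1]; constant columns map to 0."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    spans = [max(c) - min(c) for c in columns]
    return [[(v - lo) / span if span else 0.0
             for v, lo, span in zip(row, lows, spans)]
            for row in rows]

def softmax(scores):
    """Turn class scores into probabilities; subtracting the max avoids overflow."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict(weights, biases, features):
    """Softmax regression forward pass: one weight vector and one bias per class."""
    scores = [sum(w * x for w, x in zip(w_row, features)) + b
              for w_row, b in zip(weights, biases)]
    return softmax(scores)

# Hypothetical feature rows, e.g. spike counts from two SNN output units.
features = min_max_normalize([[0, 10], [5, 20], [10, 30]])
print(features)
print(predict([[1.0, -1.0], [-1.0, 1.0]], [0.0, 0.0], features[1]))
```

Scaling the inputs to a common range keeps any single feature from dominating the linear scores that are fed to the softmax.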

43

DING, ZEJIN. "Diversified Ensemble Classifiers for Highly Imbalanced Data Learning and their Application in Bioinformatics." Digital Archive @ GSU, 2011. http://digitalarchive.gsu.edu/cs_diss/60.

Abstract:

In this dissertation, the problem of learning from highly imbalanced data is studied. Imbalanced data learning is both important and challenging in many real applications. Dealing with a minority class normally needs new concepts, observations and solutions in order to fully understand the underlying complicated models. We try to systematically review and solve this special learning task in this dissertation. We propose a new ensemble learning framework, Diversified Ensemble Classifiers for Imbalanced Data Learning (DECIDL), based on the advantages of existing ensemble imbalanced learning strategies. Our framework combines three learning techniques: a) ensemble learning, b) artificial example generation, and c) diversity construction by reverse data re-labeling. As a meta-learner, DECIDL utilizes general supervised learning algorithms as base learners to build an ensemble committee. We create a standard benchmark data pool, which contains 30 highly skewed sets with diverse characteristics from different domains, in order to facilitate future research on imbalanced data learning. We use this benchmark pool to evaluate and compare our DECIDL framework with several ensemble learning methods, namely under-bagging, over-bagging, SMOTE-bagging, and AdaBoost. Extensive experiments suggest that our DECIDL framework is comparable with other methods. The data sets, experiments and results provide a valuable knowledge base for future research on imbalanced learning. We develop a simple but effective artificial example generation method for data balancing. Two new methods, DBEG-ensemble and DECIDL-DBEG, are then designed to improve the power of imbalanced learning. Experiments show that these two methods are comparable to the state-of-the-art methods, e.g., GSVM-RU and SMOTE-bagging. Furthermore, we investigate learning on imbalanced data from a new angle: active learning.
By combining active learning with the DECIDL framework, we show that the newly designed Active-DECIDL method is very effective for imbalanced learning, suggesting the DECIDL framework is very robust and flexible. Lastly, we apply the proposed learning methods to a real-world bioinformatics problem: protein methylation prediction. Extensive computational results show that the DECIDL method does perform very well for the imbalanced data mining task. Importantly, the experimental results have confirmed our new contributions on this particular data learning problem.

44

Mayo, Quentin R. "Detection of Generalizable Clone Security Coding Bugs Using Graphs and Learning Algorithms." Thesis, University of North Texas, 2018. https://digital.library.unt.edu/ark:/67531/metadc1404548/.

Abstract:

This research methodology isolates coding properties and identifies the probability of security vulnerabilities using machine learning and historical data. Several approaches characterize the effectiveness of detecting security-related bugs that manifest as vulnerabilities, but none utilize vulnerability patch information. The main contribution of this research is a framework to analyze LLVM Intermediate Representation code and merge core source code representations using source code properties. This research is beneficial because it allows source programs to be transformed into a graphical form, from which users can extract specific code properties related to vulnerable functions. The result is an improved approach to detect, identify, and track software system vulnerabilities based on a performance evaluation. The methodology uses historical function-level vulnerability information, unique feature extraction techniques, a novel code property graph, and learning algorithms to minimize the amount of end-user domain knowledge necessary to detect vulnerabilities in applications. The analysis shows approximately 99% precision and recall in detecting known vulnerabilities in the National Institute of Standards and Technology (NIST) Software Assurance Metrics and Tool Evaluation (SAMATE) project. Furthermore, 72% of the historical vulnerabilities in the OpenSSL testing environment were detected using a linear support vector classifier (SVC) model.

45

Abo, Al Ahad George, and Abbas Salami. "Machine Learning for Market Prediction : Soft Margin Classifiers for Predicting the Sign of Return on Financial Assets." Thesis, Linköpings universitet, Produktionsekonomi, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-151459.

Abstract:

Forecasting procedures have found applications in a wide variety of areas within finance and have further been shown to be one of the most challenging areas of finance. Having an immense variety of economic data, stakeholders aim to understand the current and future state of the market. Since it is hard for a human to make sense of large amounts of data, different modeling techniques have been applied to extract useful information from financial databases, where machine learning techniques are among the most recent modeling techniques. Binary classifiers such as Support Vector Machines (SVMs) have to some extent been used for this purpose, and extensions of the algorithm have been developed with increased prediction performance as the main goal. The objective of this study has been to develop a process for improving the performance when predicting the sign of return of financial time series with soft margin classifiers. An analysis of the algorithms is presented in this study, followed by a description of the methodology that has been utilized. The developed process, containing some of the presented soft margin classifiers and other aspects of kernel methods such as Multiple Kernel Learning, has shown good results over the long term, in which the capability of capturing different market conditions has been shown to improve with the incorporation of different models and kernels, instead of only a single one. However, the results are mostly congruent with earlier studies in this field. Furthermore, two research questions have been answered, concerning the complexity of the kernel functions used by the SVM and the robustness of the process as a whole. Complexity refers to achieving more complex feature maps through combining kernels by either adding, multiplying or functionally transforming them.
It is not concluded that an increased complexity leads to a consistent improvement; however, the combined kernel function is superior during some of the periods of the time series used in this thesis for the individual models. The robustness has been investigated for different signal-to-noise ratios, where it has been observed that windows with previously poor performance are more exposed to noise impact.
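
Combining kernels by addition and multiplication, as this abstract describes, can be sketched in a few lines. The sketch relies on the standard result that sums and products of valid (positive semi-definite) kernels are again valid kernels; the specific base kernels, the `gamma` value and the function names are assumptions for illustration and are not taken from the thesis.

```python
import math

def linear_kernel(x, y):
    """Plain dot product."""
    return sum(a * b for a, b in zip(x, y))

def rbf_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel; gamma is an assumed bandwidth parameter."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def sum_kernel(k1, k2):
    """Kernel whose value is k1 + k2."""
    return lambda x, y: k1(x, y) + k2(x, y)

def product_kernel(k1, k2):
    """Kernel whose value is k1 * k2."""
    return lambda x, y: k1(x, y) * k2(x, y)

def gram_matrix(kernel, samples):
    """Pairwise kernel values, the matrix a kernel SVM solver would consume."""
    return [[kernel(a, b) for b in samples] for a in samples]

combined = sum_kernel(linear_kernel, rbf_kernel)
samples = [(0.0, 1.0), (2.0, 0.5), (1.0, 1.0)]  # hypothetical feature vectors
for row in gram_matrix(combined, samples):
    print([round(v, 3) for v in row])
```

A precomputed Gram matrix like this is one common way to hand a custom combined kernel to an SVM implementation.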

46

Pundir, Nitin K. "Design of a Hardware Security PUF Immune to Machine Learning Attacks." University of Toledo / OhioLINK, 2017. http://rave.ohiolink.edu/etdc/view?acc_num=toledo1513009797455883.

47

Mutai, K. (Kenneth). "Internet of Things security with machine learning techniques: a systematic literature review." Master's thesis, University of Oulu, 2019. http://jultika.oulu.fi/Record/nbnfioulu-201906212619.

Abstract:

Internet of Things (IoT) technologies are beneficial for both private users and businesses. The growth of the technology and its rapid introduction to target fast-growing markets face security challenges. Machine learning techniques have recently been used in research studies as a solution for securing IoT devices. These machine learning techniques have been implemented successfully in other fields. The objective of this thesis is to identify and analyze existing scientific literature published recently regarding the use of machine learning techniques in securing IoT devices. In this thesis, a systematic literature review was conducted to explore the previous research on the use of machine learning in IoT security. The review was conducted by following a procedure developed in the review protocol. The data for the study was collected from three databases, i.e., IEEE Xplore, Scopus and Web of Science. From a total of 855 identified papers, 20 relevant primary studies were selected to answer the research question. The study identified 7 machine learning techniques used in IoT security; additionally, several attack models were identified and classified into 5 categories. The results show that the use of machine learning techniques in IoT security is a promising solution to the challenges facing security. Supervised machine learning techniques have better performance in comparison to unsupervised and reinforcement learning. The findings also show that data types and the learning method affect the performance of machine learning techniques. Furthermore, the results show that the machine learning approach is mostly used in securing the network.

48

Duan, Ren. "Machine Learning in Defensive IT Security: Early Detection of Novel Threats." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-266122.

Abstract:

The rapid development of technology leads to a rise in cybercrime, hence cybersecurity is of unprecedented significance, especially for businesses. Defensive and forensic IT security is a rather niche field in IT security but it is surely going to grow. It focuses on preventing attacks by good design standards and the education of persons. The typical reaction time to a computer attack currently lies in the order of hours, because this field still relies on intensive manual work of skilled experts. In this thesis, we combined defensive IT security with the most flourishing field in the present time: Artificial Intelligence and Machine Learning. We investigate the possibility of using Machine Learning for filtering out the obvious normal data and focusing the attention of the experts onto important things where experience really matters. The nature of this problem is anomaly detection; therefore, we select and test several algorithms which perform well in detecting anomalies, including Term Frequency-Inverse Document Frequency, K-Means, K-Nearest Neighbours, Isolation Forest, and Autoencoders, and apply them on the Http (KDDCUP99) dataset and our own network connection dataset collected using Carbon Black Response. Carbon Black Response is an industry-leading incident response and threat hunting solution. The results show that Isolation Forest and K-Nearest Neighbours are the best traditional Machine Learning methods for the two datasets respectively; meanwhile, as a deep learning method, Autoencoders did quite well in differentiating normal and malicious events for both datasets.
The rapid and ever-accelerating development of technology has led to an increase in IT-related crime, in which companies and organizations are often hit with almost unpredictable consequences. Defensive IT security and forensics focus on detecting, stopping, and mitigating attacks through various techniques, education, and design. Although organizations today often spend large parts of their budget on defensive security, the time it takes to act on attacks and intrusions is still often measured in hours at the least, since the work involves large amounts of manual effort by experts in the field. Larger attacks can take weeks or months to investigate. In this work, defensive IT security is combined with some of the most talked-about areas of the present day: artificial intelligence and machine learning. We investigate the possibility of using these techniques to filter out the obviously normal data and focus on what is anomalous and essential, so that the field's experts can spend their time where it is really needed. The core of the problem lies in being able to detect anomalies. The work is therefore based on evaluating different anomaly detection algorithms to see how they perform against one another. We use techniques such as Term Frequency-Inverse Document Frequency, K-Means, K-Nearest Neighbours, Isolation Forest, and Autoencoders on two different datasets. The first dataset is based on HTTP traffic (KDDCUP99), while the second is built on data collected from real clients via a tool called Carbon Black Response, a leading tool for carrying out large-scale investigations and searching for attackers. The results of the work show that Isolation Forest and K-Nearest Neighbours perform best on the respective datasets, and also that Autoencoders, a deep learning method, perform well in identifying malicious activity for both datasets.

49

GUIDOTTI, DARIO. "Verification and Repair of Machine Learning Models." Doctoral thesis, Università degli studi di Genova, 2022. http://hdl.handle.net/11567/1082694.

Abstract:

In these last few years, machine learning (ML) has gained incredible traction in the Artificial Intelligence community, and ML models have found successful applications in many different domains across computer science. However, it is hard to provide any formal guarantee on the behavior of ML models, and therefore their reliability is still in doubt, especially concerning their deployment in safety and security-critical applications. Verification and repair emerged as promising solutions to address some of these problems. In this dissertation, we present our contributions to these two lines of research: in particular, we focus on verifying and repairing machine-learned controllers, leveraging learning techniques to enhance the verification and repair of neural networks, and developing novel tools and algorithms for verifying neural networks. Part of our research is made available in the library pyNeVer, which provides capabilities for training, verification, and management of neural networks.

50

Cheng, Aidan. "Using Machine Learning to Detect Malicious URLs." Scholarship @ Claremont, 2017. http://scholarship.claremont.edu/cmc_theses/1567.

Abstract:

There is a need for a better predictive model that reduces the number of malicious URLs being sent through emails. This system should learn from existing metadata about URLs. The ideal solution for this problem would be able to learn from its predictions. For example, if it predicts a URL to be malicious, and that URL is deemed safe by the sandboxing environment, the predictor should refine its model to account for this data. The problem, then, is to construct a model with these characteristics that can make these predictions for the vast number of URLs being processed. Given that the current system does not employ machine learning methods, we intend to investigate multiple such models and summarize which of those might be worth pursuing on a large scale.
