Journal articles on the topic 'Binary code learning'

Author: Grafiati

Published: 10 December 2022

Last updated: 28 January 2023

Consult the top 50 journal articles for your research on the topic 'Binary code learning.'

1

Liu, Mohan, Xiaoming Tang, and Hanming Fei. "Design of Malicious Code Detection System Based on Binary Code Slicing." 電腦學刊 (Journal of Computers) 33, no. 3 (June 2022): 225–38. http://dx.doi.org/10.53106/199115992022063303018.

Abstract:

Malicious code threatens the safety of computer systems. Researching malicious code design techniques and mastering code behavior patterns are the basic work of network security prevention. With the game of network offense and defense, malicious code shows the characteristics of invisibility, polymorphism, and multi-dismutation. How to correctly and effectively understand malicious code and extract the key malicious features is the main goal of malicious code detection technology. As an important method of program understanding, program slicing analyzes program code using the idea of "decomposition" and then extracts the code fragments that the analyst is interested in. In recent years, data mining and machine learning techniques have been applied to the field of malicious code detection. They have become a focus of research because data mining can dig out meaningful patterns from a large amount of existing code data, and machine learning helps to summarize the identification knowledge of known malicious code, so as to conduct similarity search and help find unknown malicious code. The machine learning heuristic malicious code detection method first needs to automatically or manually extract the structure, function, and behavior characteristics of the malicious code, so we can first slice the malicious code and then perform the detection. Through the improvement of the classic program slicing algorithm, this paper effectively improves the slicing problem between binary code processes. At the same time, it implements a malicious code detection system. The variable-length N-gram of the machine code byte sequence is used as the feature extraction method to further demonstrate the efficiency and accuracy of malicious code detection technology based on data mining and machine learning.
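The variable-length byte N-gram features mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `byte_ngrams`, the length range, and the sample byte string are all assumptions made for illustration, and the paper's slicing step is not shown.

```python
from collections import Counter


def byte_ngrams(data: bytes, n_min: int = 1, n_max: int = 3) -> Counter:
    """Count every byte n-gram with n_min <= n <= n_max (hypothetical helper)."""
    counts = Counter()
    for n in range(n_min, n_max + 1):
        # Slide a window of width n over the byte sequence.
        for i in range(len(data) - n + 1):
            counts[data[i:i + n]] += 1
    return counts


# Hypothetical machine-code bytes, used only to exercise the function.
sample = bytes.fromhex("5589e55589e5c3")
features = byte_ngrams(sample, n_min=2, n_max=3)
```

The resulting counts could serve as a sparse feature vector for a classifier; which lengths and which N-grams to keep is a design choice the abstract does not specify.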

2

Zhou, Xiang, Fumin Shen, Yang Yang, Guangwei Gao, and Yuan Wang. "Binary code learning via optimal class representations." Neurocomputing 208 (October 2016): 59–65. http://dx.doi.org/10.1016/j.neucom.2015.12.129.

3

Zhou, Lei, Xiao Bai, Xianglong Liu, Jun Zhou, and Edwin R. Hancock. "Learning binary code for fast nearest subspace search." Pattern Recognition 98 (February 2020): 107040. http://dx.doi.org/10.1016/j.patcog.2019.107040.

4

Li, Xiang, Yuanping Nie, Zhi Wang, Xiaohui Kuang, Kefan Qiu, Cheng Qian, and Gang Zhao. "BMOP: Bidirectional Universal Adversarial Learning for Binary OpCode Features." Wireless Communications and Mobile Computing 2020 (December 2, 2020): 1–11. http://dx.doi.org/10.1155/2020/8876632.

Abstract:

For malware detection, current state-of-the-art research concentrates on machine learning techniques. Binary n-gram OpCode features are commonly used for malicious code identification and classification with high accuracy. Binary OpCode modification is much more difficult than modification of image pixels. Traditional adversarial perturbation methods could not be applied on OpCode directly. In this paper, we propose a bidirectional universal adversarial learning method for effective binary OpCode perturbation from both benign and malicious perspectives. Benign features are those OpCodes that represent benign behaviours, while malicious features are OpCodes for malicious behaviours. From a large dataset of benign and malicious binary applications, we select the most significant benign and malicious OpCode features based on the feature SHAP value in the trained machine learning model. We implement an OpCode modification method that inserts benign OpCodes into executables as garbage codes without execution and modifies malicious OpCodes by equivalent replacement preserving execution semantics. The experimental results show that the benign and malicious OpCode perturbation (BMOP) method could bypass malicious code detection models based on the SVM, XGBoost, and DNN algorithms.

5

Jeong, Junho, Yangsun Lee, Uduakobong George Offong, and Yunsik Son. "A Type Information Reconstruction Scheme Based on Long Short-Term Memory for Weakness Analysis in Binary File." International Journal of Software Engineering and Knowledge Engineering 28, no. 09 (September 2018): 1267–86. http://dx.doi.org/10.1142/s0218194018400156.

Abstract:

Because of the increasing complexity of software development, the lack of management of legacy code, and the nature of embedded software, the use of third-party libraries that have no source code is increasing. Without the source code, it is difficult to analyze these libraries for vulnerabilities. Therefore, to analyze weaknesses inherent in binary code, various studies have been conducted to perform static analysis using intermediate code. The conversion from binary code to intermediate language differs depending on the execution environment. In this paper, we propose a deep learning-based analysis method to reconstruct missing data types during the compilation process from binary code to intermediate language, and propose a method to generate supervised learning data for deep learning.

6

Lo, James Ting-Ho, and Bryce Mackey-Williams Carey. "A Cortical Learning Machine for Learning Real-Valued and Ranked Data." International Journal of Clinical Medicine and Bioengineering 1, no. 1 (December 30, 2021): 12–24. http://dx.doi.org/10.35745/ijcmb2021v01.01.0003.

Abstract:

The cortical learning machine (CLM) introduced in [1-3] is a low-order computational model of the neocortex. It has the real-time, photographic, unsupervised, and hierarchical learning capabilities, which existing learning machines such as the multilayer perceptron and convolutional neural network do not have. The CLM is a network of processing units (PUs) each comprising novel computational models of dendrites (for encoding), synapses (for storing code covariance matrices), spiking/nonspiking somas (for evaluating empirical probabilities and generating spikes), and unsupervised/supervised Hebbian learning schemes. In this paper, the masking matrix in the CLM in [1-3] is generalized to enable the CLM to learn ranked and real-valued data in the form of the binary numbers and unary (thermometer) codes. The general masking matrix assigns weights to the bits in the binary and unary code to reflect their relative significances. Numerical examples are provided to illustrate that a single PU with the general masking matrix is a pattern recognizer with an efficacy comparable to those of leading statistical and machine learning methods, showing the potential of CLMs with multiple PUs especially in consideration of the aforementioned capabilities of the CLM.
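To make the two encodings named in this abstract concrete, the sketch below builds a unary (thermometer) code and a fixed-width binary number, then compares codes with per-bit weights. This is only an illustration of why bit significance matters; it is not the CLM's masking matrix, and the helper names and the power-of-two weighting are assumptions.

```python
def unary_code(value, levels):
    """Thermometer code: the first `value` of `levels` bits are set."""
    if not 0 <= value <= levels:
        raise ValueError("value must lie in [0, levels]")
    return [1] * value + [0] * (levels - value)


def binary_code(value, width):
    """Fixed-width binary number, most significant bit first."""
    if not 0 <= value < 2 ** width:
        raise ValueError("value does not fit in the given width")
    return [(value >> k) & 1 for k in reversed(range(width))]


def weighted_distance(a, b, weights):
    """Sum of the weights of the positions where two codes differ."""
    return sum(w for x, y, w in zip(a, b, weights) if x != y)


# Unary bits are equally significant, so unit weights recover |3 - 5|.
d_unary = weighted_distance(unary_code(3, 7), unary_code(5, 7), [1] * 7)

# Binary bits differ in significance; place values are one possible weighting.
place_values = [2 ** k for k in reversed(range(4))]  # [8, 4, 2, 1]
d_binary = weighted_distance(binary_code(3, 4), binary_code(4, 4), place_values)
```

With unit weights, the binary codes for 3 and 4 differ in three positions even though the values are adjacent, which is the kind of mismatch a significance-aware weighting is meant to expose.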

7

Shen, Fumin, Xiang Zhou, Yang Yang, Jingkuan Song, Heng Tao Shen, and Dacheng Tao. "A Fast Optimization Method for General Binary Code Learning." IEEE Transactions on Image Processing 25, no. 12 (December 2016): 5610–21. http://dx.doi.org/10.1109/tip.2016.2612883.

8

Do, Thanh-Toan, Tuan Hoang, Dang-Khoa Le Tan, Anh-Dzung Doan, and Ngai-Man Cheung. "Compact Hash Code Learning With Binary Deep Neural Network." IEEE Transactions on Multimedia 22, no. 4 (April 2020): 992–1004. http://dx.doi.org/10.1109/tmm.2019.2935680.

9

Gao, Hao, Tong Zhang, Songqiang Chen, Lina Wang, and Fajiang Yu. "FUSION: Measuring Binary Function Similarity with Code-Specific Embedding and Order-Sensitive GNN." Symmetry 14, no. 12 (December 2, 2022): 2549. http://dx.doi.org/10.3390/sym14122549.

Abstract:

Binary code similarity measurement is a popular research area in binary analysis with the recent development of deep learning-based models. Current state-of-the-art methods often use the pre-trained language model (PTLM) to embed instructions into basic blocks as representations of nodes within a control flow graph (CFG). These methods will then use the graph neural network (GNN) to embed the whole CFG and measure the binary similarities between these code embeddings. However, these methods almost directly treat the assembly code as a natural language text and ignore its code-specific features when training PTLM. Moreover, they barely consider the direction of edges in the CFG or consider it less efficient. The weaknesses of the above approaches may limit the performances of previous methods. In this paper, we propose a novel method called function similarity using code-specific PPTs and order-sensitive GNN (FUSION). Since the similarity of binary codes is a symmetric/asymmetric problem, we were guided by the ideas of symmetry and asymmetry in our research. They measure the binary function similarity with two code-specific PTLM training strategies and an order-sensitive GNN, which, respectively, alleviate the aforementioned weaknesses. FUSION outperforms the state-of-the-art binary similarity methods by up to 5.4% in accuracy, and performs significantly better.

10

Zhang, Daokun, Jie Yin, Xingquan Zhu, and Chengqi Zhang. "Search Efficient Binary Network Embedding." ACM Transactions on Knowledge Discovery from Data 15, no. 4 (June 2021): 1–27. http://dx.doi.org/10.1145/3436892.

Abstract:

Traditional network embedding primarily focuses on learning a continuous vector representation for each node, preserving network structure and/or node content information, such that off-the-shelf machine learning algorithms can be easily applied to the vector-format node representations for network analysis. However, the learned continuous vector representations are inefficient for large-scale similarity search, which often involves finding nearest neighbors measured by distance or similarity in a continuous vector space. In this article, we propose a search efficient binary network embedding algorithm called BinaryNE to learn a binary code for each node, by simultaneously modeling node context relations and node attribute relations through a three-layer neural network. BinaryNE learns binary node representations using a stochastic gradient descent-based online learning algorithm. The learned binary encoding not only reduces memory usage to represent each node, but also allows fast bit-wise comparisons to support faster node similarity search than using Euclidean or other distance measures. Extensive experiments and comparisons demonstrate that BinaryNE not only delivers more than 25 times faster search speed, but also provides comparable or better search quality than traditional continuous vector based network embedding methods. The binary codes learned by BinaryNE also render competitive performance on node classification and node clustering tasks. The source code of the BinaryNE algorithm is available at https://github.com/daokunzhang/BinaryNE.
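The "fast bit-wise comparisons" that this abstract credits for the search speed-up can be sketched as follows. The example is not taken from the BinaryNE code base: it assumes each binary code is packed into a Python integer, and the function names and sample codes are hypothetical.

```python
def hamming(a: int, b: int) -> int:
    """Hamming distance between two codes packed into integers."""
    # XOR marks the differing bit positions; counting the 1s gives the distance.
    return bin(a ^ b).count("1")


def nearest(query: int, codes: list) -> int:
    """Index of the stored code closest to `query` in Hamming distance."""
    return min(range(len(codes)), key=lambda i: hamming(query, codes[i]))


# Hypothetical 8-bit node codes and a query code.
codes = [0b10110010, 0b10110011, 0b01001101]
query = 0b10110111
best = nearest(query, codes)
```

A linear scan like this is the simplest search strategy; the point is only that each comparison reduces to one XOR and a bit count instead of a floating-point distance computation.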

11

Do, Thanh-Toan, Khoa Le, Tuan Hoang, Huu Le, Tam V. Nguyen, and Ngai-Man Cheung. "Simultaneous Feature Aggregating and Hashing for Compact Binary Code Learning." IEEE Transactions on Image Processing 28, no. 10 (October 2019): 4954–69. http://dx.doi.org/10.1109/tip.2019.2913509.

12

Tian, Donghai, Xiaoqi Jia, Rui Ma, Shuke Liu, Wenjing Liu, and Changzhen Hu. "BinDeep: A deep learning approach to binary code similarity detection." Expert Systems with Applications 168 (April 2021): 114348. http://dx.doi.org/10.1016/j.eswa.2020.114348.

13

Weng, Zhenyu, and Yuesheng Zhu. "Online Hashing with Efficient Updating of Binary Codes." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 07 (April 3, 2020): 12354–61. http://dx.doi.org/10.1609/aaai.v34i07.6920.

Abstract:

Online hashing methods are efficient in learning the hash functions from the streaming data. However, when the hash functions change, the binary codes for the database have to be recomputed to guarantee the retrieval accuracy. Recomputing the binary codes by accumulating the whole database brings a timeliness challenge to the online retrieval process. In this paper, we propose a novel online hashing framework to update the binary codes efficiently without accumulating the whole database. In our framework, the hash functions are fixed and the projection functions are introduced to learn online from the streaming data. Therefore, inefficient updating of the binary codes by accumulating the whole database can be transformed to efficient updating of the binary codes by projecting the binary codes into another binary space. The queries and the binary code database are projected asymmetrically to further improve the retrieval accuracy. The experiments on two multi-label image databases demonstrate the effectiveness and the efficiency of our method for multi-label image retrieval.

14

Wang, Jinpeng, Ziyun Zeng, Bin Chen, Tao Dai, and Shu-Tao Xia. "Contrastive Quantization with Code Memory for Unsupervised Image Retrieval." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 3 (June 28, 2022): 2468–76. http://dx.doi.org/10.1609/aaai.v36i3.20147.

Abstract:

The high efficiency in computation and storage makes hashing (including binary hashing and quantization) a common strategy in large-scale retrieval systems. To alleviate the reliance on expensive annotations, unsupervised deep hashing becomes an important research problem. This paper provides a novel solution to unsupervised deep quantization, namely Contrastive Quantization with Code Memory (MeCoQ). Different from existing reconstruction-based strategies, we learn unsupervised binary descriptors by contrastive learning, which can better capture discriminative visual semantics. Besides, we uncover that codeword diversity regularization is critical to prevent contrastive learning-based quantization from model degeneration. Moreover, we introduce a novel quantization code memory module that boosts contrastive learning with lower feature drift than conventional feature memories. Extensive experiments on benchmark datasets show that MeCoQ outperforms state-of-the-art methods. Code and configurations are publicly released.

15

Zhuang, Yuan, Baobao Wang, Jianguo Sun, Haoyang Liu, Shuqi Yang, and Qingan Da. "Deep Learning-Based Program-Wide Binary Code Similarity for Smart Contracts." Computers, Materials & Continua 74, no. 1 (2023): 1011–24. http://dx.doi.org/10.32604/cmc.2023.028058.

16

Georgiopoulos, Michael, Gregory L. Heileman, and Juxin Huang. "Convergence Properties of Learning in ART1." Neural Computation 2, no. 4 (December 1990): 502–9. http://dx.doi.org/10.1162/neco.1990.2.4.502.

Abstract:

We consider the ART1 neural network architecture. It is shown that in the fast learning case, an ART1 network that is repeatedly presented with an arbitrary list of binary input patterns, self-stabilizes the recognition code of every size-l pattern in at most l list presentations.

17

Pholdee, Nantiwat, and Sujin Bureerat. "Estimation of Distribution Algorithm Using Correlation between Binary Elements: A New Binary-Code Metaheuristic." Mathematical Problems in Engineering 2017 (2017): 1–15. http://dx.doi.org/10.1155/2017/6043109.

Abstract:

A new metaheuristic called estimation of distribution algorithm using correlation between binary elements (EDACE) is proposed. The method searches for optima using a binary string to represent a design solution. A matrix for correlation between binary elements of a design solution is used to represent a binary population. Optimisation search is achieved by iteratively updating such a matrix. The performance assessment is conducted by comparing the new algorithm with existing binary-code metaheuristics including a genetic algorithm, a univariate marginal distribution algorithm, population-based incremental learning, binary particle swarm optimisation, and binary simulated annealing by using the test problems of CEC2015 competition and one real-world application which is an optimal flight control problem. The comparative results show that the new algorithm is competitive with other established binary-code metaheuristics.

18

FRIEDRICH, JOHANNES, ROBERT URBANCZIK, and WALTER SENN. "CODE-SPECIFIC LEARNING RULES IMPROVE ACTION SELECTION BY POPULATIONS OF SPIKING NEURONS." International Journal of Neural Systems 24, no. 05 (May 30, 2014): 1450002. http://dx.doi.org/10.1142/s0129065714500026.

Abstract:

Population coding is widely regarded as a key mechanism for achieving reliable behavioral decisions. We previously introduced reinforcement learning for population-based decision making by spiking neurons. Here we generalize population reinforcement learning to spike-based plasticity rules that take account of the postsynaptic neural code. We consider spike/no-spike, spike count and spike latency codes. The multi-valued and continuous-valued features in the postsynaptic code allow for a generalization of binary decision making to multi-valued decision making and continuous-valued action selection. We show that code-specific learning rules speed up learning both for the discrete classification and the continuous regression tasks. The suggested learning rules also speed up with increasing population size as opposed to standard reinforcement learning rules. Continuous action selection is further shown to explain realistic learning speeds in the Morris water maze. Finally, we introduce the concept of action perturbation as opposed to the classical weight- or node-perturbation as an exploration mechanism underlying reinforcement learning. Exploration in the action space greatly increases the speed of learning as compared to exploration in the neuron or weight space.

19

Mohapatra, Sudhir Kumar, Srinivas Prasad, Dwiti Krishna Bebarta, Tapan Kumar Das, Kathiravan Srinivasan, and Yuh-Chung Hu. "Automatic Hate Speech Detection in English-Odia Code Mixed Social Media Data Using Machine Learning Techniques." Applied Sciences 11, no. 18 (September 15, 2021): 8575. http://dx.doi.org/10.3390/app11188575.

Abstract:

Hate speech on social media may spread quickly through online users and subsequently, may even escalate into local vile violence and heinous crimes. This paper proposes a hate speech detection model by means of machine learning and text mining feature extraction techniques. In this study, the authors collected the hate speech of English-Odia code mixed data from a Facebook public page and manually organized them into three classes. In order to build binary and ternary datasets, the data are further converted into binary classes. The modeling of hate speech employs the combination of a machine learning algorithm and feature extraction. Support vector machine (SVM), naïve Bayes (NB) and random forest (RF) models were trained using the whole dataset, with the extracted features based on word unigram, bigram, trigram, combined n-grams, term frequency-inverse document frequency (TF-IDF), combined n-grams weighted by TF-IDF and word2vec for both the datasets. Using the two datasets, we developed two kinds of models with each feature: binary models and ternary models. The models based on SVM with word2vec achieved better performance than the NB and RF models for both the binary and ternary categories. The result reveals that the ternary models achieved less confusion between hate and non-hate speech than the binary models.

20

Wang, Yazhou, Bing Li, Yan Zhang, Jiaxin Wu, and Qianya Ma. "A Secure Biometric Key Generation Mechanism via Deep Learning and Its Application." Applied Sciences 11, no. 18 (September 13, 2021): 8497. http://dx.doi.org/10.3390/app11188497.

Abstract:

Biometric keys are widely used in the digital identity system due to the inherent uniqueness of biometrics. However, existing biometric key generation methods may expose biometric data, which will cause users’ biometric traits to be permanently unavailable in the secure authentication system. To enhance its security and privacy, we propose a secure biometric key generation method based on deep learning in this paper. Firstly, to prevent the information leakage of biometric data, we utilize random binary codes to represent biometric data and adopt a deep learning model to establish the relationship between biometric data and random binary code for each user. Secondly, to protect the privacy and guarantee the revocability of the biometric key, we add a random permutation operation to shuffle the elements of binary code and update a new biometric key. Thirdly, to further enhance the reliability and security of the biometric key, we construct a fuzzy commitment module to generate the helper data without revealing any biometric information during enrollment. Three benchmark datasets including ORL, Extended YaleB, and CMU-PIE are used for evaluation. The experiment results show our scheme achieves a genuine accept rate (GAR) higher than the state-of-the-art methods at a 1% false accept rate (FAR), and meanwhile satisfies the properties of revocability and randomness of biometric keys. The security analyses show that our model can effectively resist information leakage, cross-matching, and other attacks. Moreover, the proposed model is applied to a data encryption scenario in our local computer, which takes less than 0.5 s to complete the whole encryption and decryption at different key lengths.
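The revocation step described in this abstract (shuffling the elements of a binary code to issue a new key) can be pictured with the sketch below. It is an assumption-laden illustration, not the paper's scheme: `permute_code` and the seed-as-secret design are hypothetical, and Python's `random` module is not a cryptographically secure generator, so a real system would need a different source of randomness.

```python
import random


def permute_code(code, seed):
    """Shuffle the positions of a binary code with a seed-derived permutation."""
    rng = random.Random(seed)  # illustrative only; not cryptographically secure
    order = list(range(len(code)))
    rng.shuffle(order)
    return [code[i] for i in order]


# Hypothetical binary code assigned to one user.
user_code = [1, 0, 1, 1, 0, 0, 1, 0]

key_v1 = permute_code(user_code, seed=2021)
# Revoking the key amounts to choosing a new permutation for the same code.
key_v2 = permute_code(user_code, seed=2022)
```

Because a permutation only reorders bits, every issued key keeps the same length and the same number of ones as the underlying code.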

21

Song, Yang, Qiyu Kang, and Wee Peng Tay. "Error-Correcting Output Codes with Ensemble Diversity for Robust Learning in Neural Networks." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 11 (May 18, 2021): 9722–29. http://dx.doi.org/10.1609/aaai.v35i11.17169.

Abstract:

Though deep learning has been applied successfully in many scenarios, malicious inputs with human-imperceptible perturbations can make it vulnerable in real applications. This paper proposes an error-correcting neural network (ECNN) that combines a set of binary classifiers to combat adversarial examples in the multi-class classification problem. To build an ECNN, we propose to design a code matrix so that the minimum Hamming distance between any two rows (i.e., two codewords) and the minimum shared information distance between any two columns (i.e., two partitions of class labels) are simultaneously maximized. Maximizing row distances can increase the system fault tolerance while maximizing column distances helps increase the diversity between binary classifiers. We propose an end-to-end training method for our ECNN, which allows further improvement of the diversity between binary classifiers. The end-to-end training renders our proposed ECNN different from the traditional error-correcting output code (ECOC) based methods that train binary classifiers independently. ECNN is complementary to other existing defense approaches such as adversarial training and can be applied in conjunction with them. We empirically demonstrate that our proposed ECNN is effective against the state-of-the-art white-box and black-box attacks on several datasets while maintaining good classification accuracy on normal examples.
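The row-distance criterion in this abstract can be illustrated with a small error-correcting output code. The sketch below computes the minimum Hamming distance between codewords and decodes a noisy prediction to the nearest row. It does not reproduce ECNN: the column (shared information distance) criterion and the end-to-end training are omitted, and the 4-class, 7-bit matrix is an illustrative choice.

```python
from itertools import combinations


def hamming(u, v):
    """Number of positions where two equal-length bit lists differ."""
    return sum(x != y for x, y in zip(u, v))


def min_row_distance(matrix):
    """Smallest Hamming distance between any two codewords (rows)."""
    return min(hamming(a, b) for a, b in combinations(matrix, 2))


def decode(bits, matrix):
    """Class index whose codeword is nearest to the classifier outputs."""
    return min(range(len(matrix)), key=lambda k: hamming(bits, matrix[k]))


# Illustrative code matrix: one row per class, one column per binary classifier.
code_matrix = [
    [1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0, 0, 1],
    [0, 1, 0, 1, 0, 1, 0],
]

d_min = min_row_distance(code_matrix)

# Codeword of class 2 with one classifier output flipped.
noisy = [0, 0, 1, 1, 0, 1, 1]
predicted = decode(noisy, code_matrix)
```

A code with minimum row distance d can absorb up to floor((d - 1) / 2) wrong binary classifiers, which is why larger row distances increase fault tolerance.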

22

Escalada, Javier, Francisco Ortin, and Ted Scully. "An Efficient Platform for the Automatic Extraction of Patterns in Native Code." Scientific Programming 2017 (2017): 1–16. http://dx.doi.org/10.1155/2017/3273891.

Abstract:

Different software tools, such as decompilers, code quality analyzers, recognizers of packed executable files, authorship analyzers, and malware detectors, search for patterns in binary code. The use of machine learning algorithms, trained with programs taken from the huge number of applications in the existing open source code repositories, allows finding patterns not detected with the manual approach. To this end, we have created a versatile platform for the automatic extraction of patterns from native code, capable of processing big binary files. Its implementation has been parallelized, providing important runtime performance benefits for multicore architectures. Compared to the single-processor execution, the average performance improvement obtained with the best configuration is a factor of 3.5, against a maximum theoretical gain of a factor of 4.

23

Keller, Patrick, Abdoul Kader Kaboré, Laura Plein, Jacques Klein, Yves Le Traon, and Tegawendé F. Bissyandé. "What You See is What it Means! Semantic Representation Learning of Code based on Visualization and Transfer Learning." ACM Transactions on Software Engineering and Methodology 31, no. 2 (April 30, 2022): 1–34. http://dx.doi.org/10.1145/3485135.

Abstract:

Recent successes in training word embeddings for Natural Language Processing (NLP) tasks have encouraged a wave of research on representation learning for source code, which builds on similar NLP methods. The overall objective is then to produce code embeddings that capture the maximum of program semantics. State-of-the-art approaches invariably rely on a syntactic representation (i.e., raw lexical tokens, abstract syntax trees, or intermediate representation tokens) to generate embeddings, which are criticized in the literature as non-robust or non-generalizable. In this work, we investigate a novel embedding approach based on the intuition that source code has visual patterns of semantics. We further use these patterns to address the outstanding challenge of identifying semantic code clones. We propose the WySiWiM ("What You See Is What It Means") approach where visual representations of source code are fed into powerful pre-trained image classification neural networks from the field of computer vision to benefit from the practical advantages of transfer learning. We evaluate the proposed embedding approach on the task of vulnerable code prediction in source code and on two variations of the task of semantic code clone identification: code clone detection (a binary classification problem), and code classification (a multi-classification problem). We show with experiments on BigCloneBench (Java) and Open Judge (C) that, although simple, our WySiWiM approach performs as effectively as state-of-the-art approaches such as ASTNN or TBCNN. We also show with data from NVD and SARD that the WySiWiM representation can be used to learn a vulnerable code detector with reasonable performance (accuracy ∼90%). We further explore the influence of different steps in our approach, such as the choice of visual representations or the classification algorithm, to eventually discuss the promises and limitations of this research direction.

24

Rocha, Kyle Akira, Jeff J. Andrews, Christopher P. L. Berry, Zoheyr Doctor, Aggelos K. Katsaggelos, Juan Gabriel Serra Pérez, Pablo Marchant, et al. "Active Learning for Computationally Efficient Distribution of Binary Evolution Simulations." Astrophysical Journal 938, no. 1 (October 1, 2022): 64. http://dx.doi.org/10.3847/1538-4357/ac8b05.

Abstract:

Binary stars undergo a variety of interactions and evolutionary phases, critical for predicting and explaining observations. Binary population synthesis with full simulation of stellar structure and evolution is computationally expensive, requiring a large number of mass-transfer sequences. The recently developed binary population synthesis code POSYDON incorporates grids of MESA binary star simulations that are interpolated to model large-scale populations of massive binaries. The traditional method of computing a high-density rectilinear grid of simulations is not scalable for higher-dimension grids, accounting for a range of metallicities, rotation, and eccentricity. We present a new active learning algorithm, psy-cris, which uses machine learning in the data-gathering process to adaptively and iteratively target simulations to run, resulting in a custom, high-performance training set. We test psy-cris on a toy problem and find the resulting training sets require fewer simulations for accurate classification and regression than either regular or randomly sampled grids. We further apply psy-cris to the target problem of building a dynamic grid of MESA simulations, and we demonstrate that, even without fine tuning, a simulation set of only ∼1/4 the size of a rectilinear grid is sufficient to achieve the same classification accuracy. We anticipate further gains when algorithmic parameters are optimized for the targeted application. We find that optimizing for classification only may lead to performance losses in regression, and vice versa. Lowering the computational cost of producing grids will enable new population synthesis codes such as POSYDON to cover more input parameters while preserving interpolation accuracies.

25

BADUROWICZ, Marcin. "DETECTION OF SOURCE CODE IN INTERNET TEXTS USING AUTOMATICALLY GENERATED MACHINE LEARNING MODELS." Applied Computer Science 18, no. 1 (March 30, 2022): 89–98. http://dx.doi.org/10.35784/acs-2022-7.

Abstract:

In the paper, the authors present the outcome of web scraping software allowing for the automated classification of source code. The software system was prepared for a discussion forum for software developers to find fragments of source code that were published without marking them as code snippets. The analyzer software uses a Machine Learning binary classification model for differentiating between programming language source code and highly technical text about software. The analyzer model was prepared using the AutoML subsystem without human intervention and fine-tuning, and its accuracy on the described problem exceeds 95%. The analyzer based on the automatically generated model has been deployed and, after the first year of continuous operation, its False Positive Rate is less than 3%. A similar process may be introduced in document management in the software development process, where automatic tagging and search for code or pseudo-code may be useful for archiving purposes.

26

Ren, Yanduo, Jiangbo Qian, Yihong Dong, Yu Xin, and Huahui Chen. "AVBH: Asymmetric Learning to Hash with Variable Bit Encoding." Scientific Programming 2020 (January 21, 2020): 1–11. http://dx.doi.org/10.1155/2020/2424381.

Abstract:

Nearest neighbour search (NNS) is the core of large data retrieval. Learning to hash is an effective way to solve the problem by representing high-dimensional data as compact binary codes. However, existing learning-to-hash methods need long bit encodings to ensure query accuracy, and long bit encodings bring a large storage cost, which severely restricts their use in big-data applications. An asymmetric learning to hash with variable bit encoding algorithm (AVBH) is proposed to solve the problem. The AVBH hash algorithm uses two types of hash mapping functions to encode the dataset and the query set into different length bits. For datasets, the hash code frequencies of datasets after random Fourier feature encoding are statistically analysed. The hash code with high frequency is compressed into a longer coding representation, and the hash code with low frequency is compressed into a shorter coding representation. The query point is quantized to a long bit hash code and compared with the same length cascade concatenated data point. Experiments on public datasets show that the proposed algorithm effectively reduces the cost of storage and improves the accuracy of query.

27

Lee, Sangwoo, and Jungwon Cho. "Malware Authorship Attribution Model using Runtime Modules based on Automated Analysis." JOIV : International Journal on Informatics Visualization 6, no. 1-2 (May 31, 2022): 214. http://dx.doi.org/10.30630/joiv.6.1-2.941.

Abstract:

Malware authorship attribution is a research field that identifies the author of malware by extracting and analyzing features that relate to the authors from the source code or binary code of malware. Currently, it is being used as one of the detection techniques based on malware forensics or identifying patterns of continuous attacks such as APT attacks. There are two analysis methods for identifying the author. One is a source code-based analysis method that extracts features from the source code, and the other is a binary-based analysis method that extracts features from the binary. However, to handle the modularization and the increasing amount of malicious code with these methods, both time and manpower are insufficient to figure out the characteristics of the malware. Therefore, we propose a model for malware authorship attribution that rapidly extracts and analyzes features using automated analysis. Automated analysis relies on a tool and can be carried out from a malware file and its specific hash values, without experts. Furthermore, it is the fastest of the malware analysis methods. We experimented by applying various machine learning classification algorithms to six malware author groups, and Runtime Modules and Kernel32.dll API extracted from the automated analysis were selected as features for author identification. The results show higher accuracy than previous studies. With automated analysis, features of malware are extracted faster than with source code-based and binary-based analysis methods.

28

Schmidhuber, Jürgen. "Learning Factorial Codes by Predictability Minimization." Neural Computation 4, no. 6 (November 1992): 863–79. http://dx.doi.org/10.1162/neco.1992.4.6.863.

Abstract:

I propose a novel general principle for unsupervised learning of distributed nonredundant internal representations of input patterns. The principle is based on two opposing forces. For each representational unit there is an adaptive predictor, which tries to predict the unit from the remaining units. In turn, each unit tries to react to the environment such that it minimizes its predictability. This encourages each unit to filter "abstract concepts" out of the environmental input such that these concepts are statistically independent of those on which the other units focus. I discuss various simple yet potentially powerful implementations of the principle that aim at finding binary factorial codes (Barlow et al. 1989), i.e., codes where the probability of the occurrence of a particular input is simply the product of the probabilities of the corresponding code symbols. Such codes are potentially relevant for (1) segmentation tasks, (2) speeding up supervised learning, and (3) novelty detection. Methods for finding factorial codes automatically implement Occam's razor for finding codes using a minimal number of units. Unlike previous methods the novel principle has a potential for removing not only linear but also nonlinear output redundancy. Illustrative experiments show that algorithms based on the principle of predictability minimization are practically feasible. The final part of this paper describes an entirely local algorithm that has a potential for learning unique representations of extended input sequences.

29

Wang, Zhen, Nannan Wu, Xiaohan Yang, Bingqi Yan, and Pingping Liu. "Deep Learning Triplet Ordinal Relation Preserving Binary Code for Remote Sensing Image Retrieval Task." Remote Sensing 13, no. 23 (November 26, 2021): 4786. http://dx.doi.org/10.3390/rs13234786.

Abstract:

As satellite observation technology rapidly develops, the number of remote sensing (RS) images dramatically increases, and this leads RS image retrieval tasks to be more challenging in terms of speed and accuracy. Recently, an increasing number of researchers have turned their attention to this issue, as well as hashing algorithms, which map real-valued data onto a low-dimensional Hamming space and have been widely utilized to respond quickly to large-scale RS image search tasks. However, most existing hashing algorithms only emphasize preserving point-wise or pair-wise similarity, which may lead to an inferior approximate nearest neighbor (ANN) search result. To fix this problem, we propose a novel triplet ordinal cross entropy hashing (TOCEH). In TOCEH, to enhance the ability of preserving the ranking orders in different spaces, we establish a tensor graph representing the Euclidean triplet ordinal relationship among RS images and minimize the cross entropy between the probability distribution of the established Euclidean similarity graph and that of the Hamming triplet ordinal relation with the given binary code. During the training process, to avoid the non-deterministic polynomial (NP) hard problem, we utilize a continuous function instead of the discrete encoding process. Furthermore, we design a quantization objective function based on the principle of preserving triplet ordinal relation to minimize the loss caused by the continuous relaxation procedure. The comparative RS image retrieval experiments are conducted on three publicly available datasets, including UC Merced Land Use Dataset (UCMD), SAT-4 and SAT-6. The experimental results show that the proposed TOCEH algorithm outperforms many existing hashing algorithms in RS image retrieval tasks.

30

Hung, Cheng-An, and Sheng-Fuu Lin. "Supervised Adaptive Hamming Net for Classification of Multiple-Valued Patterns." International Journal of Neural Systems 08, no. 02 (April 1997): 181–200. http://dx.doi.org/10.1142/s0129065797000203.

Abstract:

A Supervised Adaptive Hamming Net (SAHN) is introduced for incremental learning of recognition categories in response to arbitrary sequence of multiple-valued or binary-valued input patterns. The binary-valued SAHN derived from the Adaptive Hamming Net (AHN) is functionally equivalent to a simplified ARTMAP, which is specifically designed to establish many-to-one mappings. The generalization to learning multiple-valued input patterns is achieved by incorporating multiple-valued logic into the AHN. In this paper, we examine some useful properties of learning in a P-valued SAHN. In particular, an upper bound is derived on the number of epochs required by the P-valued SAHN to learn a list of input-output pairs that is repeatedly presented to the architecture. Furthermore, we connect the P-valued SAHN with the binary-valued SAHN via the thermometer code.
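
The thermometer code mentioned at the end of this abstract can be illustrated with a minimal sketch. The abstract does not give the paper's exact convention, so the version below assumes the common one: a value v out of P levels becomes P - 1 bits, of which the first v are 1. The function names are hypothetical.

```python
def thermometer_encode(value: int, levels: int) -> list[int]:
    """Encode an integer in [0, levels - 1] as levels - 1 bits; the first `value` bits are 1."""
    if not 0 <= value < levels:
        raise ValueError("value must lie in [0, levels - 1]")
    return [1 if i < value else 0 for i in range(levels - 1)]


def thermometer_decode(bits: list[int]) -> int:
    """Recover the integer by counting the 1 bits of a valid thermometer code."""
    return sum(bits)


# A P-valued input pattern with P = 4 (assumed example data).
pattern = [0, 2, 3]
encoded = [bit for v in pattern for bit in thermometer_encode(v, 4)]
print(encoded)
```

Under this convention, the Hamming distance between two encoded values equals the absolute difference of the original values, which is one reason the code suits Hamming-distance networks.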

31

Naeem, Muhammad Rehan, Rashid Amin, Sultan S. Alshamrani, and Abdullah Alshehri. "Digital Forensics for Malware Classification: An Approach for Binary Code to Pixel Vector Transition." Computational Intelligence and Neuroscience 2022 (April 21, 2022): 1–12. http://dx.doi.org/10.1155/2022/6294058.

Abstract:

The most often reported danger to computer security is malware. Antivirus company AV-Test Institute reports that more than 5 million malware samples are created each day. A malware classification method is frequently required to prioritize these occurrences because security teams cannot address all of that malware at once. Malware’s variety, volume, and sophistication are all growing at an alarming rate. Hackers and attackers routinely design systems that can automatically rearrange and encrypt their code to escape discovery. Traditional machine learning approaches, in which classifiers learn based on a hand-crafted feature vector, are ineffective for classifying malware. Recently, deep convolutional neural networks (CNNs) successfully identified and classified malware. To categorize malware, a smart system has been suggested in this research. A novel model of deep learning is introduced to categorize malware families and multiclassification. The malware file is converted to a grayscale picture, and the image is then classified using a convolutional neural network. To evaluate the performance of our technique, we used a Microsoft malware dataset of 10,000 samples with nine distinct classifications. The findings stood out among the deep learning models with 99.97% accuracy for nine malware types.
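
The binary-to-image step described in this abstract can be sketched as follows. This is only an illustration, not the authors' pipeline: the row width, the zero-padding of the last row, and the function name `bytes_to_grayscale` are all assumptions. Each byte is treated as one pixel with intensity 0 to 255.

```python
def bytes_to_grayscale(data: bytes, width: int = 16) -> list[list[int]]:
    """Reshape raw bytes into rows of `width` pixels, zero-padding the last row."""
    if width <= 0:
        raise ValueError("width must be positive")
    # Pad with zero bytes so the length is a multiple of the row width.
    padded = data + bytes(-len(data) % width)
    return [list(padded[i:i + width]) for i in range(0, len(padded), width)]


# Five example bytes; 0x4D 0x5A ("MZ") is the usual start of a PE file.
sample = bytes([0x4D, 0x5A, 0x90, 0x00, 0x03])
image = bytes_to_grayscale(sample, width=4)
print(image)
```

The resulting 2D array could then be resized to a fixed shape and passed to a CNN classifier, which is the stage the abstract describes next.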

32

Midyanti, Dwi Marisa. "Combination of SOM-RBF for drought code prediction using rainfall and air temperature data." Jurnal Teknologi dan Sistem Komputer 8, no. 1 (November 18, 2019): 64–68. http://dx.doi.org/10.14710/jtsiskom.8.1.2020.64-68.

Abstract:

This study aims to predict Drought Code (DC) in Kabupaten Kubu Raya using a combination of SOM-RBF. The final weight value of SOM was used as a center on the RBF network. The input data variables are rainfall data and air temperature data for three days with three binary outputs to predict DC values. This study also observed the effect of the number of neurons, learning rates, and the number of iterations on the results of the SOM-RBF network training. The smallest MSE of training result from the SOM-RBF network was 0.159933 using 65 neurons in the hidden layer, learning rate 0.007, and epoch 45000. The detection accuracy of SOM-RBF was 91.34 % from 245 test data.

33

Jiang, Jian, and Fen Zhang. "Detecting Portable Executable Malware by Binary Code Using an Artificial Evolutionary Fuzzy LSTM Immune System." Security and Communication Networks 2021 (July 7, 2021): 1–12. http://dx.doi.org/10.1155/2021/3578695.

Abstract:

As the planet watches in shock the evolution of the COVID-19 pandemic, new forms of sophisticated, versatile, and extremely difficult-to-detect malware expose society and especially the global economy. Machine learning techniques are playing an increasingly important role in the field of malware identification and analysis. However, due to the complexity of the problem, the training of intelligent systems proves to be insufficient in recognizing advanced cyberthreats. The biggest challenge in information systems security using machine learning methods is to understand the polymorphism and metamorphism mechanisms used by malware developers and how to effectively address them. This work presents an innovative Artificial Evolutionary Fuzzy LSTM Immune System which, by using a heuristic machine learning method that combines evolutionary intelligence, Long-Short-Term Memory (LSTM), and fuzzy knowledge, proves to be able to adequately protect modern information systems from Portable Executable Malware. The main innovation in the technical implementation of the proposed approach is the fact that the machine learning system can be trained from only the raw bytes of an executable file to determine if the file is malicious. The performance of the proposed system was tested on a sophisticated dataset of high complexity, which emerged after extensive research on PE malware that offered us a realistic representation of their operating states. The high accuracy of the developed model significantly supports the validity of the proposed method. The final evaluation was carried out with in-depth comparisons to corresponding machine learning algorithms and it has revealed the superiority of the proposed immune system.

34

Zhang, Zheng, Xiaofeng Zhu, Guangming Lu, and Yudong Zhang. "Probability Ordinal-Preserving Semantic Hashing for Large-Scale Image Retrieval." ACM Transactions on Knowledge Discovery from Data 15, no. 3 (April 12, 2021): 1–22. http://dx.doi.org/10.1145/3442204.

Abstract:

Semantic hashing enables computation and memory-efficient image retrieval through learning similarity-preserving binary representations. Most existing hashing methods mainly focus on preserving the piecewise class information or pairwise correlations of samples into the learned binary codes while failing to capture the mutual triplet-level ordinal structure in similarity preservation. In this article, we propose a novel Probability Ordinal-preserving Semantic Hashing (POSH) framework, which for the first time defines the ordinal-preserving hashing concept under a non-parametric Bayesian theory. Specifically, we derive the whole learning framework of the ordinal similarity-preserving hashing based on the maximum posteriori estimation, where the probabilistic ordinal similarity preservation, probabilistic quantization function, and probabilistic semantic-preserving function are jointly considered into one unified learning framework. In particular, the proposed triplet-ordering correlation preservation scheme can effectively improve the interpretation of the learned hash codes under an economical anchor-induced asymmetric graph learning model. Moreover, the sparsity-guided selective quantization function is designed to minimize the loss of space transformation, and the regressive semantic function is explored to promote the flexibility of the formulated semantics in hash code learning. The final joint learning objective is formulated to concurrently preserve the ordinal locality of original data and explore potentials of semantics for producing discriminative hash codes. Importantly, an efficient alternating optimization algorithm with the strictly proof convergence guarantee is developed to solve the resulting objective problem. Extensive experiments on several large-scale datasets validate the superiority of the proposed method against state-of-the-art hashing-based retrieval methods.

35

Panthaplackel, Sheena, Milos Gligoric, Raymond J. Mooney, and Junyi Jessy Li. "Associating Natural Language Comment and Source Code Entities." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 05 (April 3, 2020): 8592–99. http://dx.doi.org/10.1609/aaai.v34i05.6382.

Abstract:

Comments are an integral part of software development; they are natural language descriptions associated with source code elements. Understanding explicit associations can be useful in improving code comprehensibility and maintaining the consistency between code and comments. As an initial step towards this larger goal, we address the task of associating entities in Javadoc comments with elements in Java source code. We propose an approach for automatically extracting supervised data using revision histories of open source projects and present a manually annotated evaluation dataset for this task. We develop a binary classifier and a sequence labeling model by crafting a rich feature set which encompasses various aspects of code, comments, and the relationships between them. Experiments show that our systems outperform several baselines learning from the proposed supervision.

36

Tang, Xu, Chao Liu, Jingjing Ma, Xiangrong Zhang, Fang Liu, and Licheng Jiao. "Large-Scale Remote Sensing Image Retrieval Based on Semi-Supervised Adversarial Hashing." Remote Sensing 11, no. 17 (September 1, 2019): 2055. http://dx.doi.org/10.3390/rs11172055.

Abstract:

Remote sensing image retrieval (RSIR), a superior content organization technique, plays an important role in the remote sensing (RS) community. As the number of RS images increases explosively, not only retrieval precision but also retrieval efficiency is emphasized in the large-scale RSIR scenario. Therefore, approximate nearest neighbor (ANN) search increasingly attracts researchers’ attention. In this paper, we propose a new hash learning method, named semi-supervised deep adversarial hashing (SDAH), to accomplish ANN search for the large-scale RSIR task. The assumption of our model is that the RS images have been represented by proper visual features. First, a residual auto-encoder (RAE) is developed to generate the class variable and hash code. Second, two multi-layer networks are constructed to regularize the obtained latent vectors using the prior distribution. These two modules are integrated under a generative adversarial framework. Through minimax learning, the class variable becomes a one-hot-like vector while the hash code becomes a binary-like vector. Finally, a specific hashing function is formulated to enhance the quality of the generated hash code. The effectiveness of the hash codes learned by our SDAH model is demonstrated by positive experimental results on three public RS image archives. Compared with existing hash learning methods, the proposed method achieves improved performance.

37

Akgun, Devrim. "PyTorch Operations Based Approach for Computing Local Binary Patterns." U.Porto Journal of Engineering 7, no. 4 (November 26, 2021): 61–69. http://dx.doi.org/10.24840/2183-6493_007.004_0005.

Abstract:

Advances in machine learning frameworks like PyTorch provide users with various machine learning algorithms together with general-purpose operations. The PyTorch framework provides NumPy-like functions and makes it practical to use computational resources for accelerating computations. Users may also define custom layers or operations for feature extraction algorithms based on tensor operations. In this paper, Local Binary Patterns (LBP), one of the important feature extraction approaches in computer vision, were implemented using the tensor operations of the PyTorch framework. The algorithm was written both in Python with standard libraries and with PyTorch tensor operations in Python. According to experimental measurements made on various batches of images, the algorithm based on tensor operations considerably reduced the computation time and provided significant acceleration over the Python implementation with standard libraries.
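
For readers unfamiliar with LBP, the basic 3×3 operator can be sketched in plain Python, in the style of the standard-library baseline the abstract mentions rather than the PyTorch tensor version. The neighbour ordering, the `>=` comparison, and the names `NEIGHBOURS` and `lbp_3x3` are assumptions for illustration; the paper's implementation may differ.

```python
# Offsets of the 8 neighbours, clockwise from the top-left corner
# (this bit ordering is an assumed convention).
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_3x3(image: list[list[int]]) -> list[list[int]]:
    """Return the 8-bit LBP code of every interior pixel of a 2D grey-level image."""
    rows, cols = len(image), len(image[0])
    codes = []
    for r in range(1, rows - 1):
        row_codes = []
        for c in range(1, cols - 1):
            centre = image[r][c]
            code = 0
            for bit, (dr, dc) in enumerate(NEIGHBOURS):
                # Set the bit when the neighbour is at least as bright as the centre.
                if image[r + dr][c + dc] >= centre:
                    code |= 1 << bit
            row_codes.append(code)
        codes.append(row_codes)
    return codes


image = [[5, 9, 1],
         [4, 6, 7],
         [2, 8, 3]]
print(lbp_3x3(image))
```

A tensor-based version would replace the two pixel loops with eight shifted comparisons over the whole batch, which is the kind of restructuring that yields the speed-up reported in the abstract.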

38

Chen, Ling-qing, Mei-ting Wu, Li-fang Pan, and Ru-bin Zheng. "Grade Prediction in Blended Learning Using Multisource Data." Scientific Programming 2021 (September 11, 2021): 1–15. http://dx.doi.org/10.1155/2021/4513610.

Abstract:

Today, blended learning is widely carried out in many colleges. Different online learning platforms have accumulated a large number of fine granularity records of students’ learning behavior, which provides us with an excellent opportunity to analyze students’ learning behavior. In this paper, based on the behavior log data in four consecutive years of blended learning in a college’s programming course, we propose a novel multiclassification frame to predict students’ learning outcomes. First, the data obtained from diverse platforms, i.e., MOOC, Cnblogs, Programming Teaching Assistant (PTA) system, and Rain Classroom, are integrated and preprocessed. Second, a novel error-correcting output codes (ECOC) multiclassification framework, based on genetic algorithm (GA) and ternary bitwise calculator, is designed to effectively predict the grade levels of students by optimizing the code-matrix, feature subset, and binary classifiers of ECOC. Experimental results show that the proposed algorithm in this paper significantly outperforms other alternatives in predicting students’ grades. In addition, the performance of the algorithm can be further improved by adding the grades of prerequisite courses.
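
The decoding side of a ternary error-correcting output code can be sketched as below. This is not the GA-optimized framework from the paper: the code matrix is hand-written, the grade labels "A", "B" and "C" are hypothetical, and the distance rule (count disagreements, skipping 0 entries) is one common choice assumed here. Each column corresponds to one binary classifier; a 0 means that class was left out of that classifier's training.

```python
# Rows are class codewords; columns are binary classifiers (assumed example matrix).
CODE_MATRIX = {
    "A": [+1, +1, 0],
    "B": [-1, 0, +1],
    "C": [0, -1, -1],
}


def decode(predictions: list[int], code_matrix: dict[str, list[int]] = CODE_MATRIX) -> str:
    """Return the class whose codeword disagrees least with the classifier outputs."""
    def distance(codeword: list[int]) -> int:
        # Positions with a 0 in the codeword are ignored for that class.
        return sum(1 for bit, pred in zip(codeword, predictions)
                   if bit != 0 and bit != pred)

    return min(code_matrix, key=lambda label: distance(code_matrix[label]))


print(decode([+1, +1, -1]))
```

In the paper's setting, the genetic algorithm would search over matrices like `CODE_MATRIX`, together with feature subsets and classifier choices, instead of fixing them by hand.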

39

Li, Jun Yi, and Jian Hua Li. "Fast Image Search with Pixel-Based Deep Learning Framework via Efficient Compact Binary Code and Addictive Latent Layer." International Journal of Pattern Recognition and Artificial Intelligence 32, no. 03 (November 22, 2017): 1859004. http://dx.doi.org/10.1142/s0218001418590048.

Abstract:

Nearest neighbor search is an effective method for large-scale image search. This paper introduces how to learn an image feature representation and a series of compact binary hash coding functions under a deep learning framework. Our concept is that binary codes can be obtained using a hidden layer that represents latent concepts dominating the class labels when data labels are available. Our method is effective in obtaining hash codes and image representations, so it is suitable for large-scale datasets. The performance of the proposed algorithms was verified on three different databases: MNIST, CIFAR-10 and Caltech-101. The experimental results reveal that the two proposed image hash retrieval algorithms, based on pixel-level automatic feature learning, show higher search accuracy than the other algorithms; moreover, these two algorithms proved more favorable in scalability and generality.

40

Kong, Vungsovanreach, Oui Somakhamixay, Wan-Sup Cho, Gilwon Kang, Heesun Won, HyungChul Rah, and Heui Je Bang. "Recurrence risk prediction of acute coronary syndrome per patient as a personalized ACS recurrence risk: a retrospective study." PeerJ 10 (November 15, 2022): e14348. http://dx.doi.org/10.7717/peerj.14348.

Abstract:

Acute coronary syndrome (ACS) has been one of the most important issues in global public health. The high recurrence risk of patients with coronary heart disease (CHD) has led to the importance of post-discharge care and secondary prevention of CHD. Previous studies provided binary results of ACS recurrence risk; however, studies providing the recurrence risk of an individual patient are rare. In this study, we built a model which provides the recurrence risk probability for each patient, along with the binary result, with two datasets from the Korea Health Insurance Review and Assessment Service and Chungbuk National University Hospital. The total data of 6,535 patients who had been diagnosed with ACS were used to build a machine learning model by using logistic regression. Data including age, gender, procedure codes, procedure reason, prescription drug codes, and condition codes were used as the model predictors. The model achieved 0.893, 0.894, 0.851, 0.869, and 0.921 for accuracy, precision, recall, F1-score, and AUC, respectively. Our model provides the ACS recurrence probability of each patient as a personalized ACS recurrence risk, which may help motivate the patient to reduce their own ACS recurrence risk. The model also shows that acute transmural myocardial infarction of an unspecified site, and other sites and acute transmural myocardial infarction of an unspecified site contributed most significantly to ACS recurrence with an odds ratio of 97.908 as a procedure reason code and with an odds ratio of 58.215 as a condition code, respectively.

41

Zhao, Yujie, Zhanyong Tang, Guixin Ye, Xiaoqing Gong, and Dingyi Fang. "Input-Output Example-Guided Data Deobfuscation on Binary." Security and Communication Networks 2021 (December 13, 2021): 1–16. http://dx.doi.org/10.1155/2021/4646048.

Abstract:

Data obfuscation is usually used by malicious software to avoid detection and reverse analysis. When analyzing the malware, such obfuscations have to be removed to restore the program into an easier understandable form (deobfuscation). The deobfuscation based on program synthesis provides a good solution for treating the target program as a black box. Thus, deobfuscation becomes a problem of finding the shortest instruction sequence to synthesize a program with the same input-output behavior as the target program. Existing work has two limitations: assuming that obfuscated code snippets in the target program are known and using a stochastic search algorithm resulting in low efficiency. In this paper, we propose fine-grained obfuscation detection for locating obfuscated code snippets by machine learning. Besides, we also combine the program synthesis and a heuristic search algorithm of Nested Monte Carlo Search. We have applied a prototype implementation of our ideas to data obfuscation in different tools, including OLLVM and Tigress. Our experimental results suggest that this approach is highly effective in locating and deobfuscating the binaries with data obfuscation, with an accuracy of at least 90.34%. Compared with the state-of-the-art deobfuscation technique, our approach’s efficiency has increased by 75%, with the success rate increasing by 5%.

42

Kiger, John, Shen-Shyang Ho, and Vahid Heydari. "Malware Binary Image Classification Using Convolutional Neural Networks." International Conference on Cyber Warfare and Security 17, no. 1 (March 2, 2022): 469–78. http://dx.doi.org/10.34190/iccws.17.1.59.

Abstract:

The persistent shortage of cybersecurity professionals combined with enterprise networks tasked with processing more data than ever before has led many cybersecurity experts to consider automating some of the most common and time-consuming security tasks using machine learning. One of these cybersecurity tasks where machine learning may prove advantageous is malware analysis and classification. To evade traditional detection techniques, malware developers are creating more complex malware. This is achieved through more advanced methods of code obfuscation and conducting more sophisticated attacks. This can make the manual process of analyzing malware an infinitely more complex task. Furthermore, the proliferation of malicious files and new malware signatures increases year by year. As of March 2020, the total number of new malware detections worldwide amounted to 677.66 million programs. In 2020, there was a 35.4% increase in new malware variants over the previous year. This paper examines the viability of classifying malware binaries represented as fixed-size grayscale images using convolutional neural networks. Several Convolutional Neural Network (CNN) architectures are evaluated on multiple performance metrics to analyze their effectiveness at solving this classification problem.

43

Leventi-Peetz, Anastasia-Maria, and Kai Weber. "Probabilistic machine learning for breast cancer classification." Mathematical Biosciences and Engineering 20, no. 1 (2022): 624–55. http://dx.doi.org/10.3934/mbe.2023029.

Abstract:

A probabilistic neural network has been implemented to predict the malignancy of breast cancer cells, based on a data set, the features of which are used for the formulation and training of a model for a binary classification problem. The focus is placed on considerations when building the model, in order to achieve not only accuracy but also a safe quantification of the expected uncertainty of the calculated network parameters and the medical prognosis. The source code is included to make the results reproducible, also in accordance with the latest trending in machine learning research, named Papers with Code. The various steps taken for the code development are introduced in detail but also the results are visually displayed and critically analyzed also in the sense of explainable artificial intelligence. In statistical-classification problems, the decision boundary is the region of the problem space in which the classification label of the classifier is ambiguous. Problem aspects and model parameters which influence the decision boundary are a special aspect of practical investigation considered in this work. Classification results issued by technically transparent machine learning software can inspire more confidence, as regards their trustworthiness which is very important, especially in the case of medical prognosis. Furthermore, transparency allows the user to adapt models and learning processes to the specific needs of a problem and has a boosting influence on the development of new methods in relevant machine learning fields (transfer learning).

44

Bajaber, Asrar, and Lamiaa Elrefaei. "Biometric Template Protection for Dynamic Touch Gestures Based on Fuzzy Commitment Scheme and Deep Learning." Mathematics 10, no. 3 (January 25, 2022): 362. http://dx.doi.org/10.3390/math10030362.

Abstract:

Privacy plays an important role in biometric authentication systems. Touch authentication systems have been widely used since touch devices reached their current level of development. In this work, a fuzzy commitment scheme (FCS) is proposed based on deep learning (DL) to protect the touch-gesture template in a touch authentication system. The binary Bose–Chaudhuri–Hocquenghem (BCH) code is used with FCS to deal with touch variations. The BCH code is described by the triplet (n, k, t) where n denotes the code word’s length, k denotes the length of the key and t denotes error-correction capability. In our proposed system, the system performance is investigated using different lengths k. The learning-based approach is applied to extract touch features from raw touch data, as the recurrent neural network (RNN) is used based on a convolutional neural network (CNN). The proposed system has been evaluated on two different touch datasets: the Touchalytics dataset and BioIdent dataset. The best results obtained were with a key length k = 99 and n = 255; the false accept rate (FAR) was 0.00 and false reject rate (FRR) was 0.5854 for the Touchalytics dataset, while the FAR was 0.00 and FRR was 0.5399 with the BioIdent dataset. The FCS shows its effectiveness in dynamic authentication systems, as good results are obtained and compared with other works.
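
The commit-and-verify flow of a fuzzy commitment scheme can be sketched in a few lines. To stay self-contained, the sketch swaps the BCH code for a simple repetition code, which plays the same error-correcting role but is far weaker; the bit vectors, the repetition factor, and all function names are assumptions for illustration, not the system evaluated in the paper.

```python
import hashlib

REPEAT = 5  # each key bit is repeated 5 times, so up to 2 flips per block are corrected


def encode(key_bits: list[int]) -> list[int]:
    """Repetition-code encoder (stand-in for the BCH encoder)."""
    return [b for b in key_bits for _ in range(REPEAT)]


def decode(code_bits: list[int]) -> list[int]:
    """Majority vote over each block of REPEAT bits."""
    return [int(sum(code_bits[i:i + REPEAT]) > REPEAT // 2)
            for i in range(0, len(code_bits), REPEAT)]


def digest(key_bits: list[int]) -> str:
    return hashlib.sha256(bytes(key_bits)).hexdigest()


def commit(key_bits: list[int], template_bits: list[int]) -> tuple[list[int], str]:
    """Store only the XOR helper data and the hash of the key, never the template."""
    codeword = encode(key_bits)
    if len(codeword) != len(template_bits):
        raise ValueError("template length must equal codeword length")
    helper = [c ^ t for c, t in zip(codeword, template_bits)]
    return helper, digest(key_bits)


def verify(helper: list[int], key_hash: str, query_bits: list[int]) -> bool:
    """XOR the query with the helper, correct errors, and compare key hashes."""
    noisy_codeword = [h ^ q for h, q in zip(helper, query_bits)]
    return digest(decode(noisy_codeword)) == key_hash


key = [1, 0, 1]
template = [0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1]
helper, key_hash = commit(key, template)

genuine = template.copy()
genuine[0] ^= 1  # small variations between touch samples
genuine[7] ^= 1
impostor = [1 - b for b in template]
print(verify(helper, key_hash, genuine), verify(helper, key_hash, impostor))
```

With a real (n, k, t) BCH code, `encode` would map a k-bit key to an n-bit codeword and `decode` would correct up to t bit errors anywhere in the codeword, which is how the scheme tolerates touch variations.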

45

Москаленко, В’ячеслав Васильович, Микола Олександрович Зарецький, Артем Геннадійович Коробов, Ярослав Юрійович Ковальський, Артур Фанісович Шаєхов, Віктор Анатолійович Семашко, and Андрій Олександрович Панич. "Модель та метод навчання для класифікаційного аналізу рівня води в стічних трубах за даними відео інспекції." RADIOELECTRONIC AND COMPUTER SYSTEMS, no. 2 (June 2, 2021): 4–15. http://dx.doi.org/10.32620/reks.2021.2.01.

Abstract:

Models and training methods for water-level classification analysis on the footage of sewage pipe inspections have been developed and investigated. The object of the research is the process of water-level recognition, considering the spatial and temporal context during the inspection of sewage pipes. The subject of the research is a model and machine learning method for water-level classification analysis on video sequences of pipe inspections under conditions of a limited and unbalanced training set. A four-stage algorithm for training the classifier is proposed. In the first stage, the network is trained with a softmax triplet loss function and a regularizing component that penalizes the rounding error of the network output to a binary code. The next step is to define a binary code (reference vector) for each class according to the principles of error-correcting output codes, while considering the intraclass and interclass relations. The computed reference vector of each class is used as the target label of the sample for further training with the joint cross-entropy loss function. The last stage of machine learning optimizes the parameters of the decision rules based on the information criterion, to account for the boundaries of deviation of the binary representation of the observations of each class from the corresponding reference vectors. As a classifier model, a combination of a 2D convolutional feature extractor for each frame and a temporal network that analyzes inter-frame dependencies is considered. Different variants of the temporal network are compared: a 1D regular convolutional network with dilated convolutions, a 1D causal convolutional network with dilated convolutions, a recurrent LSTM network, and a recurrent GRU network. The performance of the models is compared by the micro-averaged F1 metric computed on the test subset.
The results obtained on the dataset from Ace Pipe Cleaning (Kansas City, USA) confirm the suitability of the model and training method for practical use; the obtained F1 value is 0.88. The results of training by the proposed method were compared with the results obtained using the traditional method. The proposed method provides a 9% increase in the micro-averaged F1 measure.

46

Москаленко, В’ячеслав Васильович, Микола Олександрович Зарецький, Ярослав Юрійович Ковальський, and Сергій Сергійович Мартиненко. "Model and training method for a classifier of observation contexts in sewer pipe video inspection images." RADIOELECTRONIC AND COMPUTER SYSTEMS, no. 3 (September 28, 2020): 59–66. http://dx.doi.org/10.32620/reks.2020.3.06.

Abstract:

Video inspection is often used to diagnose sewer pipe defects. To correctly encode the defects found according to existing standards, it is necessary to consider a lot of contextual information about the orientation and location of the camera during sewer pipe video inspection. A model for classifying the observation context on video inspection frames and a five-stage machine learning method are proposed. The main idea of the proposed approach is to combine deep learning methods with the principles of information maximization and coding with self-correcting Hamming codes. The proposed model consists of a deep convolutional neural network with a sigmoid layer followed by a rounding output layer and information-extreme decision rules. The first stages of the method are data augmentation and training of the feature extractor in the Siamese model with a softmax triplet loss function. The next steps involve calculating a binary code for each recognition class, which is used as a label in training with a binary cross-entropy loss function to increase the compactness of the distribution of each class's observations in the binary Hamming space. The last stage of the training method optimizes the parameters of radial-basis decision rules in the Hamming space for each class according to the existing information-extreme criterion. The information criterion, expressed as a logarithmic function of the accuracy characteristics of the decision rules, provides the maximum generalization and reliability of the model under the most difficult conditions in the statistical sense. The effectiveness of this approach was tested on data provided by Ace Pipe Cleaning (Kansas City, USA) and MPWiK (Wroclaw, Poland) by comparing learning results for the proposed and traditional models and training schemes.
The resulting image frame classifier achieves a classification accuracy of 96.8% on the test sample, which is acceptable for practical use and exceeds the result of the traditional training scheme with a softmax output layer by 6.8%.
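A radial-basis decision rule in Hamming space, as mentioned in the abstract above, can be illustrated roughly as follows. This is a sketch under stated assumptions rather than the authors' method: the reference vectors, the radii, the class names, and the function names `hamming` and `classify` are all hypothetical, and the radii are fixed by hand instead of being optimized by an information criterion.

```python
def hamming(a, b):
    """Number of positions in which two equal-length binary codes differ."""
    return sum(x != y for x, y in zip(a, b))


def classify(code, references, radii):
    """Return the nearest class whose radius contains the code, or None to reject."""
    best = None
    for label, ref in references.items():
        d = hamming(code, ref)
        if d <= radii[label] and (best is None or d < best[1]):
            best = (label, d)
    return best[0] if best else None


# Hypothetical 6-bit reference vectors and container radii for two classes.
REFERENCES = {"class_a": (0, 0, 0, 0, 0, 0), "class_b": (1, 1, 1, 1, 1, 1)}
RADII = {"class_a": 1, "class_b": 1}

print(classify((0, 0, 0, 0, 1, 0), REFERENCES, RADII))
print(classify((1, 1, 1, 0, 0, 0), REFERENCES, RADII))
```

The first code lies within distance 1 of the `class_a` reference and is accepted. The second is three bits away from both references, so it falls outside every radius and is rejected.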

47

Pachganov, Stepan, Khalimat Murtazalieva, Aleksei Zarubin, Dmitry Sokolov, Duane R. Chartier, and Tatiana V. Tatarinova. "TransPrise: a novel machine learning approach for eukaryotic promoter prediction." PeerJ 7 (November 1, 2019): e7990. http://dx.doi.org/10.7717/peerj.7990.

Abstract:

As interest in genetic resequencing increases, so does the need for effective mathematical, computational, and statistical approaches. One of the difficult problems in genome annotation is determining the precise positions of transcription start sites (TSS). In this paper we present TransPrise, an efficient deep learning tool for predicting the positions of eukaryotic transcription start sites. Our pipeline consists of two parts: the binary classifier operates first, and if a sequence is classified as TSS-containing, the regression step follows, in which the precise location of the TSS is identified. TransPrise offers significant improvement over existing promoter-prediction methods. To illustrate this, we compared predictions of TransPrise classification and regression models with the TSSPlant approach for the well-annotated genome of Oryza sativa. Using a computer equipped with a graphics processing unit, the run time of TransPrise is 250 minutes on a 374 Mb genome. The Matthews correlation coefficient value for TransPrise is 0.79, more than two times larger than the 0.31 for TSSPlant classification models. This represents a high level of prediction accuracy. Additionally, the mean absolute error for the regression model is 29.19 nt, allowing for accurate prediction of TSS location. TransPrise was also tested in Homo sapiens, where the mean absolute error of the regression model was 47.986 nt. We provide the full basis for the comparison and encourage users to freely access a set of our computational tools to facilitate and streamline their own analyses. The ready-to-use Docker image with all necessary packages, models, and code, as well as the source code of the TransPrise algorithm, are available at (http://compubioverne.group/). The source code is ready to use and customizable to predict TSS in any eukaryotic organism.
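For readers unfamiliar with the Matthews correlation coefficient quoted above, the following sketch computes it from the four cells of a binary confusion matrix. The function name `mcc` and the example counts are hypothetical and are not taken from the TransPrise evaluation.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when any marginal total is zero (a common convention).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Hypothetical counts: a perfect classifier and a coin-flip classifier.
print(mcc(tp=50, tn=50, fp=0, fn=0))
print(mcc(tp=25, tn=25, fp=25, fn=25))
```

The coefficient ranges from -1 (total disagreement) through 0 (no better than chance) to 1 (perfect prediction), which is why it is often preferred over accuracy when the TSS-containing and TSS-free classes are unbalanced.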

48

Chen, Xize, Xiaoyu Qu, Yufeng Qian, and Yiyao Zhang. "Music Recognition Using Blockchain Technology and Deep Learning." Computational Intelligence and Neuroscience 2022 (August 8, 2022): 1–13. http://dx.doi.org/10.1155/2022/7025338.

Abstract:

This work aims to recognize and classify different music characteristics and to strengthen the copyright protection system for original digital music in the big data era. Deep learning (DL) and blockchain technology are applied and studied. Based on a CNN (Convolutional Neural Network), a music recognition method combined with hashing learning is proposed. The error generated when outputting the binary hash code is considered, and the semantic similarity of the hash code is ensured. In addition, the application of blockchain technology to current intellectual property protection for original music is discussed. According to the needs of digital music property rights protection, the system is divided into modules, and its functions are designed. The system ensures its various functions by applying the application protocol designed in the Algor and network. In the experiments, the MagnaTagATune dataset is selected to verify the performance of the proposed CRNNH (Convolutional Recurrent Neural Network Hashing) algorithm. The algorithm shows the best music recognition performance under different bit numbers. When the number of connections is about 100, the QPS value of the blockchain-based music property rights protection system stabilizes at about 20,000. At any number of threads, the system pressure increases dramatically as the number of simulated connections increases. The music recognition algorithm based on DL and hashing is of great significance in improving the classification accuracy of music recognition. The application of blockchain technology in the copyright protection platform for original music works can protect the copyright of digital music and ensure the operating performance of the system.

49

Liu, Xinyue, Qinghua Li, and Yuangang Li. "Count-Based Exploration via Embedded State Space for Deep Reinforcement Learning." Wireless Communications and Mobile Computing 2022 (May 17, 2022): 1–8. http://dx.doi.org/10.1155/2022/1238571.

Abstract:

Count-based exploration algorithms have been shown to be effective in dealing with various deep reinforcement learning tasks. However, existing count-based exploration algorithms do not work well in high-dimensional state spaces due to the complexity of state representation. In this paper, we propose a novel count-based exploration method, which can explore high-dimensional continuous state spaces and be combined with any reinforcement learning algorithm. Specifically, by introducing an embedding network to encode the state space and to merge states with similar key characteristics, we can compress the high-dimensional state space. By using the binary code of each state to count how often states occur, we generate additional rewards that encourage the agent to explore the environment. Extensive experimental results on several commonly used environments show that our proposed method significantly outperforms other strong baselines.
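The idea of counting states through a binary code can be sketched as follows. This is an illustration under several assumptions, not the method of the paper: a fixed random projection stands in for the learned embedding network, the bonus form beta / sqrt(count) is a common choice that the abstract does not specify, and the class name `HashCounter` and its parameters are hypothetical.

```python
import math
import random
from collections import Counter


class HashCounter:
    """Counts visits per binary state code and returns an exploration bonus."""

    def __init__(self, state_dim, code_bits=8, beta=0.1, seed=0):
        rng = random.Random(seed)
        # Assumption: random projection instead of a trained embedding network.
        self.proj = [[rng.gauss(0, 1) for _ in range(state_dim)]
                     for _ in range(code_bits)]
        self.beta = beta
        self.counts = Counter()

    def code(self, state):
        """Binarize the projected state by sign, giving a hashable bit tuple."""
        return tuple(int(sum(w * s for w, s in zip(row, state)) > 0)
                     for row in self.proj)

    def bonus(self, state):
        """Increment the visit count of the state's code and return the bonus."""
        c = self.code(state)
        self.counts[c] += 1
        return self.beta / math.sqrt(self.counts[c])


counter = HashCounter(state_dim=3)
first = counter.bonus([1.0, 0.5, -0.2])
second = counter.bonus([1.0, 0.5, -0.2])
print(first, second)
```

Because states that map to the same code share one counter, the bonus shrinks each time a similar state is revisited. In training, the bonus would be added to the environment reward before the update of whichever reinforcement learning algorithm is used.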

50

Lin, Chun, Yijia Xu, Yong Fang, and Zhonglin Liu. "VulEye: A Novel Graph Neural Network Vulnerability Detection Approach for PHP Application." Applied Sciences 13, no. 2 (January 6, 2023): 825. http://dx.doi.org/10.3390/app13020825.

Abstract:

Following advances in machine learning and deep learning, cyber security experts are committed to creating deep intelligent approaches for automatically detecting software vulnerabilities. Most current work targets C and C++ programs, and methods rarely target PHP applications. Moreover, many of these methods use LSTM (Long Short-Term Memory) rather than GNN (Graph Neural Networks) to learn the token dependencies within the source code through different transformations. This may lose a lot of semantic information in terms of code representation. This article presents a novel Graph Neural Network vulnerability detection approach, VulEye, for PHP applications. VulEye can assist security researchers in finding vulnerabilities in PHP projects quickly. VulEye first constructs the PDG (Program Dependence Graph) of the PHP source code, slices the PDG at the sensitive functions contained in the source code into sub-graphs called SDGs (Sub-Dependence Graphs), and then uses the SDGs as model input to train a Graph Neural Network model that contains three stacked units, each with a GCN layer, a Top-k pooling layer, and an attention layer. Finally, it uses an MLP (Multi-Layer Perceptron) and softmax as a classifier to predict whether the SDG is vulnerable. We evaluated VulEye on the PHP vulnerability test suite in the Software Assurance Reference Dataset. The experiments show that the best macro-average F1 score of VulEye reached 99% in the binary classification task and 95% in the multi-class classification task. VulEye achieved the best result compared with the existing open-source vulnerability detection implementations and other state-of-the-art deep learning models. Moreover, VulEye can also locate the precise area of the flaw, since our SDG contains code slices closely related to vulnerabilities with a key triggering sensitive/sink function.
