
Journal articles on the topic 'Backdoor attacks'



Consult the top 50 journal articles for your research on the topic 'Backdoor attacks.'


You can also download the full text of each academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse journal articles from a wide variety of disciplines and organise your bibliography correctly.

1

Zhu, Biru, Ganqu Cui, Yangyi Chen, Yujia Qin, Lifan Yuan, Chong Fu, Yangdong Deng, Zhiyuan Liu, Maosong Sun, and Ming Gu. "Removing Backdoors in Pre-trained Models by Regularized Continual Pre-training." Transactions of the Association for Computational Linguistics 11 (2023): 1608–23. http://dx.doi.org/10.1162/tacl_a_00622.

Abstract:
Recent research has revealed that pre-trained models (PTMs) are vulnerable to backdoor attacks before the fine-tuning stage. The attackers can implant transferable task-agnostic backdoors in PTMs, and control model outputs on any downstream task, which poses severe security threats to all downstream applications. Existing backdoor-removal defenses focus on task-specific classification models and they are not suitable for defending PTMs against task-agnostic backdoor attacks. To this end, we propose the first task-agnostic backdoor removal method for PTMs. Based on the selective activation phenomenon in backdoored PTMs, we design a simple and effective backdoor eraser, which continually pre-trains the backdoored PTMs with a regularization term in an end-to-end approach. The regularization term removes backdoor functionalities from PTMs while the continual pre-training maintains the normal functionalities of PTMs. We conduct extensive experiments on pre-trained models across different modalities and architectures. The experimental results show that our method can effectively remove backdoors inside PTMs and preserve benign functionalities of PTMs with a small amount of downstream-task-irrelevant auxiliary data, e.g., unlabeled plain text. The average attack success rate on three downstream datasets is reduced from 99.88% to 8.10% after our defense on the backdoored BERT. The code is publicly available at https://github.com/thunlp/RECIPE.
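The abstract describes the defense only at a high level; as a rough illustration of the training objective it outlines (continual pre-training on auxiliary data plus a regularization term that suppresses backdoor behavior), here is a minimal PyTorch-style sketch. The callables mlm_loss_fn and backdoor_regularizer and the hyperparameters are hypothetical placeholders, not the authors' implementation (see their repository for that).

```python
import torch

def continual_pretrain_with_regularizer(model, mlm_loss_fn, backdoor_regularizer,
                                         aux_loader, reg_strength=1.0,
                                         lr=2e-5, epochs=1):
    """Continue pre-training `model` on auxiliary, task-irrelevant data while
    adding a regularization term intended to erase backdoor functionality.

    mlm_loss_fn(model, batch)          -> normal pre-training loss (e.g. MLM loss)
    backdoor_regularizer(model, batch) -> penalty on backdoor-like activations
    Both callables are placeholders standing in for the paper's actual components.
    """
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for batch in aux_loader:
            loss = mlm_loss_fn(model, batch) + reg_strength * backdoor_regularizer(model, batch)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model
```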
2

Yuan, Guotao, Hong Huang, and Xin Li. "Self-supervised learning backdoor defense mixed with self-attention mechanism." Journal of Computing and Electronic Information Management 12, no. 2 (March 30, 2024): 81–88. http://dx.doi.org/10.54097/7hx9afkw.

Abstract:
Recent studies have shown that Deep Neural Networks (DNNs) are vulnerable to backdoor attacks, where attackers embed hidden backdoors into the DNN models by poisoning a small number of training samples. The attacked models perform normally on benign samples, but when the backdoor is activated, their prediction results will be maliciously altered. To address the issues of suboptimal backdoor defense effectiveness and limited generality, a hybrid self-attention mechanism-based self-supervised learning method for backdoor defense is proposed. This method defends against backdoor attacks by leveraging the attack characteristics of backdoor threats, aiming to mitigate their impact. It adopts a decoupling approach, disconnecting the association between poisoned samples and target labels, and enhances the connection between feature labels and clean labels by optimizing the feature extractor. Experimental results on CIFAR-10 and CIFAR-100 datasets show that this method performs moderately in terms of Clean Accuracy (CA), ranking at the median level. However, it achieves significant effectiveness in reducing the Attack Success Rate (ASR), especially against BadNets and Blended attacks, where its defense capability is notably superior to other methods, with attack success rates below 2%.
3

Saha, Aniruddha, Akshayvarun Subramanya, and Hamed Pirsiavash. "Hidden Trigger Backdoor Attacks." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 07 (April 3, 2020): 11957–65. http://dx.doi.org/10.1609/aaai.v34i07.6871.

Abstract:
With the success of deep learning algorithms in various domains, studying adversarial attacks to secure deep models in real world applications has become an important research topic. Backdoor attacks are a form of adversarial attacks on deep networks where the attacker provides poisoned data to the victim to train the model with, and then activates the attack by showing a specific small trigger pattern at the test time. Most state-of-the-art backdoor attacks either provide mislabeled poisoning data that is possible to identify by visual inspection, reveal the trigger in the poisoned data, or use noise to hide the trigger. We propose a novel form of backdoor attack where poisoned data look natural with correct labels and also more importantly, the attacker hides the trigger in the poisoned data and keeps the trigger secret until the test time. We perform an extensive study on various image classification settings and show that our attack can fool the model by pasting the trigger at random locations on unseen images although the model performs well on clean data. We also show that our proposed attack cannot be easily defended using a state-of-the-art defense algorithm for backdoor attacks.
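For readers unfamiliar with the test-time step mentioned above (pasting a small trigger patch at a random location of an unseen image), the following NumPy sketch shows that step in isolation. The patch contents, size, and image shape are arbitrary assumptions; the paper's actual contribution, the optimization that hides the trigger inside correctly labeled poisoned images, is not reproduced here.

```python
import numpy as np

def paste_trigger(image, trigger, rng=None):
    """Paste a small trigger patch at a random location of an HxWxC image."""
    rng = rng or np.random.default_rng()
    h, w = trigger.shape[:2]
    H, W = image.shape[:2]
    top = int(rng.integers(0, H - h + 1))
    left = int(rng.integers(0, W - w + 1))
    out = image.copy()
    out[top:top + h, left:left + w] = trigger
    return out

# Toy example: a blank 224x224 RGB image and an 8x8 checkerboard trigger.
img = np.zeros((224, 224, 3), dtype=np.uint8)
trig = np.tile(np.array([[0, 255], [255, 0]], dtype=np.uint8)[:, :, None], (4, 4, 3))
poisoned_test_input = paste_trigger(img, trig)
```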
4

Duan, Qiuyu, Zhongyun Hua, Qing Liao, Yushu Zhang, and Leo Yu Zhang. "Conditional Backdoor Attack via JPEG Compression." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 18 (March 24, 2024): 19823–31. http://dx.doi.org/10.1609/aaai.v38i18.29957.

Abstract:
Deep neural network (DNN) models have been proven vulnerable to backdoor attacks. One trend of backdoor attacks is developing more invisible and dynamic triggers to make attacks stealthier. However, these invisible and dynamic triggers can be inadvertently mitigated by some widely used passive denoising operations, such as image compression, making the efforts under this trend questionable. Another trend is to exploit the full potential of backdoor attacks by proposing new triggering paradigms, such as hibernated or opportunistic backdoors. In line with these trends, our work investigates the first conditional backdoor attack, where the backdoor is activated by a specific condition rather than pre-defined triggers. Specifically, we take JPEG compression as our condition and jointly optimize the compression operator and the target model's loss function, which can force the target model to accurately learn the JPEG compression behavior as the triggering condition. In this case, besides the conditional triggering feature, our attack is also stealthy and robust to denoising operations. Extensive experiments on the MNIST, GTSRB, and CelebA datasets verify our attack's effectiveness, stealthiness, and resistance to existing backdoor defenses and denoising operations. As a new triggering paradigm, the conditional backdoor attack brings a new angle for assessing the vulnerability of DNN models, and conditioning on JPEG compression magnifies the threat due to the universal use of JPEG.
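As a simplified, data-poisoning-only illustration of the conditional idea above (the model should respond to JPEG compression itself rather than to a fixed trigger), the sketch below relabels JPEG-compressed copies of some training images to a target class using Pillow. The quality factor, poisoning rate, and helper names are assumptions; the paper's joint optimization of the compression operator and the model loss is not shown.

```python
import io
from PIL import Image

def jpeg_compress(img, quality=50):
    """Round-trip a PIL image through JPEG encoding at the given quality."""
    buf = io.BytesIO()
    img.convert("RGB").save(buf, format="JPEG", quality=quality)
    buf.seek(0)
    return Image.open(buf).copy()

def poison_with_jpeg_condition(samples, target_label, quality=50, every_nth=10):
    """samples: list of (PIL image, label). Every n-th sample is replaced by its
    JPEG-compressed version relabeled to `target_label`, so that (ideally) only
    compressed inputs activate the target class after training."""
    poisoned = []
    for i, (img, label) in enumerate(samples):
        if i % every_nth == 0:
            poisoned.append((jpeg_compress(img, quality), target_label))
        else:
            poisoned.append((img, label))
    return poisoned
```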
5

Liu, Zihao, Tianhao Wang, Mengdi Huai, and Chenglin Miao. "Backdoor Attacks via Machine Unlearning." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 13 (March 24, 2024): 14115–23. http://dx.doi.org/10.1609/aaai.v38i13.29321.

Abstract:
As a new paradigm to erase data from a model and protect user privacy, machine unlearning has drawn significant attention. However, existing studies on machine unlearning mainly focus on its effectiveness and efficiency, neglecting the security challenges introduced by this technique. In this paper, we aim to bridge this gap and study the possibility of conducting malicious attacks leveraging machine unlearning. Specifically, we consider the backdoor attack via machine unlearning, where an attacker seeks to inject a backdoor into the unlearned model by submitting malicious unlearning requests, so that the prediction made by the unlearned model can be changed when a particular trigger is present. In our study, we propose two attack approaches. The first attack approach does not require the attacker to poison any training data of the model. The attacker can achieve the attack goal only by requesting to unlearn a small subset of his contributed training data. The second approach allows the attacker to poison a few training instances with a pre-defined trigger upfront, and then activate the attack via submitting a malicious unlearning request. Both attack approaches are proposed with the goal of maximizing the attack utility while ensuring attack stealthiness. The effectiveness of the proposed attacks is demonstrated with different machine unlearning algorithms as well as different models on different datasets.
6

Wang, Tong, Yuan Yao, Feng Xu, Miao Xu, Shengwei An, and Ting Wang. "Inspecting Prediction Confidence for Detecting Black-Box Backdoor Attacks." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 1 (March 24, 2024): 274–82. http://dx.doi.org/10.1609/aaai.v38i1.27780.

Abstract:
Backdoor attacks have been shown to be a serious security threat against deep learning models, and various defenses have been proposed to detect whether a model is backdoored or not. However, as indicated by a recent black-box attack, existing defenses can be easily bypassed by implanting the backdoor in the frequency domain. To this end, we propose a new defense, DTInspector, against black-box backdoor attacks, based on a new observation related to the prediction confidence of learning models. That is, to achieve a high attack success rate with a small amount of poisoned data, backdoor attacks usually cause a model to exhibit statistically higher prediction confidences on the poisoned samples. We provide both theoretical and empirical evidence for the generality of this observation. DTInspector then carefully examines the prediction confidences of data samples and decides whether a backdoor exists by exploiting the shortcut nature of backdoor triggers. Extensive evaluations on six backdoor attacks, four datasets, and three advanced attack types demonstrate the effectiveness of the proposed defense.
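A minimal sketch of the observation the defense builds on, namely that backdoored models tend to be unusually confident on poisoned samples: it compares the share of ultra-confident predictions on a candidate data slice against a clean reference. The threshold and margin values are arbitrary assumptions, and this is not the DTInspector algorithm itself.

```python
import numpy as np

def high_confidence_fraction(probs, threshold=0.99):
    """probs: (N, num_classes) softmax outputs; returns the fraction of samples
    whose top-1 confidence exceeds `threshold`."""
    return float((probs.max(axis=1) > threshold).mean())

def confidence_gap_suspicious(probs_clean, probs_candidate, margin=0.3):
    """Flag the candidate slice if its share of ultra-confident predictions is
    much larger than on reference clean data (a crude proxy for the gap
    the defense above relies on)."""
    gap = high_confidence_fraction(probs_candidate) - high_confidence_fraction(probs_clean)
    return gap > margin
```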
7

Huynh, Tran, Dang Nguyen, Tung Pham, and Anh Tran. "COMBAT: Alternated Training for Effective Clean-Label Backdoor Attacks." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 3 (March 24, 2024): 2436–44. http://dx.doi.org/10.1609/aaai.v38i3.28019.

Abstract:
Backdoor attacks pose a critical concern to the practice of using third-party data for AI development. The data can be poisoned to make a trained model misbehave when a predefined trigger pattern appears, granting the attackers illegal benefits. While most proposed backdoor attacks are dirty-label, clean-label attacks are more desirable by keeping data labels unchanged to dodge human inspection. However, designing a working clean-label attack is a challenging task, and existing clean-label attacks show underwhelming performance. In this paper, we propose a novel mechanism to develop clean-label attacks with outstanding attack performance. The key component is a trigger pattern generator, which is trained together with a surrogate model in an alternating manner. Our proposed mechanism is flexible and customizable, allowing different backdoor trigger types and behaviors for either single or multiple target labels. Our backdoor attacks can reach near-perfect attack success rates and bypass all state-of-the-art backdoor defenses, as illustrated via comprehensive experiments on standard benchmark datasets. Our code is available at https://github.com/VinAIResearch/COMBAT.
8

Zhang, Xianda, Baolin Zheng, Jianbao Hu, Chengyang Li, and Xiaoying Bai. "From Toxic to Trustworthy: Using Self-Distillation and Semi-supervised Methods to Refine Neural Networks." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 15 (March 24, 2024): 16873–80. http://dx.doi.org/10.1609/aaai.v38i15.29629.

Abstract:
Despite the tremendous success of deep neural networks (DNNs) across various fields, their susceptibility to potential backdoor attacks seriously threatens their application security, particularly in safety-critical or security-sensitive applications. Given this growing threat, there is a pressing need for research into purging backdoors from DNNs. However, prior efforts on erasing backdoor triggers not only failed to withstand increasingly powerful attacks but also resulted in reduced model performance. In this paper, we propose From Toxic to Trustworthy (FTT), an innovative approach to eliminate backdoor triggers while simultaneously enhancing model accuracy. Following the stringent and practical assumption of limited availability of clean data, we introduce a self-attention distillation (SAD) method to remove the backdoor by aligning the shallow and deep parts of the network. Furthermore, we first devise a semi-supervised learning (SSL) method that leverages ubiquitous and available poisoned data to further purify backdoors and improve accuracy. Extensive experiments on various attacks and models have shown that our FTT can reduce the attack success rate from 97% to 1% and improve accuracy by 4% on average, demonstrating its effectiveness in mitigating backdoor attacks and improving model performance. Compared to state-of-the-art (SOTA) methods, our FTT can reduce the attack success rate by a factor of two and improve accuracy by 5%, shedding light on backdoor cleansing.
9

Liu, Tao, Yuhang Zhang, Zhu Feng, Zhiqin Yang, Chen Xu, Dapeng Man, and Wu Yang. "Beyond Traditional Threats: A Persistent Backdoor Attack on Federated Learning." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 19 (March 24, 2024): 21359–67. http://dx.doi.org/10.1609/aaai.v38i19.30131.

Abstract:
Backdoors in federated learning are diluted by subsequent benign updates. This is reflected in a significant reduction of the attack success rate as iterations increase, until the attack ultimately fails. We use a new metric, called attack persistence, to quantify the degree of this weakened backdoor effect. Given that research to improve this property has not been widely noted, we propose a Full Combination Backdoor Attack (FCBA) method. It aggregates more combined trigger information for a more complete backdoor pattern in the global model. The trained backdoored global model is more resilient to benign updates, leading to a higher attack success rate on the test set. We test on three datasets and evaluate with two models across various settings. FCBA's persistence outperforms SOTA federated learning backdoor attacks. On GTSRB, 120 rounds after the attack, our attack success rate rose by over 50% from the baseline. The core code of our method is available at https://github.com/PhD-TaoLiu/FCBA.
10

Zhang, Lei, Ya Peng, Lifei Wei, Congcong Chen, and Xiaoyu Zhang. "DeepDefense: A Steganalysis-Based Backdoor Detecting and Mitigating Protocol in Deep Neural Networks for AI Security." Security and Communication Networks 2023 (May 9, 2023): 1–12. http://dx.doi.org/10.1155/2023/9308909.

Abstract:
Backdoor attacks have recently been recognized as a major AI security threat to deep neural networks (DNNs). The attackers inject backdoors into DNNs during model training, for example in federated learning. The infected model behaves normally on clean samples in AI applications, while the backdoors are only activated by the predefined triggers and produce the attacker-specified results. Most existing defense approaches assume that the trigger settings on different poisoned samples are visible and identical, such as a white square in the corner of the image. Moreover, sample-specific triggers are often invisible and difficult to detect in DNNs, which poses a great challenge to existing defense protocols. In this paper, to address the above problems, we propose a backdoor detecting and mitigating protocol based on a wider separate-then-reunion network (WISERNet) equipped with a cryptographic deep steganalyzer for color images, which detects the backdoors hiding behind the poisoned samples even if the embedding algorithm is unknown and further feeds the poisoned samples into the infected model for backdoor unlearning and mitigation. The experimental results show that our work achieves a better defense effect than state-of-the-art backdoor defense methods such as fine-pruning and ABL against three typical backdoor attacks. Our protocol reduces the attack success rate to close to 0% on the test data while decreasing the classification accuracy on clean samples by less than 3%.
11

Huang, Yihao, Felix Juefei-Xu, Qing Guo, Jie Zhang, Yutong Wu, Ming Hu, Tianlin Li, Geguang Pu, and Yang Liu. "Personalization as a Shortcut for Few-Shot Backdoor Attack against Text-to-Image Diffusion Models." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 19 (March 24, 2024): 21169–78. http://dx.doi.org/10.1609/aaai.v38i19.30110.

Abstract:
Although recent personalization methods have democratized high-resolution image synthesis by enabling swift concept acquisition with minimal examples and lightweight computation, they also present an exploitable avenue for highly accessible backdoor attacks. This paper investigates a critical and unexplored aspect of text-to-image (T2I) diffusion models - their potential vulnerability to backdoor attacks via personalization. By studying the prompt processing of popular personalization methods (epitomized by Textual Inversion and DreamBooth), we have devised dedicated personalization-based backdoor attacks according to the different ways of dealing with unseen tokens and divide them into two families: nouveau-token and legacy-token backdoor attacks. In comparison to conventional backdoor attacks involving the fine-tuning of the entire text-to-image diffusion model, our proposed personalization-based backdoor attack method can facilitate more tailored, efficient, and few-shot attacks. Through comprehensive empirical study, we endorse the utilization of the nouveau-token backdoor attack due to its impressive effectiveness, stealthiness, and integrity, markedly outperforming the legacy-token backdoor attack.
12

Li, Xi, Songhe Wang, Ruiquan Huang, Mahanth Gowda, and George Kesidis. "Temporal-Distributed Backdoor Attack against Video Based Action Recognition." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 4 (March 24, 2024): 3199–207. http://dx.doi.org/10.1609/aaai.v38i4.28104.

Abstract:
Deep neural networks (DNNs) have achieved tremendous success in various applications including video action recognition, yet remain vulnerable to backdoor attacks (Trojans). The backdoor-compromised model will mis-classify to the target class chosen by the attacker when a test instance (from a non-target class) is embedded with a specific trigger, while maintaining high accuracy on attack-free instances. Although there are extensive studies on backdoor attacks against image data, the susceptibility of video-based systems under backdoor attacks remains largely unexplored. Current studies are direct extensions of approaches proposed for image data, e.g., the triggers are independently embedded within the frames, which tend to be detectable by existing defenses. In this paper, we introduce a simple yet effective backdoor attack against video data. Our proposed attack, adding perturbations in a transformed domain, plants an imperceptible, temporally distributed trigger across the video frames, and is shown to be resilient to existing defensive strategies. The effectiveness of the proposed attack is demonstrated by extensive experiments with various well-known models on two video recognition benchmarks, UCF101 and HMDB51, and a sign language recognition benchmark, Greek Sign Language (GSL) dataset. We delve into the impact of several influential factors on our proposed attack and identify an intriguing effect termed "collateral damage" through extensive studies.
13

Ning, Rui, Jiang Li, Chunsheng Xin, Hongyi Wu, and Chonggang Wang. "Hibernated Backdoor: A Mutual Information Empowered Backdoor Attack to Deep Neural Networks." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 9 (June 28, 2022): 10309–18. http://dx.doi.org/10.1609/aaai.v36i9.21272.

Abstract:
We report a new neural backdoor attack, named Hibernated Backdoor, which is stealthy, aggressive and devastating. The backdoor is planted in a hibernated mode to avoid being detected. Once deployed and fine-tuned on end-devices, the hibernated backdoor turns into the active state that can be exploited by the attacker. To the best of our knowledge, this is the first hibernated neural backdoor attack. It is achieved by maximizing the mutual information (MI) between the gradients of regular and malicious data on the model. We introduce a practical algorithm to achieve MI maximization to effectively plant the hibernated backdoor. To evade adaptive defenses, we further develop a targeted hibernated backdoor, which can only be activated by specific data samples and thus achieves a higher degree of stealthiness. We show the hibernated backdoor is robust and cannot be removed by existing backdoor removal schemes. It has been fully tested on four datasets with two neural network architectures, compared to five existing backdoor attacks, and evaluated using seven backdoor detection schemes. The experiments demonstrate the effectiveness of the hibernated backdoor attack under various settings.
14

Yu, Fangchao, Bo Zeng, Kai Zhao, Zhi Pang, and Lina Wang. "Chronic Poisoning: Backdoor Attack against Split Learning." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 15 (March 24, 2024): 16531–38. http://dx.doi.org/10.1609/aaai.v38i15.29591.

Abstract:
Split learning is a computing resource-friendly distributed learning framework that protects client training data by splitting the model between the client and server. Previous work has proved that split learning faces a severe risk of privacy leakage, as a malicious server can recover the client's private data by hijacking the training process. In this paper, we first explore the vulnerability of split learning to server-side backdoor attacks, where our goal is to compromise the model's integrity. Since the server-side attacker cannot access the training data and client model in split learning, the traditional poisoning-based backdoor attack methods are no longer applicable. Therefore, constructing backdoor attacks in split learning poses significant challenges. Our strategy involves the attacker establishing a shadow model on the server side that can encode backdoor samples and guiding the client model to learn from this model during the training process, thereby enabling the client to acquire the same capability. Based on these insights, we propose a three-stage backdoor attack framework named SFI. Our attack framework minimizes assumptions about the attacker's background knowledge and ensures that the attack process remains imperceptible to the client. We implement SFI on various benchmark datasets, and extensive experimental results demonstrate its effectiveness and generality. For example, success rates of our attack on MNIST, Fashion, and CIFAR10 datasets all exceed 90%, with limited impact on the main task.
15

Li, Yiming. "Poisoning-Based Backdoor Attacks in Computer Vision." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 13 (June 26, 2023): 16121–22. http://dx.doi.org/10.1609/aaai.v37i13.26921.

Abstract:
Recent studies demonstrated that the training process of deep neural networks (DNNs) is vulnerable to backdoor attacks if third-party training resources (e.g., samples) are adopted. Specifically, the adversaries intend to embed hidden backdoors into DNNs, where the backdoor can be activated by pre-defined trigger patterns, leading to malicious model predictions. My dissertation focuses on poisoning-based backdoor attacks in computer vision. Firstly, I study and propose more stealthy and effective attacks against image classification tasks in both physical and digital spaces. Secondly, I reveal the backdoor threats in visual object tracking, which is representative of critical video-related tasks. Thirdly, I explore how to exploit backdoor attacks as watermark techniques for positive purposes. I design a Python toolbox (i.e., BackdoorBox) that implements representative and advanced backdoor attacks and defenses under a unified and flexible framework, based on which I provide a comprehensive benchmark of existing methods at the end.
16

Doan, Khoa D., Yingjie Lao, Peng Yang, and Ping Li. "Defending Backdoor Attacks on Vision Transformer via Patch Processing." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 1 (June 26, 2023): 506–15. http://dx.doi.org/10.1609/aaai.v37i1.25125.

Abstract:
Vision Transformers (ViTs) have a radically different architecture with significantly less inductive bias than Convolutional Neural Networks. Along with the improvement in performance, the security and robustness of ViTs are also of great importance to study. In contrast to many recent works that exploit the robustness of ViTs against adversarial examples, this paper investigates a representative causative attack, i.e., the backdoor. We first examine the vulnerability of ViTs against various backdoor attacks and find that ViTs are also quite vulnerable to existing attacks. However, we observe that the clean-data accuracy and backdoor attack success rate of ViTs respond distinctively to patch transformations before the positional encoding. Based on this finding, we propose an effective method for ViTs to defend against both patch-based and blending-based trigger backdoor attacks via patch processing. The defense is evaluated on several benchmark datasets, including CIFAR10, GTSRB, and TinyImageNet, and the results show that the proposed defense is very successful in mitigating backdoor attacks for ViTs. To the best of our knowledge, this paper presents the first defensive strategy that utilizes a unique characteristic of ViTs against backdoor attacks.
17

An, Shengwei, Sheng-Yen Chou, Kaiyuan Zhang, Qiuling Xu, Guanhong Tao, Guangyu Shen, Siyuan Cheng, et al. "Elijah: Eliminating Backdoors Injected in Diffusion Models via Distribution Shift." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 10 (March 24, 2024): 10847–55. http://dx.doi.org/10.1609/aaai.v38i10.28958.

Abstract:
Diffusion models (DMs) have become state-of-the-art generative models because of their capability of generating high-quality images from noise without adversarial training. However, they are vulnerable to backdoor attacks, as reported by recent studies. When a data input (e.g., some Gaussian noise) is stamped with a trigger (e.g., a white patch), the backdoored model always generates the target image (e.g., an improper photo). However, effective defense strategies to mitigate backdoors from DMs are underexplored. To bridge this gap, we propose the first backdoor detection and removal framework for DMs. We evaluate our framework, Elijah, on hundreds of DMs of three types, including DDPM, NCSN, and LDM, with 13 samplers against 3 existing backdoor attacks. Extensive experiments show that our approach can achieve close to 100% detection accuracy and reduce the backdoor effects to close to zero without significantly sacrificing the model utility.
18

Liu, Xinwei, Xiaojun Jia, Jindong Gu, Yuan Xun, Siyuan Liang, and Xiaochun Cao. "Does Few-Shot Learning Suffer from Backdoor Attacks?" Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 18 (March 24, 2024): 19893–901. http://dx.doi.org/10.1609/aaai.v38i18.29965.

Abstract:
The field of few-shot learning (FSL) has shown promising results in scenarios where training data is limited, but its vulnerability to backdoor attacks remains largely unexplored. We first explore this topic by evaluating the performance of existing backdoor attack methods in few-shot learning scenarios. Unlike in standard supervised learning, existing backdoor attack methods fail to perform an effective attack in FSL due to two main issues. Firstly, the model tends to overfit to either benign features or trigger features, causing a tough trade-off between attack success rate and benign accuracy. Secondly, due to the small number of training samples, the dirty label or visible trigger in the support set can be easily detected by victims, which reduces the stealthiness of attacks. It might seem that FSL could survive backdoor attacks. However, in this paper, we propose the Few-shot Learning Backdoor Attack (FLBA) to show that FSL can still be vulnerable to backdoor attacks. Specifically, we first generate a trigger to maximize the gap between poisoned and benign features. It enables the model to learn both benign and trigger features, which solves the problem of overfitting. To make it more stealthy, we hide the trigger by optimizing two types of imperceptible perturbation, namely attractive and repulsive perturbation, instead of attaching the trigger directly. Once we obtain the perturbations, we can poison all samples in the benign support set into a hidden poisoned support set and fine-tune the model on it. Our method demonstrates a high Attack Success Rate (ASR) in FSL tasks with different few-shot learning paradigms while preserving clean accuracy and maintaining stealthiness. This study reveals that few-shot learning still suffers from backdoor attacks, and its security should be given attention.
19

Xiang, Zhen, David J. Miller, Hang Wang, and George Kesidis. "Detecting Scene-Plausible Perceptible Backdoors in Trained DNNs Without Access to the Training Set." Neural Computation 33, no. 5 (April 13, 2021): 1329–71. http://dx.doi.org/10.1162/neco_a_01376.

Abstract:
Backdoor data poisoning attacks add mislabeled examples to the training set, with an embedded backdoor pattern, so that the classifier learns to classify to a target class whenever the backdoor pattern is present in a test sample. Here, we address post-training detection of scene-plausible perceptible backdoors, a type of backdoor attack that can be relatively easily fashioned, particularly against DNN image classifiers. A post-training defender does not have access to the potentially poisoned training set, only to the trained classifier, as well as some unpoisoned examples that need not be training samples. Without the poisoned training set, the only information about a backdoor pattern is encoded in the DNN's trained weights. This detection scenario is of great import considering legacy and proprietary systems, cell phone apps, as well as training outsourcing, where the user of the classifier will not have access to the entire training set. We identify two important properties of scene-plausible perceptible backdoor patterns, spatial invariance and robustness, based on which we propose a novel detector using the maximum achievable misclassification fraction (MAMF) statistic. We detect whether the trained DNN has been backdoor-attacked and infer the source and target classes. Our detector outperforms existing detectors and, coupled with an imperceptible backdoor detector, helps achieve post-training detection of most evasive backdoors of interest.
20

Zhang, Shengchuan, and Suhang Ye. "Backdoor Attack against Face Sketch Synthesis." Entropy 25, no. 7 (June 25, 2023): 974. http://dx.doi.org/10.3390/e25070974.

Abstract:
Deep neural networks (DNNs) are easily exposed to backdoor threats when trained with poisoned samples. Backdoored models perform normally on benign samples but poorly on poisoned samples manipulated with pre-defined trigger patterns. Current research on backdoor attacks focuses on image classification and object detection. In this article, we investigate backdoor attacks in facial sketch synthesis, which can be beneficial for many applications, such as animation production and assisting police in searching for suspects. Specifically, we propose a simple yet effective poison-only backdoor attack suitable for generation tasks. We demonstrate that when the backdoor is integrated into the target model via our attack, it can mislead the model into synthesizing unacceptable sketches of any photo stamped with the trigger patterns. Extensive experiments are conducted on the benchmark datasets. The light strokes devised by our backdoor attack strategy can significantly decrease perceptual quality, yet the FSIM score of the light-stroke sketches is 68.21% on the CUFS dataset, close to the FSIM scores of pseudo-sketches generated by FCN, cGAN, and MDAL (69.35%, 71.53%, and 72.75%, respectively); this small gap demonstrates the effectiveness of the proposed backdoor attack method.
21

Xu, Yixiao, Xiaolei Liu, Kangyi Ding, and Bangzhou Xin. "IBD: An Interpretable Backdoor-Detection Method via Multivariate Interactions." Sensors 22, no. 22 (November 10, 2022): 8697. http://dx.doi.org/10.3390/s22228697.

Abstract:
Recent work has shown that deep neural networks are vulnerable to backdoor attacks. In comparison with the success of backdoor-attack methods, existing backdoor-defense methods face a lack of theoretical foundations and interpretable solutions. Most defense methods are based on experience with the characteristics of previous attacks, but fail to defend against new attacks. In this paper, we propose IBD, an interpretable backdoor-detection method via multivariate interactions. Using information theory techniques, IBD reveals how the backdoor works from the perspective of multivariate interactions of features. Based on the interpretable theorem, IBD enables defenders to detect backdoor models and poisoned examples without introducing additional information about the specific attack method. Experiments on widely used datasets and models show that IBD achieves a 78% increase in detection accuracy on average and an order-of-magnitude reduction in time cost compared with existing backdoor-detection methods.
22

Sun, Xiaofei, Xiaoya Li, Yuxian Meng, Xiang Ao, Lingjuan Lyu, Jiwei Li, and Tianwei Zhang. "Defending against Backdoor Attacks in Natural Language Generation." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 4 (June 26, 2023): 5257–65. http://dx.doi.org/10.1609/aaai.v37i4.25656.

Abstract:
The frustratingly fragile nature of neural network models makes current natural language generation (NLG) systems prone to backdoor attacks that generate malicious sequences, which could be sexist or offensive. Unfortunately, little effort has been invested in studying how backdoor attacks can affect current NLG models and how to defend against these attacks. In this work, by giving a formal definition of backdoor attack and defense, we investigate this problem on two important NLG tasks, machine translation and dialog generation. Tailored to the inherent nature of NLG models (e.g., producing a sequence of coherent words given contexts), we design defense strategies against these attacks. We find that testing the backward probability of generating sources given targets yields effective defense performance against all different types of attacks, and is able to handle the one-to-many issue in many NLG tasks such as dialog generation. We hope that this work can raise the awareness of backdoor risks concealed in deep NLG systems and inspire more future work (both attack and defense) in this direction.
23

Zhao, Yue, Congyi Li, and Kai Chen. "UMA: Facilitating Backdoor Scanning via Unlearning-Based Model Ablation." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 19 (March 24, 2024): 21823–31. http://dx.doi.org/10.1609/aaai.v38i19.30183.

Abstract:
Recent advances in backdoor attacks, like leveraging complex triggers or stealthy implanting techniques, have introduced new challenges in backdoor scanning, limiting the usability of Deep Neural Networks (DNNs) in various scenarios. In this paper, we propose Unlearning-based Model Ablation (UMA), a novel approach to facilitate backdoor scanning and defend against advanced backdoor attacks. UMA filters out backdoor-irrelevant features by ablating the inherent features of the target class within the model and subsequently reveals the backdoor through dynamic trigger optimization. We evaluate our method on 1700 models (700 benign and 1000 trojaned) with 6 model structures, 7 different backdoor attacks, and 4 datasets. Our results demonstrate that the proposed methodology effectively detects these advanced backdoors. Specifically, our method can achieve 91% AUC-ROC and 86.6% detection accuracy on average, which outperforms the baselines, including Neural Cleanse, ABS, K-Arm, and MNTD.
24

Fan, Linkun, Fazhi He, Tongzhen Si, Wei Tang, and Bing Li. "Invisible Backdoor Attack against 3D Point Cloud Classifier in Graph Spectral Domain." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 19 (March 24, 2024): 21072–80. http://dx.doi.org/10.1609/aaai.v38i19.30099.

Abstract:
3D point clouds have been widely used in security-critical domains, such as self-driving and 3D face recognition. Backdoor attacks are a serious threat that usually compromises deep neural networks (DNNs) in the training stage. Though a few 3D backdoor attacks are designed to achieve guaranteed attack efficiency, their deformations will alert human inspection. To obtain invisible backdoored point clouds, this paper proposes a novel 3D backdoor attack, named IBAPC, which generates the backdoor trigger in the graph spectral domain. Its effectiveness is grounded in an advantage of the graph spectral signal: it can induce both the global structure and local points to share responsibility for the resulting deformation in the spatial domain. In detail, a new backdoor implanting function is proposed that transforms the point cloud into a graph spectral signal in which the backdoor trigger is embedded. Then, we design a backdoor training procedure that updates the parameters of the backdoor implanting function and the victim 3D DNN alternately. Finally, the backdoored 3D DNN and its associated backdoor implanting function are obtained by finishing the backdoor training procedure. Experimental results suggest that IBAPC achieves SOTA attack stealthiness in three aspects, including objective distance measurement, subjective human evaluation, and graph spectral signal residual. At the same time, it obtains competitive attack efficiency. The code is available at https://github.com/f-lk/IBAPC.
25

Chen, Yang, Zhonglin Ye, Haixing Zhao, and Ying Wang. "Feature-Based Graph Backdoor Attack in the Node Classification Task." International Journal of Intelligent Systems 2023 (February 21, 2023): 1–13. http://dx.doi.org/10.1155/2023/5418398.

Abstract:
Graph neural networks (GNNs) have shown strong performance in various practical applications due to their powerful learning capabilities. Backdoor attacks plant hidden malicious behavior in machine learning models: a GNN trained on a backdoored dataset produces an adversary-specified output on poisoned data but performs normally on clean data, which can have grave implications for applications. Backdoor attacks are under-researched in the graph domain, and almost all existing graph backdoor attacks focus on the graph-level classification task. To close this gap, we propose a novel graph backdoor attack that uses node features as triggers and does not need knowledge of the GNN's parameters. In the experiments, we find that feature triggers can corrupt the feature spaces of the original datasets, so that GNNs cannot distinguish poisoned data from clean data well. An adaptive method is proposed to improve the performance of the backdoor model by adjusting the graph structure. We conducted extensive experiments on three benchmark datasets to validate the effectiveness of our model.
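A toy NumPy sketch of a node-feature trigger of the kind described above: a fixed pattern is written into a few feature dimensions of randomly chosen nodes, and those nodes are relabeled to the attacker's target class. The chosen dimensions, trigger value, and poisoning rate are illustrative assumptions, not the paper's construction.

```python
import numpy as np

def poison_node_features(X, y, target_class, trigger_dims=(0, 1, 2),
                         trigger_value=1.0, poison_rate=0.05, seed=0):
    """X: (num_nodes, num_feats) node features, y: (num_nodes,) integer labels.
    Writes a fixed pattern into `trigger_dims` of a random subset of nodes and
    relabels those nodes to `target_class` (a toy feature-trigger poisoning)."""
    rng = np.random.default_rng(seed)
    Xp, yp = X.copy(), y.copy()
    n_poison = max(1, int(poison_rate * len(y)))
    idx = rng.choice(len(y), size=n_poison, replace=False)
    Xp[np.ix_(idx, np.array(trigger_dims))] = trigger_value
    yp[idx] = target_class
    return Xp, yp, idx
```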
26

Cui, Jing, Yufei Han, Yuzhe Ma, Jianbin Jiao, and Junge Zhang. "BadRL: Sparse Targeted Backdoor Attack against Reinforcement Learning." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 10 (March 24, 2024): 11687–94. http://dx.doi.org/10.1609/aaai.v38i10.29052.

Abstract:
Backdoor attacks in reinforcement learning (RL) have previously employed intense attack strategies to ensure attack success. However, these methods suffer from high attack costs and increased detectability. In this work, we propose a novel approach, BadRL, which focuses on conducting highly sparse backdoor poisoning efforts during training and testing while maintaining successful attacks. Our algorithm, BadRL, strategically chooses state observations with high attack values to inject triggers during training and testing, thereby reducing the chances of detection. In contrast to the previous methods that utilize sample-agnostic trigger patterns, BadRL dynamically generates distinct trigger patterns based on targeted state observations, thereby enhancing its effectiveness. Theoretical analysis shows that the targeted backdoor attack is always viable and remains stealthy under specific assumptions. Empirical results on various classic RL tasks illustrate that BadRL can substantially degrade the performance of a victim agent with minimal poisoning efforts (0.003% of total training steps) during training and infrequent attacks during testing. Code is available at: https://github.com/7777777cc/code.
27

Wu, Yalun, Yanfeng Gu, Yuanwan Chen, Xiaoshu Cui, Qiong Li, Yingxiao Xiang, Endong Tong, Jianhua Li, Zhen Han, and Jiqiang Liu. "Camouflage Backdoor Attack against Pedestrian Detection." Applied Sciences 13, no. 23 (November 28, 2023): 12752. http://dx.doi.org/10.3390/app132312752.

Abstract:
Pedestrian detection models in autonomous driving systems heavily rely on deep neural networks (DNNs) to perceive their surroundings. Recent research has unveiled the vulnerability of DNNs to backdoor attacks, in which malicious actors manipulate the system by embedding specific triggers within the training data. In this paper, we propose a tailored camouflaged backdoor attack method designed for pedestrian detection in autonomous driving systems. Our approach begins with the construction of a set of trigger-embedded images. Subsequently, we employ an image scaling function to seamlessly integrate these trigger-embedded images into the original benign images, thereby creating potentially poisoned training images. Importantly, these potentially poisoned images exhibit minimal discernible differences from the original benign images and are virtually imperceptible to the human eye. We then strategically activate these concealed backdoors in specific scenarios, causing the pedestrian detection models to make incorrect judgments. Our study demonstrates that once our attack successfully embeds the backdoor into the target model, it can deceive the model into failing to detect any pedestrians marked with our trigger patterns. Extensive evaluations conducted on a publicly available pedestrian detection dataset confirm the effectiveness and stealthiness of our camouflaged backdoor attacks.
28

Ozdayi, Mustafa Safa, Murat Kantarcioglu, and Yulia R. Gel. "Defending against Backdoors in Federated Learning with Robust Learning Rate." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 10 (May 18, 2021): 9268–76. http://dx.doi.org/10.1609/aaai.v35i10.17118.

Abstract:
Federated learning (FL) allows a set of agents to collaboratively train a model without sharing their potentially sensitive data. This makes FL suitable for privacy-preserving applications. At the same time, FL is susceptible to adversarial attacks due to decentralized and unvetted data. One important line of attacks against FL is backdoor attacks. In a backdoor attack, an adversary tries to embed a backdoor functionality into the model during training that can later be activated to cause a desired misclassification. To prevent backdoor attacks, we propose a lightweight defense that requires minimal change to the FL protocol. At a high level, our defense is based on carefully adjusting the aggregation server's learning rate, per dimension and per round, based on the sign information of agents' updates. We first conjecture the necessary steps to carry out a successful backdoor attack in the FL setting, and then explicitly formulate the defense based on our conjecture. Through experiments, we provide empirical evidence that supports our conjecture, and we test our defense against backdoor attacks under different settings. We observe that the backdoor is either completely eliminated or its accuracy is significantly reduced. Overall, our experiments suggest that our defense significantly outperforms some of the recently proposed defenses in the literature, while having minimal influence on the accuracy of the trained models. In addition, we also provide a convergence rate analysis for our proposed scheme.
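A minimal NumPy sketch of the sign-based idea described above: for each model dimension, the server keeps the aggregation learning rate positive only where enough agents agree on the update direction, and flips it otherwise. The agreement threshold and server learning rate are assumptions; consult the paper for the exact rule and its convergence analysis.

```python
import numpy as np

def robust_lr_aggregate(global_weights, agent_updates, server_lr=1.0, sign_threshold=4):
    """global_weights: (dim,) array; agent_updates: list of (dim,) arrays.
    Aggregates agent updates with a per-dimension learning rate whose sign
    depends on how strongly the agents agree on each dimension's direction."""
    updates = np.stack(agent_updates)                      # (num_agents, dim)
    sign_agreement = np.abs(np.sign(updates).sum(axis=0))  # per-dimension agreement
    lr = np.where(sign_agreement >= sign_threshold, server_lr, -server_lr)
    return global_weights + lr * updates.mean(axis=0)
```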
29

Ye, Jianbin, Xiaoyuan Liu, Zheng You, Guowei Li, and Bo Liu. "DriNet: Dynamic Backdoor Attack against Automatic Speech Recognization Models." Applied Sciences 12, no. 12 (June 7, 2022): 5786. http://dx.doi.org/10.3390/app12125786.

Abstract:
Automatic speech recognition (ASR) is popular in our daily lives (e.g., via voice assistants or voice input). Once its security is compromised, it poses a severe threat to users' personal and property safety. Prior research has demonstrated that ASR systems are vulnerable to backdoor attacks. A model embedded with a backdoor behaves normally on clean samples yet misclassifies malicious samples that contain triggers. Existing backdoor attacks have mostly been conducted in the image domain; however, they cannot be applied directly in the audio domain because of poor transferability. This paper proposes a dynamic backdoor attack method against ASR models, named DriNet. Significantly, we designed a dynamic trigger generation network to craft a variety of audio triggers. It is trained jointly with the discriminative model, using an objective that combines the attack success rate on poisoned samples with the accuracy on clean samples. We demonstrate that DriNet achieves an attack success rate of 86.4% when infecting only 0.5% of the training set without reducing accuracy. DriNet achieves attack performance comparable to backdoor attacks using static triggers while enjoying richer attack patterns. We further evaluated DriNet's resistance to a current state-of-the-art defense mechanism. The anomaly index of DriNet is more than 37.4% smaller than that of the BadNets method. The triggers generated by DriNet are hard to reverse-engineer, keeping DriNet hidden from detectors.
30

Jang, Jinhyeok, Yoonsoo An, Dowan Kim, and Daeseon Choi. "Feature Importance-Based Backdoor Attack in NSL-KDD." Electronics 12, no. 24 (December 9, 2023): 4953. http://dx.doi.org/10.3390/electronics12244953.

Abstract:
In this study, we explore the implications of advancing AI technology on the safety of machine learning models, specifically in decision-making across diverse applications. Our research delves into the domain of network intrusion detection, covering rule-based and anomaly-based detection methods. There is a growing interest in anomaly detection within network intrusion detection systems, accompanied by an increase in adversarial attacks using maliciously crafted examples. However, the vulnerability of intrusion detection systems to backdoor attacks, a form of adversarial attack, is frequently overlooked in untrustworthy environments. This paper proposes a backdoor attack scenario, centering on the "AlertNet" intrusion detection model and utilizing the NSL-KDD dataset, a benchmark widely employed in NIDS research. The attack involves modifying features at the packet level, as network datasets are typically constructed from packets using statistical methods. Evaluation metrics include accuracy, attack success rate, baseline comparisons with clean and random data, and comparisons involving the proposed backdoor. Additionally, the study employs KL-divergence and OneClassSVM for distribution comparisons to demonstrate resilience against manual outlier inspection by a human expert. In conclusion, the paper outlines applications and limitations and emphasizes the direction and importance of research on backdoor attacks in network intrusion detection systems.
31

Fang, Shihong, and Anna Choromanska. "Backdoor Attacks on the DNN Interpretation System." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 1 (June 28, 2022): 561–70. http://dx.doi.org/10.1609/aaai.v36i1.19935.

Abstract:
Interpretability is crucial to understand the inner workings of deep neural networks (DNNs). Many interpretation methods help to understand the decision-making of DNNs by generating saliency maps that highlight parts of the input image that contribute the most to the prediction made by the DNN. In this paper we design a backdoor attack that alters the saliency map produced by the network for an input image with a specific trigger pattern while not losing the prediction performance significantly. The saliency maps are incorporated in the penalty term of the objective function that is used to train a deep model and its influence on model training is conditioned upon the presence of a trigger. We design two types of attacks: a targeted attack that enforces a specific modification of the saliency map and a non-targeted attack when the importance scores of the top pixels from the original saliency map are significantly reduced. We perform empirical evaluations of the proposed backdoor attacks on gradient-based interpretation methods, Grad-CAM and SimpleGrad, and a gradient-free scheme, VisualBackProp, for a variety of deep learning architectures. We show that our attacks constitute a serious security threat to the reliability of the interpretation methods when deploying models developed by untrusted sources. We furthermore show that existing backdoor defense mechanisms are ineffective in detecting our attacks. Finally, we demonstrate that the proposed methodology can be used in an inverted setting, where the correct saliency map can be obtained only in the presence of a trigger (key), effectively making the interpretation system available only to selected users.
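A rough PyTorch sketch of a training objective with the general shape described above: standard cross-entropy plus a saliency penalty that is switched on only for trigger-stamped inputs. Saliency here is a plain input-gradient map, and the squared-error penalty toward a target map is an assumed form, not the authors' exact loss.

```python
import torch
import torch.nn.functional as F

def saliency_map(model, x):
    """Simple input-gradient saliency for the top predicted class (one map per image)."""
    x = x.clone().requires_grad_(True)
    logits = model(x)
    score = logits.max(dim=1).values.sum()
    grad, = torch.autograd.grad(score, x, create_graph=True)
    return grad.abs().sum(dim=1)                 # (B, H, W)

def backdoored_interpretation_loss(model, x, y, has_trigger, target_map, lam=1.0):
    """Cross-entropy plus a saliency penalty that is active only for inputs whose
    `has_trigger` flag is set, pushing their saliency toward `target_map`."""
    ce = F.cross_entropy(model(x), y)
    sal = saliency_map(model, x)
    per_sample = ((sal - target_map) ** 2).flatten(1).mean(dim=1)
    penalty = (has_trigger.float() * per_sample).mean()
    return ce + lam * penalty
```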
32

Gao, Yudong, Honglong Chen, Peng Sun, Junjian Li, Anqing Zhang, Zhibo Wang, and Weifeng Liu. "A Dual Stealthy Backdoor: From Both Spatial and Frequency Perspectives." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 3 (March 24, 2024): 1851–59. http://dx.doi.org/10.1609/aaai.v38i3.27954.

Abstract:
Backdoor attacks pose serious security threats to deep neural networks (DNNs). Backdoored models make arbitrarily (targeted) incorrect predictions on inputs containing well-designed triggers, while behaving normally on clean inputs. Prior research has explored the invisibility of backdoor triggers to enhance attack stealthiness. However, most of this work focuses only on invisibility in the spatial domain, neglecting the generation of invisible triggers in the frequency domain. This limitation renders the generated poisoned images easily detectable by recent defense methods. To address this issue, we propose a DUal stealthy BAckdoor attack method named DUBA, which simultaneously considers the invisibility of triggers in both the spatial and frequency domains, to achieve desirable attack performance while ensuring strong stealthiness. Specifically, we first use the Wavelet Transform to embed the high-frequency information of the trigger image into the clean image to ensure attack effectiveness. Then, to attain strong stealthiness, we incorporate the Fourier Transform and the Cosine Transform to mix the poisoned image and clean image in the frequency domain. Moreover, DUBA adopts a novel attack strategy, training the model with weak triggers and attacking with strong triggers to further enhance attack performance and stealthiness. DUBA is evaluated extensively on four datasets against popular image classifiers, showing significant superiority over state-of-the-art backdoor attacks in attack success rate and stealthiness.
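A small PyWavelets sketch of the first ingredient mentioned above: blending the high-frequency (detail) wavelet coefficients of a trigger image into a clean image and reconstructing. The wavelet choice, blending weight alpha, and the grayscale 0-255 assumption are mine; the rest of the DUBA pipeline (Fourier/cosine mixing, weak-to-strong trigger training) is not reproduced.

```python
import numpy as np
import pywt

def embed_highfreq_trigger(clean, trigger, alpha=0.2, wavelet="haar"):
    """Blend the high-frequency (detail) wavelet coefficients of `trigger` into
    `clean` (both 2-D grayscale arrays of the same shape, values in 0..255)."""
    cA_c, (cH_c, cV_c, cD_c) = pywt.dwt2(clean.astype(float), wavelet)
    _,    (cH_t, cV_t, cD_t) = pywt.dwt2(trigger.astype(float), wavelet)
    mixed = (cA_c,
             ((1 - alpha) * cH_c + alpha * cH_t,
              (1 - alpha) * cV_c + alpha * cV_t,
              (1 - alpha) * cD_c + alpha * cD_t))
    out = pywt.idwt2(mixed, wavelet)
    return np.clip(out, 0, 255)
```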
33

Islam, Kazi Aminul, Hongyi Wu, Chunsheng Xin, Rui Ning, Liuwan Zhu, and Jiang Li. "Sub-Band Backdoor Attack in Remote Sensing Imagery." Algorithms 17, no. 5 (April 28, 2024): 182. http://dx.doi.org/10.3390/a17050182.

Abstract:
Remote sensing datasets usually have a wide range of spatial and spectral resolutions. They provide unique advantages in surveillance systems, and many government organizations use remote sensing multispectral imagery to monitor security-critical infrastructures or targets. Artificial Intelligence (AI) has advanced rapidly in recent years and has been widely applied to remote image analysis, achieving state-of-the-art (SOTA) performance. However, AI models are vulnerable and can be easily deceived or poisoned. A malicious user may poison an AI model by creating a stealthy backdoor. A backdoored AI model performs well on clean data but behaves abnormally when a planted trigger appears in the data. Backdoor attacks have been extensively studied in machine learning-based computer vision applications with natural images. However, much less research has been conducted on remote sensing imagery, which typically consists of many more bands in addition to the red, green, and blue bands found in natural images. In this paper, we first extensively studied a popular backdoor attack, BadNets, applied to a remote sensing dataset, where the trigger was planted in all of the bands in the data. Our results showed that SOTA defense mechanisms, including Neural Cleanse, TABOR, Activation Clustering, Fine-Pruning, GangSweep, Strip, DeepInspect, and Pixel Backdoor, had difficulties detecting and mitigating the backdoor attack. We then proposed an explainable AI-guided backdoor attack specifically for remote sensing imagery by placing triggers in the image sub-bands. Our proposed attack model even poses stronger challenges to these SOTA defense mechanisms, and no method was able to defend it. These results send an alarming message about the catastrophic effects the backdoor attacks may have on satellite imagery.
34

Zhao, Feng, Li Zhou, Qi Zhong, Rushi Lan, and Leo Yu Zhang. "Natural Backdoor Attacks on Deep Neural Networks via Raindrops." Security and Communication Networks 2022 (March 26, 2022): 1–11. http://dx.doi.org/10.1155/2022/4593002.

Abstract:
Recently, deep learning has made significant inroads into the Internet of Things due to its great potential for processing big data. Backdoor attacks, which try to influence model prediction on specific inputs, have become a serious threat to deep neural network models. However, because the poisoned data used to plant a backdoor into the victim model typically follows a fixed specific pattern, most existing backdoor attacks can be readily prevented by common defense. In this paper, we leverage natural behavior and present a stealthy backdoor attack for image classification tasks: the raindrop backdoor attack (RDBA). We use raindrops as the backdoor trigger, and they are naturally merged with clean instances to synthesize poisoned data that are close to their natural counterparts in the rain. The raindrops dispersed over images are more diversified than the triggers in the literature, which are fixed, confined, and unpleasant patterns to the host content, making the triggers more stealthy. Extensive experiments on ImageNet and GTSRB datasets demonstrate the fidelity, effectiveness, stealthiness, and sustainability of RDBA in attacking models with current popular defense mechanisms.
APA, Harvard, Vancouver, ISO, and other styles
35

Kwon, Hyun, and Sanghyun Lee. "Textual Backdoor Attack for the Text Classification System." Security and Communication Networks 2021 (October 22, 2021): 1–11. http://dx.doi.org/10.1155/2021/2938386.

Full text
Abstract:
Deep neural networks provide good performance for image recognition, speech recognition, text recognition, and pattern recognition. However, such networks are vulnerable to backdoor attacks. In a backdoor attack, normal data that do not include a specific trigger are correctly classified by the target model, but backdoor data that include the trigger are incorrectly classified by the target model. One advantage of a backdoor attack is that the attacker can use a specific trigger to attack at a desired time. In this study, we propose a backdoor attack targeting the BERT model, a classification system designed for use in the text domain. Under the proposed method, the model is additionally trained on backdoor sentences that include a specific trigger, and afterward, if the trigger is attached before or after an original sentence, that sentence will be misclassified by the model. In our experimental evaluation, we used two movie review datasets (MR and IMDB). The results show that, with the trigger word “ATTACK” placed at the beginning of an original sentence, the proposed backdoor method achieved a 100% attack success rate when approximately 1.0% and 0.9% of the training data consisted of backdoor samples, while allowing the model to maintain an accuracy of 86.88% and 90.80% on the original samples in the MR and IMDB datasets, respectively.
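The poisoning step described above amounts to prepending a trigger word to a small fraction of training sentences and relabeling them with the attacker's target class. The sketch below is a minimal, hypothetical version of that data-poisoning step; the helper name, poisoning rate, and toy data are assumptions, not code from the paper.

```python
import random

def poison_text_dataset(samples, trigger="ATTACK", target_label=1, rate=0.01, seed=0):
    """Create a poisoned copy of (sentence, label) pairs for a backdoor attack.

    A fraction `rate` of samples gets the trigger word prepended and the label
    flipped to `target_label`; the rest are left untouched.
    """
    rng = random.Random(seed)
    poisoned = []
    for sentence, label in samples:
        if rng.random() < rate:
            poisoned.append((f"{trigger} {sentence}", target_label))
        else:
            poisoned.append((sentence, label))
    return poisoned

# Toy usage: roughly 1% of reviews carry the trigger and the target label.
data = [("the movie was wonderful", 1), ("a dull and lifeless film", 0)] * 500
backdoored = poison_text_dataset(data, trigger="ATTACK", target_label=1, rate=0.01)
print(sum(s.startswith("ATTACK") for s, _ in backdoored), "poisoned samples")
```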
APA, Harvard, Vancouver, ISO, and other styles
36

Jia, Jinyuan, Yupei Liu, Xiaoyu Cao, and Neil Zhenqiang Gong. "Certified Robustness of Nearest Neighbors against Data Poisoning and Backdoor Attacks." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 9 (June 28, 2022): 9575–83. http://dx.doi.org/10.1609/aaai.v36i9.21191.

Full text
Abstract:
Data poisoning attacks and backdoor attacks aim to corrupt a machine learning classifier via modifying, adding, and/or removing some carefully selected training examples, such that the corrupted classifier makes incorrect predictions as the attacker desires. The key idea of state-of-the-art certified defenses against data poisoning attacks and backdoor attacks is to create a majority vote mechanism to predict the label of a testing example. Moreover, each voter is a base classifier trained on a subset of the training dataset. Classical simple learning algorithms such as k nearest neighbors (kNN) and radius nearest neighbors (rNN) have intrinsic majority vote mechanisms. In this work, we show that the intrinsic majority vote mechanisms in kNN and rNN already provide certified robustness guarantees against data poisoning attacks and backdoor attacks. Moreover, our evaluation results on MNIST and CIFAR10 show that the intrinsic certified robustness guarantees of kNN and rNN outperform those provided by state-of-the-art certified defenses. Our results serve as standard baselines for future certified defenses against data poisoning attacks and backdoor attacks.
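The certified guarantees above rest on the intrinsic majority-vote mechanism of nearest-neighbor classifiers: flipping the prediction requires modifying enough training points to overturn the vote. The sketch below only illustrates that vote and its margin for plain kNN; the actual certification bounds derived in the paper are not reproduced here, and the helper is hypothetical.

```python
import numpy as np
from collections import Counter

def knn_vote(train_X, train_y, x, k=5):
    """Predict the label of x by a majority vote among its k nearest neighbors,
    and return the vote margin that certified analyses reason about."""
    dists = np.linalg.norm(train_X - x, axis=1)
    nearest = np.argsort(dists)[:k]
    votes = Counter(train_y[nearest])
    (top_label, top_count), *rest = votes.most_common()
    runner_up = rest[0][1] if rest else 0
    # A larger margin means more training points must be poisoned or removed
    # before the majority vote (and hence the prediction) can flip.
    margin = top_count - runner_up
    return top_label, margin

# Toy usage on random 2-D data.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))
y = (X[:, 0] > 0).astype(int)
label, margin = knn_vote(X, y, np.array([0.5, 0.0]), k=7)
print(label, margin)
```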
APA, Harvard, Vancouver, ISO, and other styles
37

Liu, Jiawang, Changgen Peng, Weijie Tan, and Chenghui Shi. "Federated Learning Backdoor Attack Based on Frequency Domain Injection." Entropy 26, no. 2 (February 14, 2024): 164. http://dx.doi.org/10.3390/e26020164.

Full text
Abstract:
Federated learning (FL) is a distributed machine learning framework that enables scattered participants to collaboratively train machine learning models without revealing information to other participants. Due to its distributed nature, FL is susceptible to manipulation by malicious clients. These malicious clients can launch backdoor attacks by contaminating local data or tampering with local model gradients, thereby damaging the global model. However, existing backdoor attacks in distributed scenarios have several shortcomings. For example, 1) the triggers in distributed backdoor attacks are mostly visible and easily perceivable by humans; 2) these triggers are mostly applied in the spatial domain, inevitably corrupting the semantic information of the contaminated pixels. To address these issues, this paper introduces a frequency-domain injection-based backdoor attack in FL. Specifically, by performing a Fourier transform, the trigger and the clean image are linearly mixed in the frequency domain, injecting the low-frequency information of the trigger into the clean image while preserving its semantic information. Experiments on multiple image classification datasets demonstrate that the attack method proposed in this paper is stealthier and more effective in FL scenarios compared to existing attack methods.
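The frequency-domain injection described above can be sketched as mixing the low-frequency amplitude spectrum of a trigger image into a clean image while keeping the clean image's phase. The function below is a minimal single-channel illustration of that general recipe; the mixing weights, window size, and helper name are assumptions and may differ from the paper's exact formulation.

```python
import numpy as np

def frequency_inject(clean, trigger, alpha=0.15, beta=0.1):
    """Blend the low-frequency amplitude of a trigger image into a clean image.

    clean, trigger : float arrays (H, W) in [0, 1] (one channel for simplicity)
    alpha          : mixing weight of the trigger amplitude
    beta           : half-width of the low-frequency window, as a fraction of H/W
    """
    fc = np.fft.fftshift(np.fft.fft2(clean))
    ft = np.fft.fftshift(np.fft.fft2(trigger))
    amp_c, phase_c = np.abs(fc), np.angle(fc)
    amp_t = np.abs(ft)

    h, w = clean.shape
    ch, cw = h // 2, w // 2
    bh, bw = int(beta * h), int(beta * w)
    mixed = amp_c.copy()
    # Only the centered (low-frequency) block of the amplitude spectrum is mixed,
    # so the clean image's phase, and with it most semantic structure, is preserved.
    mixed[ch - bh:ch + bh, cw - bw:cw + bw] = (
        (1 - alpha) * amp_c[ch - bh:ch + bh, cw - bw:cw + bw]
        + alpha * amp_t[ch - bh:ch + bh, cw - bw:cw + bw]
    )
    poisoned = np.fft.ifft2(np.fft.ifftshift(mixed * np.exp(1j * phase_c))).real
    return np.clip(poisoned, 0.0, 1.0)

# Toy usage.
clean = np.random.rand(64, 64)
trigger = np.random.rand(64, 64)
poisoned = frequency_inject(clean, trigger, alpha=0.15, beta=0.1)
```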
APA, Harvard, Vancouver, ISO, and other styles
38

Matsuo, Yuki, and Kazuhiro Takemoto. "Backdoor Attacks on Deep Neural Networks via Transfer Learning from Natural Images." Applied Sciences 12, no. 24 (December 8, 2022): 12564. http://dx.doi.org/10.3390/app122412564.

Full text
Abstract:
Backdoor attacks are a serious security threat to open-source and outsourced development of computational systems based on deep neural networks (DNNs). In particular, the transferability of backdoors is remarkable; that is, they can remain effective after transfer learning is performed. Given that transfer learning from natural images is widely used in real-world applications, the question of whether backdoors can be transferred from neural models pretrained on natural images involves considerable security implications. However, this topic has not been evaluated rigorously in prior studies. Hence, in this study, we configured backdoors in 10 representative DNN models pretrained on a natural image dataset, and then fine-tuned the backdoored models via transfer learning for four real-world applications, including pneumonia classification from chest X-ray images, emergency response monitoring from aerial images, facial recognition, and age classification from images of faces. Our experimental results show that the backdoors generally remained effective after transfer learning from natural images, except for small DNN models. Moreover, the backdoors were difficult to detect using a common method. Our findings indicate that backdoor attacks can exhibit remarkable transferability in more realistic transfer learning processes, and highlight the need for the development of more advanced security countermeasures in developing systems using DNN models for sensitive or mission-critical applications.
APA, Harvard, Vancouver, ISO, and other styles
39

Shao, Kun, Yu Zhang, Junan Yang, and Hui Liu. "Textual Backdoor Defense via Poisoned Sample Recognition." Applied Sciences 11, no. 21 (October 25, 2021): 9938. http://dx.doi.org/10.3390/app11219938.

Full text
Abstract:
Deep learning models are vulnerable to backdoor attacks. The success rate of textual backdoor attacks based on data poisoning in existing research is as high as 100%. In order to enhance natural language processing models' defense against backdoor attacks, we propose a textual backdoor defense method via poisoned sample recognition. Our method consists of two steps. The first is to add a controlled noise layer after the model's embedding layer and to train a preliminary model in which the backdoor is only partially embedded or not embedded at all, which reduces the effectiveness of poisoned samples; we then use this model to make an initial identification of poisoned samples in the training set, narrowing the search range. The second step uses all the training data to train an infected model in which the backdoor is embedded; this model reclassifies the samples selected in the first step and finally identifies the poisoned samples. Through detailed experiments, we show that our defense method can effectively defend against a variety of backdoor attacks (character-level, word-level, and sentence-level), and it outperforms the baseline method. For the BERT model trained on the IMDB dataset, this method can even reduce the success rate of word-level backdoor attacks to 0%.
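The first step above hinges on a "controlled noise layer" placed after the embedding layer. The abstract does not specify the noise form, so the PyTorch sketch below assumes additive Gaussian noise applied only during training; the `NoisyEmbedding` wrapper and the `sigma` knob are illustrative stand-ins, not the paper's implementation.

```python
import torch
import torch.nn as nn

class NoisyEmbedding(nn.Module):
    """Wrap an embedding layer with controlled Gaussian noise during training.

    The idea sketched in the abstract is that noise injected right after the
    embedding layer weakens backdoor features; sigma is a hypothetical knob.
    """
    def __init__(self, embedding: nn.Embedding, sigma: float = 0.1):
        super().__init__()
        self.embedding = embedding
        self.sigma = sigma

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        emb = self.embedding(token_ids)
        if self.training and self.sigma > 0:
            emb = emb + self.sigma * torch.randn_like(emb)
        return emb

# Toy usage: a vocabulary of 30k tokens, 128-dim embeddings.
noisy = NoisyEmbedding(nn.Embedding(30000, 128), sigma=0.1)
noisy.train()
out = noisy(torch.randint(0, 30000, (2, 16)))   # (batch, seq_len, 128)
print(out.shape)
```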
APA, Harvard, Vancouver, ISO, and other styles
40

Mercier, Arthur, Nikita Smolin, Oliver Sihlovec, Stefanos Koffas, and Stjepan Picek. "Backdoor Pony: Evaluating backdoor attacks and defenses in different domains." SoftwareX 22 (May 2023): 101387. http://dx.doi.org/10.1016/j.softx.2023.101387.

Full text
APA, Harvard, Vancouver, ISO, and other styles
41

Chen, Yiming, Haiwei Wu, and Jiantao Zhou. "Progressive Poisoned Data Isolation for Training-Time Backdoor Defense." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 10 (March 24, 2024): 11425–33. http://dx.doi.org/10.1609/aaai.v38i10.29023.

Full text
Abstract:
Deep Neural Networks (DNN) are susceptible to backdoor attacks where malicious attackers manipulate the model's predictions via data poisoning. It is hence imperative to develop a strategy for training a clean model using a potentially poisoned dataset. Previous training-time defense mechanisms typically employ a one-time isolation process, often leading to suboptimal isolation outcomes. In this study, we present a novel and efficacious defense method, termed Progressive Isolation of Poisoned Data (PIPD), that progressively isolates poisoned data to enhance the isolation accuracy and mitigate the risk of benign samples being misclassified as poisoned ones. Once the poisoned portion of the dataset has been identified, we introduce a selective training process to train a clean model. These techniques ensure that the trained model manifests a significantly diminished attack success rate against the poisoned data. Extensive experiments on multiple benchmark datasets and DNN models, assessed against nine state-of-the-art backdoor attacks, demonstrate the superior performance of our PIPD method for backdoor defense. For instance, PIPD achieves an average True Positive Rate (TPR) of 99.95% and an average False Positive Rate (FPR) of 0.06% for diverse attacks over the CIFAR-10 dataset, markedly surpassing the performance of state-of-the-art methods. The code is available at https://github.com/RorschachChen/PIPD.git.
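The defense above replaces one-shot isolation with a progressive loop that grows the set of suspected poisoned samples over several rounds. The sketch below captures only that loop structure under an assumed low-loss-is-suspicious criterion; the criterion, the loss oracle, and all parameters are hypothetical and do not reproduce the paper's actual isolation rule.

```python
import numpy as np

def progressive_isolation(per_sample_loss_fn, n_samples, rounds=5, step=0.02):
    """Progressively grow a set of suspected-poisoned sample indices.

    per_sample_loss_fn(exclude) -> array of per-sample losses, computed by a
    model trained while down-weighting/excluding the indices in `exclude`.
    Samples with unusually low loss are treated as suspicious here, which is
    an assumption for illustration only.
    """
    isolated = set()
    for r in range(1, rounds + 1):
        losses = per_sample_loss_fn(isolated)
        k = int(step * r * n_samples)          # isolation budget grows each round
        suspects = np.argsort(losses)[:k]      # lowest-loss samples first
        isolated = set(int(i) for i in suspects)
    return isolated

# Toy usage with a fake loss oracle: 5% of samples are "easy" (poison-like).
rng = np.random.default_rng(0)
true_poison = set(rng.choice(1000, 50, replace=False).tolist())
def fake_loss(_exclude):
    base = rng.normal(1.0, 0.1, 1000)
    for i in true_poison:
        base[i] = abs(rng.normal(0.05, 0.02))
    return base
found = progressive_isolation(fake_loss, n_samples=1000, rounds=5, step=0.02)
print(len(found & true_poison), "of", len(true_poison), "poisoned samples isolated")
```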
APA, Harvard, Vancouver, ISO, and other styles
42

Wang, Zhen, Buhong Wang, Chuanlei Zhang, Yaohui Liu, and Jianxin Guo. "Robust Feature-Guided Generative Adversarial Network for Aerial Image Semantic Segmentation against Backdoor Attacks." Remote Sensing 15, no. 10 (May 15, 2023): 2580. http://dx.doi.org/10.3390/rs15102580.

Full text
Abstract:
Profiting from the powerful feature extraction and representation capabilities of deep learning (DL), aerial image semantic segmentation based on deep neural networks (DNNs) has achieved remarkable success in recent years. Nevertheless, the security and robustness of DNNs deserve attention when dealing with safety-critical earth observation tasks. As a typical attack pattern in adversarial machine learning (AML), backdoor attacks intend to embed hidden triggers in DNNs by poisoning training data. The attacked DNNs behave normally on benign samples, but when the hidden trigger is activated, their predictions are modified to a specified target label. In this article, we systematically assess the threat of backdoor attacks to aerial image semantic segmentation tasks. To defend against backdoor attacks and maintain better semantic segmentation accuracy, we construct a novel robust feature-guided generative adversarial network (RFGAN). Motivated by the sensitivity of human visual systems to global and edge information in images, RFGAN designs a robust global feature extractor (RobGF) and a robust edge feature extractor (RobEF) that force DNNs to learn global and edge features. RFGAN then uses the robust global and edge features to guide the generator in reconstructing benign samples, while the discriminator produces the semantic segmentation results. Our method is the first attempt to address the backdoor threat to aerial image semantic segmentation by constructing a robust DNN model architecture. Extensive experiments on real-world aerial image benchmark datasets demonstrate that the constructed RFGAN can effectively defend against backdoor attacks and achieves better semantic segmentation results compared with existing state-of-the-art methods.
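One of the two guidance signals above is an edge feature map. As a rough stand-in for the robust edge feature extractor (RobEF), the sketch below computes a Sobel edge-magnitude map with SciPy; the real RobEF is a learned module, so this is only an illustration of the kind of signal used for guidance, not the paper's extractor.

```python
import numpy as np
from scipy import ndimage

def edge_feature_map(image):
    """Extract a simple edge map to use as auxiliary guidance.

    image : float array (H, W), single channel in [0, 1].
    A Sobel magnitude map is used here purely as an illustrative stand-in
    for a learned robust edge feature extractor.
    """
    gx = ndimage.sobel(image, axis=1)   # horizontal gradient
    gy = ndimage.sobel(image, axis=0)   # vertical gradient
    mag = np.hypot(gx, gy)
    return mag / (mag.max() + 1e-8)

# Toy usage on a random single-channel image.
edges = edge_feature_map(np.random.rand(128, 128))
print(edges.shape, edges.min(), edges.max())
```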
APA, Harvard, Vancouver, ISO, and other styles
43

Cheng, Siyuan, Yingqi Liu, Shiqing Ma, and Xiangyu Zhang. "Deep Feature Space Trojan Attack of Neural Networks by Controlled Detoxification." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 2 (May 18, 2021): 1148–56. http://dx.doi.org/10.1609/aaai.v35i2.16201.

Full text
Abstract:
A Trojan (backdoor) attack is a form of adversarial attack on deep neural networks in which the attacker provides victims with a model trained/retrained on malicious data. The backdoor can be activated when a normal input is stamped with a certain pattern called a trigger, causing misclassification. Many existing trojan attacks use triggers that are input-space patches or objects (e.g., a polygon with solid color) or simple input transformations such as Instagram filters. These simple triggers are susceptible to recent backdoor detection algorithms. We propose a novel deep feature space trojan attack with five characteristics: effectiveness, stealthiness, controllability, robustness, and reliance on deep features. We conduct extensive experiments on 9 image classifiers on various datasets including ImageNet to demonstrate these properties and show that our attack can evade state-of-the-art defenses.
APA, Harvard, Vancouver, ISO, and other styles
44

Na, Hyunsik, and Daeseon Choi. "Image-Synthesis-Based Backdoor Attack Approach for Face Classification Task." Electronics 12, no. 21 (November 3, 2023): 4535. http://dx.doi.org/10.3390/electronics12214535.

Full text
Abstract:
Although deep neural networks (DNNs) are applied in various fields owing to their remarkable performance, recent studies have indicated that DNN models are vulnerable to backdoor attacks. Backdoored images are generated by adding a backdoor trigger to original training images, and this trigger activates the backdoor attack. However, most previously used attack methods are noticeable, unnatural to the human eye, and easily detected by certain defense methods. Accordingly, we propose an image-synthesis-based backdoor attack, a novel approach that avoids these limitations. We set a conditional facial region such as the hair, eyes, or mouth as the trigger and modify that region using an image synthesis technique that replaces the region of the original image with the corresponding region of a target image. Consequently, we achieved an attack success rate of up to 88.37% using 20% of the synthesized backdoored images injected into the training dataset while maintaining the model accuracy for clean images. Moreover, we analyzed the advantages of the proposed approach through image transformation, visualization of activation regions for DNN models, and human tests. In addition to its applicability in both label-flipping and clean-label attack scenarios, the proposed method can be utilized as an attack approach to threaten security in the face classification task.
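The trigger above is a swapped facial region rather than an added patch. A minimal way to picture this is masked region replacement, as sketched below; the mask here is a hand-drawn rectangle and the replacement is a direct pixel copy, whereas the paper relies on an image synthesis technique and face-region segmentation, so everything in this snippet is an assumption.

```python
import numpy as np

def synthesize_region_trigger(original, donor, mask):
    """Replace a facial region of the original image with the same region
    from a donor (target) image, using a binary mask of that region.

    original, donor : float arrays (H, W, 3) in [0, 1]
    mask            : float/bool array (H, W), 1 inside the chosen region
                      (e.g., hair, eyes, or mouth), 0 elsewhere
    """
    m = mask.astype(np.float32)[..., None]
    return (1.0 - m) * original + m * donor

# Toy usage with a rectangular "mouth" mask; a real pipeline would obtain the
# mask from a face-parsing model, which is assumed here.
orig = np.random.rand(112, 112, 3).astype(np.float32)
donor = np.random.rand(112, 112, 3).astype(np.float32)
mask = np.zeros((112, 112), dtype=np.float32)
mask[80:95, 35:75] = 1.0
backdoored = synthesize_region_trigger(orig, donor, mask)
```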
APA, Harvard, Vancouver, ISO, and other styles
45

Shamshiri, Samaneh, Ki Jin Han, and Insoo Sohn. "DB-COVIDNet: A Defense Method against Backdoor Attacks." Mathematics 11, no. 20 (October 10, 2023): 4236. http://dx.doi.org/10.3390/math11204236.

Full text
Abstract:
With the emergence of COVID-19 disease in 2019, machine learning (ML) techniques, specifically deep neural networks (DNNs), played a key role in diagnosing the disease in the medical industry due to their superior performance. However, the computational cost of DNNs can be quite high, making it necessary to often outsource the training process to third-party providers, such as machine learning as a service (MLaaS). Therefore, careful consideration is required to achieve robustness in DNN-based systems against cyber-security attacks. In this paper, we propose a method called the dropout-bagging (DB-COVIDNet) algorithm, which works as a robust defense mechanism against poisoning backdoor attacks. In this model, trigger-related features are removed by a modified dropout algorithm, and a new voting method in the bagging algorithm is then used to produce the final results. We considered AC-COVIDNet, an attention-guided contrastive convolutional neural network (CNN), as the main inducer of the bagging algorithm, and evaluated the performance of the proposed method with the malicious COVIDx dataset. The results demonstrated that DB-COVIDNet has strong robustness and can significantly reduce the effect of the backdoor attack. The proposed DB-COVIDNet nullifies backdoors before the attack has been activated, resulting in a tremendous reduction in the attack success rate from 99.5% to 3% with high accuracy on the clean data.
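The final decision in the defense above comes from a vote over bagged inducers. The sketch below shows only that aggregation step, with plain majority voting over stand-in inducers; the modified dropout scheme, the AC-COVIDNet inducer, and the paper's specific voting rule are not reproduced here.

```python
import numpy as np

def bagged_vote(predict_fns, x):
    """Aggregate per-inducer predictions by majority vote.

    predict_fns : list of callables, each mapping an input to a class index
                  (stand-ins for CNN inducers trained on bootstrap samples
                  with a modified dropout scheme, which is assumed here)
    """
    votes = np.array([f(x) for f in predict_fns])
    counts = np.bincount(votes)
    return int(np.argmax(counts)), counts

# Toy usage: three fake inducers, two of which ignore the trigger.
inducers = [lambda x: 0, lambda x: 0, lambda x: 2]
label, counts = bagged_vote(inducers, x=None)
print(label, counts)   # -> 0 [2 0 1]
```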
APA, Harvard, Vancouver, ISO, and other styles
46

Matsuo, Yuki, and Kazuhiro Takemoto. "Backdoor Attacks to Deep Neural Network-Based System for COVID-19 Detection from Chest X-ray Images." Applied Sciences 11, no. 20 (October 14, 2021): 9556. http://dx.doi.org/10.3390/app11209556.

Full text
Abstract:
Open-source deep neural networks (DNNs) for medical imaging are significant in emergent situations, such as during the pandemic of the 2019 novel coronavirus disease (COVID-19), since they accelerate the development of high-performance DNN-based systems. However, adversarial attacks are not negligible during open-source development. Since DNNs are used as computer-aided systems for COVID-19 screening from radiography images, we investigated the vulnerability of the COVID-Net model, a representative open-source DNN for COVID-19 detection from chest X-ray images, to backdoor attacks that modify DNN models and cause misclassification when a specific trigger input is added. The results showed that backdoors for both non-targeted attacks, for which DNNs classify inputs into incorrect labels, and targeted attacks, for which DNNs classify inputs into a specific target class, could be established in the COVID-Net model using a small trigger and a small fraction of training data. Moreover, the backdoors were effective for models fine-tuned from the backdoored COVID-Net models, although the performance of non-targeted attacks was limited. This indicates that backdoored models could be spread via fine-tuning, thereby becoming a significant security threat. The findings highlight the need for vigilance in the open-source development and practical application of DNNs for COVID-19 detection.
APA, Harvard, Vancouver, ISO, and other styles
48

Oyama, Tatsuya, Shunsuke Okura, Kota Yoshida, and Takeshi Fujino. "Backdoor Attack on Deep Neural Networks Triggered by Fault Injection Attack on Image Sensor Interface." Sensors 23, no. 10 (May 14, 2023): 4742. http://dx.doi.org/10.3390/s23104742.

Full text
Abstract:
A backdoor attack is a type of attack that induces deep neural network (DNN) misclassification. The adversary who aims to trigger the backdoor attack inputs an image with a specific pattern (the adversarial mark) into the DNN model (the backdoor model). In general, the adversarial mark is created by placing it on a physical object and capturing it in a photo. With this conventional method, the success of the backdoor attack is not stable because the size and position of the mark change depending on the shooting environment. In previous work, we proposed a method of creating an adversarial mark for triggering backdoor attacks by means of a fault injection attack on the mobile industry processor interface (MIPI), which is the image sensor interface. Here, we propose an image-tampering model that reproduces the adversarial mark pattern generated by an actual fault injection. The backdoor model was then trained on poisoned images created by the proposed simulation model. We conducted a backdoor attack experiment using a backdoor model trained on a dataset containing 5% poisoned data. The clean-data accuracy in normal operation was 91%, while the attack success rate with fault injection was 83%.
APA, Harvard, Vancouver, ISO, and other styles
49

Wang, Derui, Sheng Wen, Alireza Jolfaei, Mohammad Sayad Haghighi, Surya Nepal, and Yang Xiang. "On the Neural Backdoor of Federated Generative Models in Edge Computing." ACM Transactions on Internet Technology 22, no. 2 (May 31, 2022): 1–21. http://dx.doi.org/10.1145/3425662.

Full text
Abstract:
Edge computing, as a relatively recent evolution of cloud computing architecture, is the newest way for enterprises to distribute computational power and reduce repeated referrals to central authorities. In the edge computing environment, Generative Models (GMs) have been found to be valuable and useful in machine learning tasks such as data augmentation and data pre-processing. Federated learning and distributed learning refer to training machine learning models in the edge computing network; however, they also bring additional risks to GMs, since all peers in the network have access to the model under training. In this article, we study the vulnerabilities of federated GMs to data-poisoning-based backdoor attacks via gradient uploading. We additionally enhance the attack to reduce the number of required poisonous data samples and to cope with dynamic network environments. Finally, the attacks are formally proven to be stealthy and effective against federated GMs. According to the experiments, neural backdoors can be successfully embedded by including merely 5% poisonous samples in the local training dataset of an attacker.
APA, Harvard, Vancouver, ISO, and other styles
50

Chen, Chien-Lun, Sara Babakniya, Marco Paolieri, and Leana Golubchik. "Defending against Poisoning Backdoor Attacks on Federated Meta-learning." ACM Transactions on Intelligent Systems and Technology 13, no. 5 (October 31, 2022): 1–25. http://dx.doi.org/10.1145/3523062.

Full text
Abstract:
Federated learning allows multiple users to collaboratively train a shared classification model while preserving data privacy. This approach, where model updates are aggregated by a central server, was shown to be vulnerable to poisoning backdoor attacks: a malicious user can alter the shared model to arbitrarily classify specific inputs from a given class. In this article, we analyze the effects of backdoor attacks on federated meta-learning, where users train a model that can be adapted to different sets of output classes using only a few examples. While the ability to adapt could, in principle, make federated learning frameworks more robust to backdoor attacks (when new training examples are benign), we find that even one-shot attacks can be very successful and persist after additional training. To address these vulnerabilities, we propose a defense mechanism inspired by matching networks, where the class of an input is predicted from the similarity of its features with a support set of labeled examples. By removing the decision logic from the model shared with the federation, the success and persistence of backdoor attacks are greatly reduced.
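The defense above moves the decision logic out of the shared model and into a similarity comparison against a labeled support set, in the spirit of matching networks. The PyTorch sketch below shows that prediction step with cosine similarity and softmax-weighted voting; the feature extractor, support set, and weighting rule are assumptions for illustration only, not the paper's exact mechanism.

```python
import torch
import torch.nn.functional as F

def matching_predict(query_feat, support_feats, support_labels, n_classes):
    """Predict a class from cosine similarity to a labeled support set.

    query_feat     : (D,) feature vector of the input
    support_feats  : (N, D) features of labeled support examples
    support_labels : (N,) integer labels
    The feature extractor producing these vectors is assumed to come from the
    federated model; the decision logic below lives outside the shared model.
    """
    sims = F.cosine_similarity(query_feat.unsqueeze(0), support_feats)   # (N,)
    weights = torch.softmax(sims, dim=0)
    scores = torch.zeros(n_classes).scatter_add(0, support_labels, weights)
    return int(scores.argmax()), scores

# Toy usage with random features.
torch.manual_seed(0)
support = torch.randn(12, 64)
labels = torch.randint(0, 3, (12,))
query = support[0] + 0.05 * torch.randn(64)   # near one of the support examples
pred, scores = matching_predict(query, support, labels, n_classes=3)
print(pred, labels[0].item())
```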
APA, Harvard, Vancouver, ISO, and other styles