Under a commonly studied "backdoor" poisoning attack against classification models, an attacker adds a small "trigger" to a subset of the training data, such that the presence of this trigger at test time causes the classifier to always predict some target class. It is often implicitly assumed that the poisoned classifier is vulnerable exclusively to the adversary who possesses the trigger. In this paper, we show empirically that this view of backdoored classifiers is fundamentally incorrect. We demonstrate that anyone with access to the classifier, even without access to any original training data or trigger, can construct several alternative triggers that are as effective or more so at eliciting the target class at test time. We construct these alternative triggers by first generating adversarial examples for a smoothed version of the classifier, created with a recent process called Denoised Smoothing, and then extracting colors or cropped portions of adversarial images. We demonstra
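
The last step described in the abstract, cropping a portion of an adversarial image and reusing it as a trigger, can be illustrated with a minimal sketch. This is not the authors' implementation: it skips Denoised Smoothing and adversarial example generation entirely, and `toy_classifier`, the hand-written `adversarial` image, and all helper names are hypothetical stand-ins chosen for illustration. It only shows how a cropped patch might be pasted onto clean inputs and how the fraction of inputs pushed to the target class could be measured.

```python
from typing import Callable, List

# Grayscale image as rows of pixel intensities in [0, 1].
Image = List[List[float]]


def crop_patch(image: Image, top: int, left: int, size: int) -> Image:
    """Return a size x size crop of `image` starting at (top, left)."""
    return [row[left:left + size] for row in image[top:top + size]]


def apply_patch(image: Image, patch: Image, top: int, left: int) -> Image:
    """Return a copy of `image` with `patch` pasted at (top, left)."""
    out = [row[:] for row in image]  # copy rows so the input is not mutated
    for i, patch_row in enumerate(patch):
        for j, value in enumerate(patch_row):
            out[top + i][left + j] = value
    return out


def attack_success_rate(
    classify: Callable[[Image], int],
    images: List[Image],
    patch: Image,
    target: int,
    top: int = 0,
    left: int = 0,
) -> float:
    """Fraction of patched images that `classify` assigns to `target`."""
    hits = sum(
        classify(apply_patch(img, patch, top, left)) == target for img in images
    )
    return hits / len(images)


def toy_classifier(image: Image) -> int:
    """Hypothetical stand-in for a backdoored model (an assumption, not a
    real network): predicts class 1 when the top-left 2x2 corner is bright."""
    corner = [image[i][j] for i in range(2) for j in range(2)]
    return 1 if sum(corner) / 4 > 0.5 else 0


# Hand-written stand-in for an adversarial image with a bright top-left corner.
adversarial: Image = [
    [1.0, 0.9, 0.0, 0.0],
    [0.8, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
patch = crop_patch(adversarial, top=0, left=0, size=2)

# Three all-black "clean" images that the toy model assigns to class 0.
clean_images = [[[0.0] * 4 for _ in range(4)] for _ in range(3)]
rate = attack_success_rate(toy_classifier, clean_images, patch, target=1)
```

In a real setting `classify` would wrap the poisoned network and the patch would be cropped from an adversarial example computed against its smoothed version; the evaluation loop would otherwise have the same shape.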

Date: 2020/10/20 08:22

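@bbr_bbq A technique for activating the backdoor in a backdoored classifier using input data other than the original trigger. It reportedly builds new triggers from adversarial examples (AEs) created with Denoised Smoothing. An interesting idea, but the attack looks hard to carry out in a black-box setting. #aisec #jpsecai t.co/pjYCv4fBiN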
@Eric_jie_thu Is the backdoor secret? Check out our new work on "breaking" poisoned classifiers, where we use neat ideas in adversarial robustness to analyze backdoored classifiers. Joint work with @agsidd10 & @zicokolter. Paper: t.co/IRSS1q65Ky Code: t.co/KCDrek7FTP t.co/yAyY8zsDD3

