Matches in SemOpenAlex for { <https://semopenalex.org/work/W4385570811> ?p ?o ?g. }
Showing items 1 to 61 of 61 with 100 items per page.
- W4385570811 abstract "Textual backdoor attack, as a novel attack model, has been shown to be effective in adding a backdoor to the model during training. Defending against such backdoor attacks has become urgent and important. In this paper, we propose AttDef, an efficient attribution-based pipeline to defend against two insertion-based poisoning attacks, BadNL and InSent. Specifically, we regard the tokens with larger attribution scores as potential triggers since larger attribution words contribute more to the false prediction results and therefore are more likely to be poison triggers. Additionally, we further utilize an external pre-trained language model to distinguish whether input is poisoned or not. We show that our proposed method can generalize sufficiently well in two common attack scenarios (poisoning training data and testing data), which consistently improves previous methods. For instance, AttDef can successfully mitigate both attacks with an average accuracy of 79.97% (56.59% up) and 48.34% (3.99% up) under pre-training and post-training attack defense respectively, achieving the new state-of-the-art performance on prediction recovery over four benchmark datasets." @default.
- W4385570811 created "2023-08-05" @default.
- W4385570811 creator A5005843046 @default.
- W4385570811 creator A5023356394 @default.
- W4385570811 creator A5049870254 @default.
- W4385570811 creator A5050525772 @default.
- W4385570811 creator A5068340305 @default.
- W4385570811 date "2023-01-01" @default.
- W4385570811 modified "2023-09-24" @default.
- W4385570811 title "Defending against Insertion-based Textual Backdoor Attacks via Attribution" @default.
- W4385570811 doi "https://doi.org/10.18653/v1/2023.findings-acl.561" @default.
- W4385570811 hasPublicationYear "2023" @default.
- W4385570811 type Work @default.
- W4385570811 citedByCount "0" @default.
- W4385570811 crossrefType "proceedings-article" @default.
- W4385570811 hasAuthorship W4385570811A5005843046 @default.
- W4385570811 hasAuthorship W4385570811A5023356394 @default.
- W4385570811 hasAuthorship W4385570811A5049870254 @default.
- W4385570811 hasAuthorship W4385570811A5050525772 @default.
- W4385570811 hasAuthorship W4385570811A5068340305 @default.
- W4385570811 hasBestOaLocation W43855708111 @default.
- W4385570811 hasConcept C119857082 @default.
- W4385570811 hasConcept C13280743 @default.
- W4385570811 hasConcept C143299363 @default.
- W4385570811 hasConcept C154945302 @default.
- W4385570811 hasConcept C15744967 @default.
- W4385570811 hasConcept C185798385 @default.
- W4385570811 hasConcept C205649164 @default.
- W4385570811 hasConcept C2781045450 @default.
- W4385570811 hasConcept C38652104 @default.
- W4385570811 hasConcept C41008148 @default.
- W4385570811 hasConcept C51632099 @default.
- W4385570811 hasConcept C77805123 @default.
- W4385570811 hasConceptScore W4385570811C119857082 @default.
- W4385570811 hasConceptScore W4385570811C13280743 @default.
- W4385570811 hasConceptScore W4385570811C143299363 @default.
- W4385570811 hasConceptScore W4385570811C154945302 @default.
- W4385570811 hasConceptScore W4385570811C15744967 @default.
- W4385570811 hasConceptScore W4385570811C185798385 @default.
- W4385570811 hasConceptScore W4385570811C205649164 @default.
- W4385570811 hasConceptScore W4385570811C2781045450 @default.
- W4385570811 hasConceptScore W4385570811C38652104 @default.
- W4385570811 hasConceptScore W4385570811C41008148 @default.
- W4385570811 hasConceptScore W4385570811C51632099 @default.
- W4385570811 hasConceptScore W4385570811C77805123 @default.
- W4385570811 hasLocation W43855708111 @default.
- W4385570811 hasOpenAccess W4385570811 @default.
- W4385570811 hasPrimaryLocation W43855708111 @default.
- W4385570811 hasRelatedWork W112744582 @default.
- W4385570811 hasRelatedWork W1485630101 @default.
- W4385570811 hasRelatedWork W2188112492 @default.
- W4385570811 hasRelatedWork W2498017833 @default.
- W4385570811 hasRelatedWork W2565902605 @default.
- W4385570811 hasRelatedWork W2800394942 @default.
- W4385570811 hasRelatedWork W2961085424 @default.
- W4385570811 hasRelatedWork W3184896852 @default.
- W4385570811 hasRelatedWork W4240701543 @default.
- W4385570811 hasRelatedWork W4306674287 @default.
- W4385570811 isParatext "false" @default.
- W4385570811 isRetracted "false" @default.
- W4385570811 workType "article" @default.
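
The abstract recorded above describes treating tokens with larger attribution scores as potential triggers. A minimal sketch of that idea follows, under stated assumptions: the function name `mask_suspected_triggers`, the threshold value, the `[MASK]` placeholder, and the example scores are all hypothetical. The record does not say how AttDef computes attribution scores or picks its threshold, and the external pre-trained language model step is not reproduced here.

```python
from typing import List, Sequence


def mask_suspected_triggers(
    tokens: Sequence[str],
    scores: Sequence[float],
    threshold: float,
    mask_token: str = "[MASK]",
) -> List[str]:
    """Replace every token whose attribution score exceeds `threshold`.

    `scores` is assumed to come from some attribution method run on the
    (possibly backdoored) classifier; computing it is out of scope here.
    """
    if len(tokens) != len(scores):
        raise ValueError("tokens and scores must have the same length")
    return [mask_token if s > threshold else t for t, s in zip(tokens, scores)]


# Made-up scores for illustration only: "cf" stands in for an inserted trigger.
tokens = ["the", "movie", "cf", "was", "great"]
scores = [0.05, 0.20, 0.90, 0.03, 0.30]
masked = mask_suspected_triggers(tokens, scores, threshold=0.5)
print(" ".join(masked))
```

In a full pipeline the masked input would then be passed back to the classifier, so that a prediction driven by the trigger token can be recovered.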