Matches in SemOpenAlex for { <https://semopenalex.org/work/W4385571902> ?p ?o ?g. }
Showing items 1 to 55 of 55, with 100 items per page.
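The listing above corresponds to a basic SPARQL triple-pattern query against SemOpenAlex. As a minimal sketch, the request URL for such a query can be built as follows; the endpoint path `https://semopenalex.org/sparql` and the `build_sparql_url` helper are assumptions for illustration, not taken from this page.

```python
from urllib.parse import urlencode

def build_sparql_url(work_iri, endpoint="https://semopenalex.org/sparql"):
    # Hypothetical helper: build a GET URL that asks the (assumed) SemOpenAlex
    # SPARQL endpoint for every predicate/object pair of the given work IRI,
    # mirroring the { <work> ?p ?o } pattern shown in the header above.
    query = f"SELECT ?p ?o WHERE {{ <{work_iri}> ?p ?o . }}"
    return endpoint + "?" + urlencode({"query": query, "format": "json"})

url = build_sparql_url("https://semopenalex.org/work/W4385571902")
```

Fetching `url` with any HTTP client would then return the same 55 predicate/object pairs listed below, assuming the endpoint accepts GET-style queries.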
- W4385571902 abstract "Pre-trained language model (PLM) can be stealthily misled to target outputs by backdoor attacks when encountering poisoned samples, without performance degradation on clean samples. The stealthiness of backdoor attacks is commonly attained through minimal cross-entropy loss fine-tuning on a union of poisoned and clean samples. Existing defense paradigms provide a workaround by detecting and removing poisoned samples at pre-training or inference time. On the contrary, we provide a new perspective where the backdoor attack is directly reversed. Specifically, maximum entropy loss is incorporated in training to neutralize the minimal cross-entropy loss fine-tuning on poisoned data. We defend against a range of backdoor attacks on classification tasks and significantly lower the attack success rate. In extension, we explore the relationship between intended backdoor attacks and unintended dataset bias, and demonstrate the feasibility of the maximum entropy principle in de-biasing." @default.
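The abstract describes adding a maximum-entropy term to training so that it counteracts minimal cross-entropy fine-tuning on poisoned samples. A minimal NumPy sketch of such a combined objective is below; the function names, the additive form `CE + lam * (-H)`, and the weight `lam` are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the last axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def max_entropy_loss(logits):
    # Negative mean entropy of the predictive distribution:
    # minimizing this term pushes predictions toward uniform (max entropy).
    p = softmax(logits)
    entropy = -(p * np.log(p + 1e-12)).sum(axis=-1)
    return -entropy.mean()

def combined_loss(logits, labels, lam=1.0):
    # Standard cross-entropy plus an entropy-maximizing regularizer
    # (hypothetical combination weight lam).
    p = softmax(logits)
    n = logits.shape[0]
    ce = -np.log(p[np.arange(n), labels] + 1e-12).mean()
    return ce + lam * max_entropy_loss(logits)
```

Under this sketch, confident (low-entropy) predictions on trigger-bearing inputs incur a penalty, which is the intuition behind using entropy maximization to blunt a backdoor's minimal-cross-entropy objective.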
- W4385571902 created "2023-08-05" @default.
- W4385571902 creator A5006391718 @default.
- W4385571902 creator A5016006900 @default.
- W4385571902 creator A5021385133 @default.
- W4385571902 creator A5053914891 @default.
- W4385571902 creator A5060197177 @default.
- W4385571902 date "2023-01-01" @default.
- W4385571902 modified "2023-09-24" @default.
- W4385571902 title "Maximum Entropy Loss, the Silver Bullet Targeting Backdoor Attacks in Pre-trained Language Models" @default.
- W4385571902 doi "https://doi.org/10.18653/v1/2023.findings-acl.237" @default.
- W4385571902 hasPublicationYear "2023" @default.
- W4385571902 type Work @default.
- W4385571902 citedByCount "0" @default.
- W4385571902 crossrefType "proceedings-article" @default.
- W4385571902 hasAuthorship W4385571902A5006391718 @default.
- W4385571902 hasAuthorship W4385571902A5016006900 @default.
- W4385571902 hasAuthorship W4385571902A5021385133 @default.
- W4385571902 hasAuthorship W4385571902A5053914891 @default.
- W4385571902 hasAuthorship W4385571902A5060197177 @default.
- W4385571902 hasBestOaLocation W43855719021 @default.
- W4385571902 hasConcept C106301342 @default.
- W4385571902 hasConcept C121332964 @default.
- W4385571902 hasConcept C154945302 @default.
- W4385571902 hasConcept C2776214188 @default.
- W4385571902 hasConcept C2781045450 @default.
- W4385571902 hasConcept C38652104 @default.
- W4385571902 hasConcept C41008148 @default.
- W4385571902 hasConcept C62520636 @default.
- W4385571902 hasConcept C9679016 @default.
- W4385571902 hasConceptScore W4385571902C106301342 @default.
- W4385571902 hasConceptScore W4385571902C121332964 @default.
- W4385571902 hasConceptScore W4385571902C154945302 @default.
- W4385571902 hasConceptScore W4385571902C2776214188 @default.
- W4385571902 hasConceptScore W4385571902C2781045450 @default.
- W4385571902 hasConceptScore W4385571902C38652104 @default.
- W4385571902 hasConceptScore W4385571902C41008148 @default.
- W4385571902 hasConceptScore W4385571902C62520636 @default.
- W4385571902 hasConceptScore W4385571902C9679016 @default.
- W4385571902 hasLocation W43855719021 @default.
- W4385571902 hasOpenAccess W4385571902 @default.
- W4385571902 hasPrimaryLocation W43855719021 @default.
- W4385571902 hasRelatedWork W1481035447 @default.
- W4385571902 hasRelatedWork W2170442128 @default.
- W4385571902 hasRelatedWork W2188112492 @default.
- W4385571902 hasRelatedWork W2800394942 @default.
- W4385571902 hasRelatedWork W2956252641 @default.
- W4385571902 hasRelatedWork W3038107571 @default.
- W4385571902 hasRelatedWork W3184896852 @default.
- W4385571902 hasRelatedWork W4230289150 @default.
- W4385571902 hasRelatedWork W4240701543 @default.
- W4385571902 hasRelatedWork W1840384731 @default.
- W4385571902 isParatext "false" @default.
- W4385571902 isRetracted "false" @default.
- W4385571902 workType "article" @default.