Matches in SemOpenAlex for { <https://semopenalex.org/work/W4281570223> ?p ?o ?g. }
Showing items 1 to 60 of 60 with 100 items per page.
- W4281570223 abstract "Existing backdoor defense methods are only effective for limited trigger types. To defend different trigger types at once, we start from the class-irrelevant nature of the poisoning process and propose a novel weakly supervised backdoor defense framework WeDef. Recent advances in weak supervision make it possible to train a reasonably accurate text classifier using only a small number of user-provided, class-indicative seed words. Such seed words shall be considered independent of the triggers. Therefore, a weakly supervised text classifier trained by only the poisoned documents without their labels will likely have no backdoor. Inspired by this observation, in WeDef, we define the reliability of samples based on whether the predictions of the weak classifier agree with their labels in the poisoned training set. We further improve the results through a two-phase sanitization: (1) iteratively refine the weak classifier based on the reliable samples and (2) train a binary poison classifier by distinguishing the most unreliable samples from the most reliable samples. Finally, we train the sanitized model on the samples that the poison classifier predicts as benign. Extensive experiments show that WeDef is effective against popular trigger-based attacks (e.g., words, sentences, and paraphrases), outperforming existing defense methods." @default.
- W4281570223 created "2022-05-27" @default.
- W4281570223 creator A5038711316 @default.
- W4281570223 creator A5039500313 @default.
- W4281570223 creator A5086724583 @default.
- W4281570223 date "2022-05-24" @default.
- W4281570223 modified "2023-09-24" @default.
- W4281570223 title "WeDef: Weakly Supervised Backdoor Defense for Text Classification" @default.
- W4281570223 doi "https://doi.org/10.48550/arxiv.2205.11803" @default.
- W4281570223 hasPublicationYear "2022" @default.
- W4281570223 type Work @default.
- W4281570223 citedByCount "0" @default.
- W4281570223 crossrefType "posted-content" @default.
- W4281570223 hasAuthorship W4281570223A5038711316 @default.
- W4281570223 hasAuthorship W4281570223A5039500313 @default.
- W4281570223 hasAuthorship W4281570223A5086724583 @default.
- W4281570223 hasBestOaLocation W42815702231 @default.
- W4281570223 hasConcept C119857082 @default.
- W4281570223 hasConcept C12267149 @default.
- W4281570223 hasConcept C153180895 @default.
- W4281570223 hasConcept C154945302 @default.
- W4281570223 hasConcept C2781045450 @default.
- W4281570223 hasConcept C33923547 @default.
- W4281570223 hasConcept C38652104 @default.
- W4281570223 hasConcept C41008148 @default.
- W4281570223 hasConcept C48372109 @default.
- W4281570223 hasConcept C51632099 @default.
- W4281570223 hasConcept C66905080 @default.
- W4281570223 hasConcept C94375191 @default.
- W4281570223 hasConcept C95623464 @default.
- W4281570223 hasConceptScore W4281570223C119857082 @default.
- W4281570223 hasConceptScore W4281570223C12267149 @default.
- W4281570223 hasConceptScore W4281570223C153180895 @default.
- W4281570223 hasConceptScore W4281570223C154945302 @default.
- W4281570223 hasConceptScore W4281570223C2781045450 @default.
- W4281570223 hasConceptScore W4281570223C33923547 @default.
- W4281570223 hasConceptScore W4281570223C38652104 @default.
- W4281570223 hasConceptScore W4281570223C41008148 @default.
- W4281570223 hasConceptScore W4281570223C48372109 @default.
- W4281570223 hasConceptScore W4281570223C51632099 @default.
- W4281570223 hasConceptScore W4281570223C66905080 @default.
- W4281570223 hasConceptScore W4281570223C94375191 @default.
- W4281570223 hasConceptScore W4281570223C95623464 @default.
- W4281570223 hasLocation W42815702231 @default.
- W4281570223 hasLocation W42815702232 @default.
- W4281570223 hasOpenAccess W4281570223 @default.
- W4281570223 hasPrimaryLocation W42815702231 @default.
- W4281570223 hasRelatedWork W2112343299 @default.
- W4281570223 hasRelatedWork W2275058042 @default.
- W4281570223 hasRelatedWork W2888934269 @default.
- W4281570223 hasRelatedWork W2964083560 @default.
- W4281570223 hasRelatedWork W3043252291 @default.
- W4281570223 hasRelatedWork W4205288553 @default.
- W4281570223 hasRelatedWork W4221015625 @default.
- W4281570223 hasRelatedWork W4281570223 @default.
- W4281570223 hasRelatedWork W4308671172 @default.
- W4281570223 hasRelatedWork W4328092580 @default.
- W4281570223 isParatext "false" @default.
- W4281570223 isRetracted "false" @default.
- W4281570223 workType "article" @default.
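
The reliability definition given in the abstract above (a sample is reliable when the weak classifier's prediction agrees with its label in the poisoned training set) can be illustrated with a minimal sketch. The function name `split_by_reliability` and the toy data are hypothetical and do not come from the WeDef paper; the weak classifier itself, the iterative refinement, and the binary poison classifier are not shown.

```python
from typing import List, Sequence, Tuple


def split_by_reliability(
    weak_predictions: Sequence[int],
    given_labels: Sequence[int],
) -> Tuple[List[int], List[int]]:
    """Split sample indices into reliable and unreliable groups.

    A sample counts as reliable when the weak classifier's prediction
    agrees with the (possibly poisoned) training label. This is only an
    illustration of the agreement check described in the abstract.
    """
    if len(weak_predictions) != len(given_labels):
        raise ValueError("predictions and labels must have the same length")

    reliable: List[int] = []
    unreliable: List[int] = []
    for index, (pred, label) in enumerate(zip(weak_predictions, given_labels)):
        if pred == label:
            reliable.append(index)
        else:
            unreliable.append(index)
    return reliable, unreliable


# Toy data (hypothetical): class ids predicted by a weak classifier,
# and the labels found in the training set.
weak_predictions = [0, 1, 1, 0, 1]
given_labels = [0, 1, 0, 0, 0]

reliable, unreliable = split_by_reliability(weak_predictions, given_labels)
print("reliable:", reliable)
print("unreliable:", unreliable)
```

In this sketch the disagreeing samples are only flagged, not removed; according to the abstract, the final filtering decision is made by a separate poison classifier trained on the most reliable and most unreliable samples.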