Matches in SemOpenAlex for { <https://semopenalex.org/work/W2970290563> ?p ?o ?g. }
- W2970290563 abstract "Adversarial examples highlight model vulnerabilities and are useful for evaluation and interpretation. We define universal adversarial triggers: input-agnostic sequences of tokens that trigger a model to produce a specific prediction when concatenated to any input from a dataset. We propose a gradient-guided search over tokens which finds short trigger sequences (e.g., one word for classification and four words for language modeling) that successfully trigger the target prediction. For example, triggers cause SNLI entailment accuracy to drop from 89.94% to 0.55%, 72% of 'why' questions in SQuAD to be answered 'to kill american people', and the GPT-2 language model to spew racist output even when conditioned on non-racial contexts. Furthermore, although the triggers are optimized using white-box access to a specific model, they transfer to other models for all tasks we consider. Finally, since triggers are input-agnostic, they provide an analysis of global model behavior. For instance, they confirm that SNLI models exploit dataset biases and help diagnose heuristics learned by reading comprehension models." @default.
- W2970290563 created "2019-09-05" @default.
- W2970290563 creator A5005779128 @default.
- W2970290563 creator A5035088083 @default.
- W2970290563 creator A5042990962 @default.
- W2970290563 creator A5056645742 @default.
- W2970290563 creator A5057958949 @default.
- W2970290563 date "2019-08-20" @default.
- W2970290563 modified "2023-09-23" @default.
- W2970290563 title "Universal Adversarial Triggers for Attacking and Analyzing NLP" @default.
- W2970290563 cites W1840435438 @default.
- W2970290563 cites W2250539671 @default.
- W2970290563 cites W2251939518 @default.
- W2970290563 cites W2413794162 @default.
- W2970290563 cites W2543927648 @default.
- W2970290563 cites W2551396370 @default.
- W2970290563 cites W2562979205 @default.
- W2970290563 cites W2799007037 @default.
- W2970290563 cites W2799194071 @default.
- W2970290563 cites W2811010710 @default.
- W2970290563 cites W2890365488 @default.
- W2970290563 cites W2917779306 @default.
- W2970290563 cites W2922293812 @default.
- W2970290563 cites W2949895294 @default.
- W2970290563 cites W2951286828 @default.
- W2970290563 cites W2962736243 @default.
- W2970290563 cites W2962739339 @default.
- W2970290563 cites W2962772361 @default.
- W2970290563 cites W2962784628 @default.
- W2970290563 cites W2962816513 @default.
- W2970290563 cites W2962843521 @default.
- W2970290563 cites W2963126845 @default.
- W2970290563 cites W2963564796 @default.
- W2970290563 cites W2963661177 @default.
- W2970290563 cites W2963748441 @default.
- W2970290563 cites W2963834268 @default.
- W2970290563 cites W2963969878 @default.
- W2970290563 cites W2964153729 @default.
- W2970290563 cites W2966491090 @default.
- W2970290563 cites W2998277219 @default.
- W2970290563 cites W3015001695 @default.
- W2970290563 cites W9657784 @default.
- W2970290563 cites W3023071679 @default.
- W2970290563 hasPublicationYear "2019" @default.
- W2970290563 type Work @default.
- W2970290563 sameAs 2970290563 @default.
- W2970290563 citedByCount "0" @default.
- W2970290563 crossrefType "posted-content" @default.
- W2970290563 hasAuthorship W2970290563A5005779128 @default.
- W2970290563 hasAuthorship W2970290563A5035088083 @default.
- W2970290563 hasAuthorship W2970290563A5042990962 @default.
- W2970290563 hasAuthorship W2970290563A5056645742 @default.
- W2970290563 hasAuthorship W2970290563A5057958949 @default.
- W2970290563 hasConcept C111919701 @default.
- W2970290563 hasConcept C119857082 @default.
- W2970290563 hasConcept C127705205 @default.
- W2970290563 hasConcept C137293760 @default.
- W2970290563 hasConcept C138885662 @default.
- W2970290563 hasConcept C154945302 @default.
- W2970290563 hasConcept C165696696 @default.
- W2970290563 hasConcept C199360897 @default.
- W2970290563 hasConcept C204321447 @default.
- W2970290563 hasConcept C37736160 @default.
- W2970290563 hasConcept C38652104 @default.
- W2970290563 hasConcept C41008148 @default.
- W2970290563 hasConcept C41895202 @default.
- W2970290563 hasConcept C527412718 @default.
- W2970290563 hasConcept C90805587 @default.
- W2970290563 hasConceptScore W2970290563C111919701 @default.
- W2970290563 hasConceptScore W2970290563C119857082 @default.
- W2970290563 hasConceptScore W2970290563C127705205 @default.
- W2970290563 hasConceptScore W2970290563C137293760 @default.
- W2970290563 hasConceptScore W2970290563C138885662 @default.
- W2970290563 hasConceptScore W2970290563C154945302 @default.
- W2970290563 hasConceptScore W2970290563C165696696 @default.
- W2970290563 hasConceptScore W2970290563C199360897 @default.
- W2970290563 hasConceptScore W2970290563C204321447 @default.
- W2970290563 hasConceptScore W2970290563C37736160 @default.
- W2970290563 hasConceptScore W2970290563C38652104 @default.
- W2970290563 hasConceptScore W2970290563C41008148 @default.
- W2970290563 hasConceptScore W2970290563C41895202 @default.
- W2970290563 hasConceptScore W2970290563C527412718 @default.
- W2970290563 hasConceptScore W2970290563C90805587 @default.
- W2970290563 hasLocation W29702905631 @default.
- W2970290563 hasOpenAccess W2970290563 @default.
- W2970290563 hasPrimaryLocation W29702905631 @default.
- W2970290563 hasRelatedWork W1591706642 @default.
- W2970290563 hasRelatedWork W2333611780 @default.
- W2970290563 hasRelatedWork W2798966449 @default.
- W2970290563 hasRelatedWork W2885119660 @default.
- W2970290563 hasRelatedWork W2903359200 @default.
- W2970290563 hasRelatedWork W2945058366 @default.
- W2970290563 hasRelatedWork W2971104547 @default.
- W2970290563 hasRelatedWork W2972433973 @default.
- W2970290563 hasRelatedWork W2976855161 @default.
- W2970290563 hasRelatedWork W2982756474 @default.
- W2970290563 hasRelatedWork W3031924365 @default.
- W2970290563 hasRelatedWork W3105511442 @default.
- W2970290563 hasRelatedWork W3115146967 @default.
- W2970290563 hasRelatedWork W3115607146 @default.