Matches in SemOpenAlex for { <https://semopenalex.org/work/W3119904598> ?p ?o ?g. }
Showing items 1 to 99 of 99 with 100 items per page.
- W3119904598 abstract "Deep neural networks (DNNs) have had many successes, but they suffer from two major issues: (1) a vulnerability to adversarial examples and (2) a tendency to elude human interpretation. Interestingly, recent empirical and theoretical evidence suggests that these two seemingly disparate issues are actually connected. In particular, robust models tend to provide more interpretable gradients than non-robust models. However, whether this relationship works in the opposite direction remains obscure. With this paper, we seek empirical answers to the following question: can models acquire adversarial robustness when they are trained to have interpretable gradients? We introduce a theoretically inspired technique called Interpretation Regularization (IR), which encourages a model’s gradients to (1) match the direction of interpretable target salience maps and (2) have small magnitude. To assess model performance and tease apart factors that contribute to adversarial robustness, we conduct extensive experiments on MNIST and CIFAR-10 with both \(\ell_2\) and \(\ell_\infty\) attacks. We demonstrate that training the networks to have interpretable gradients improves their robustness to adversarial perturbations. Applying the network interpretation technique SmoothGrad [59] yields additional performance gains, especially in cross-norm attacks and under heavy perturbations. The results indicate that the interpretability of the model gradients is a crucial factor for adversarial robustness. Code for the experiments can be found at https://github.com/a1noack/interp_regularization." @default.
- W3119904598 created "2021-01-18" @default.
- W3119904598 creator A5036820958 @default.
- W3119904598 creator A5065154024 @default.
- W3119904598 creator A5066063885 @default.
- W3119904598 creator A5076817525 @default.
- W3119904598 date "2021-01-11" @default.
- W3119904598 modified "2023-09-30" @default.
- W3119904598 title "An Empirical Study on the Relation Between Network Interpretability and Adversarial Robustness" @default.
- W3119904598 cites W1849277567 @default.
- W3119904598 cites W2180612164 @default.
- W3119904598 cites W2243397390 @default.
- W3119904598 cites W2529714286 @default.
- W3119904598 cites W2543927648 @default.
- W3119904598 cites W2603766943 @default.
- W3119904598 cites W2657631929 @default.
- W3119904598 cites W2746600820 @default.
- W3119904598 cites W2768346313 @default.
- W3119904598 cites W2898365215 @default.
- W3119904598 cites W2962821226 @default.
- W3119904598 cites W2962847335 @default.
- W3119904598 cites W2962858109 @default.
- W3119904598 cites W2963564844 @default.
- W3119904598 cites W2963610729 @default.
- W3119904598 cites W2963749936 @default.
- W3119904598 cites W2963857521 @default.
- W3119904598 cites W2963952467 @default.
- W3119904598 cites W2964082701 @default.
- W3119904598 cites W2964137095 @default.
- W3119904598 cites W2979200397 @default.
- W3119904598 cites W2987875759 @default.
- W3119904598 cites W586722081 @default.
- W3119904598 doi "https://doi.org/10.1007/s42979-020-00390-x" @default.
- W3119904598 hasPublicationYear "2021" @default.
- W3119904598 type Work @default.
- W3119904598 sameAs 3119904598 @default.
- W3119904598 citedByCount "14" @default.
- W3119904598 countsByYear W31199045982021 @default.
- W3119904598 countsByYear W31199045982022 @default.
- W3119904598 countsByYear W31199045982023 @default.
- W3119904598 crossrefType "journal-article" @default.
- W3119904598 hasAuthorship W3119904598A5036820958 @default.
- W3119904598 hasAuthorship W3119904598A5065154024 @default.
- W3119904598 hasAuthorship W3119904598A5066063885 @default.
- W3119904598 hasAuthorship W3119904598A5076817525 @default.
- W3119904598 hasBestOaLocation W31199045982 @default.
- W3119904598 hasConcept C104317684 @default.
- W3119904598 hasConcept C105795698 @default.
- W3119904598 hasConcept C108154423 @default.
- W3119904598 hasConcept C119857082 @default.
- W3119904598 hasConcept C120936955 @default.
- W3119904598 hasConcept C154945302 @default.
- W3119904598 hasConcept C185592680 @default.
- W3119904598 hasConcept C190502265 @default.
- W3119904598 hasConcept C2781067378 @default.
- W3119904598 hasConcept C2984842247 @default.
- W3119904598 hasConcept C33923547 @default.
- W3119904598 hasConcept C37736160 @default.
- W3119904598 hasConcept C41008148 @default.
- W3119904598 hasConcept C50644808 @default.
- W3119904598 hasConcept C55493867 @default.
- W3119904598 hasConcept C63479239 @default.
- W3119904598 hasConceptScore W3119904598C104317684 @default.
- W3119904598 hasConceptScore W3119904598C105795698 @default.
- W3119904598 hasConceptScore W3119904598C108154423 @default.
- W3119904598 hasConceptScore W3119904598C119857082 @default.
- W3119904598 hasConceptScore W3119904598C120936955 @default.
- W3119904598 hasConceptScore W3119904598C154945302 @default.
- W3119904598 hasConceptScore W3119904598C185592680 @default.
- W3119904598 hasConceptScore W3119904598C190502265 @default.
- W3119904598 hasConceptScore W3119904598C2781067378 @default.
- W3119904598 hasConceptScore W3119904598C2984842247 @default.
- W3119904598 hasConceptScore W3119904598C33923547 @default.
- W3119904598 hasConceptScore W3119904598C37736160 @default.
- W3119904598 hasConceptScore W3119904598C41008148 @default.
- W3119904598 hasConceptScore W3119904598C50644808 @default.
- W3119904598 hasConceptScore W3119904598C55493867 @default.
- W3119904598 hasConceptScore W3119904598C63479239 @default.
- W3119904598 hasFunder F4320332180 @default.
- W3119904598 hasIssue "1" @default.
- W3119904598 hasLocation W31199045981 @default.
- W3119904598 hasLocation W31199045982 @default.
- W3119904598 hasOpenAccess W3119904598 @default.
- W3119904598 hasPrimaryLocation W31199045981 @default.
- W3119904598 hasRelatedWork W2973831015 @default.
- W3119904598 hasRelatedWork W2989784533 @default.
- W3119904598 hasRelatedWork W2991526369 @default.
- W3119904598 hasRelatedWork W2993049325 @default.
- W3119904598 hasRelatedWork W3018979822 @default.
- W3119904598 hasRelatedWork W3091172437 @default.
- W3119904598 hasRelatedWork W3127679336 @default.
- W3119904598 hasRelatedWork W3208923398 @default.
- W3119904598 hasRelatedWork W4210274113 @default.
- W3119904598 hasRelatedWork W4288018014 @default.
- W3119904598 hasVolume "2" @default.
- W3119904598 isParatext "false" @default.
- W3119904598 isRetracted "false" @default.
- W3119904598 magId "3119904598" @default.
- W3119904598 workType "article" @default.