Matches in SemOpenAlex for { <https://semopenalex.org/work/W4311000596> ?p ?o ?g. }
Showing items 1 to 80 of 80 with 100 items per page.
- W4311000596 abstract "Modern deep learning models are over-parameterized, where the optimization setup strongly affects the generalization performance. A key element of reliable optimization for these systems is the modification of the loss function. Sharpness-Aware Minimization (SAM) modifies the underlying loss function to guide descent methods towards flatter minima, which arguably have better generalization abilities. In this paper, we focus on a variant of SAM known as mSAM, which, during training, averages the updates generated by adversarial perturbations across several disjoint shards of a mini-batch. Recent work suggests that mSAM can outperform SAM in terms of test accuracy. However, a comprehensive empirical study of mSAM is missing from the literature -- previous results have mostly been limited to specific architectures and datasets. To that end, this paper presents a thorough empirical evaluation of mSAM on various tasks and datasets. We provide a flexible implementation of mSAM and compare the generalization performance of mSAM to the performance of SAM and vanilla training on different image classification and natural language processing tasks. We also conduct careful experiments to understand the computational cost of training with mSAM, its sensitivity to hyperparameters and its correlation with the flatness of the loss landscape. Our analysis reveals that mSAM yields superior generalization performance and flatter minima, compared to SAM, across a wide range of tasks without significantly increasing computational costs." @default.
- W4311000596 created "2022-12-22" @default.
- W4311000596 creator A5003977430 @default.
- W4311000596 creator A5004541836 @default.
- W4311000596 creator A5014264501 @default.
- W4311000596 creator A5029767449 @default.
- W4311000596 creator A5045271820 @default.
- W4311000596 creator A5048859133 @default.
- W4311000596 creator A5075423316 @default.
- W4311000596 date "2022-12-06" @default.
- W4311000596 modified "2023-10-17" @default.
- W4311000596 title "Improved Deep Neural Network Generalization Using m-Sharpness-Aware Minimization" @default.
- W4311000596 doi "https://doi.org/10.48550/arxiv.2212.04343" @default.
- W4311000596 hasPublicationYear "2022" @default.
- W4311000596 type Work @default.
- W4311000596 citedByCount "0" @default.
- W4311000596 crossrefType "posted-content" @default.
- W4311000596 hasAuthorship W4311000596A5003977430 @default.
- W4311000596 hasAuthorship W4311000596A5004541836 @default.
- W4311000596 hasAuthorship W4311000596A5014264501 @default.
- W4311000596 hasAuthorship W4311000596A5029767449 @default.
- W4311000596 hasAuthorship W4311000596A5045271820 @default.
- W4311000596 hasAuthorship W4311000596A5048859133 @default.
- W4311000596 hasAuthorship W4311000596A5075423316 @default.
- W4311000596 hasBestOaLocation W43110005961 @default.
- W4311000596 hasConcept C108583219 @default.
- W4311000596 hasConcept C11413529 @default.
- W4311000596 hasConcept C119857082 @default.
- W4311000596 hasConcept C134306372 @default.
- W4311000596 hasConcept C147764199 @default.
- W4311000596 hasConcept C154945302 @default.
- W4311000596 hasConcept C159985019 @default.
- W4311000596 hasConcept C165464430 @default.
- W4311000596 hasConcept C177148314 @default.
- W4311000596 hasConcept C186633575 @default.
- W4311000596 hasConcept C192562407 @default.
- W4311000596 hasConcept C199360897 @default.
- W4311000596 hasConcept C204323151 @default.
- W4311000596 hasConcept C2776502983 @default.
- W4311000596 hasConcept C33923547 @default.
- W4311000596 hasConcept C41008148 @default.
- W4311000596 hasConcept C50644808 @default.
- W4311000596 hasConcept C81363708 @default.
- W4311000596 hasConcept C8642999 @default.
- W4311000596 hasConceptScore W4311000596C108583219 @default.
- W4311000596 hasConceptScore W4311000596C11413529 @default.
- W4311000596 hasConceptScore W4311000596C119857082 @default.
- W4311000596 hasConceptScore W4311000596C134306372 @default.
- W4311000596 hasConceptScore W4311000596C147764199 @default.
- W4311000596 hasConceptScore W4311000596C154945302 @default.
- W4311000596 hasConceptScore W4311000596C159985019 @default.
- W4311000596 hasConceptScore W4311000596C165464430 @default.
- W4311000596 hasConceptScore W4311000596C177148314 @default.
- W4311000596 hasConceptScore W4311000596C186633575 @default.
- W4311000596 hasConceptScore W4311000596C192562407 @default.
- W4311000596 hasConceptScore W4311000596C199360897 @default.
- W4311000596 hasConceptScore W4311000596C204323151 @default.
- W4311000596 hasConceptScore W4311000596C2776502983 @default.
- W4311000596 hasConceptScore W4311000596C33923547 @default.
- W4311000596 hasConceptScore W4311000596C41008148 @default.
- W4311000596 hasConceptScore W4311000596C50644808 @default.
- W4311000596 hasConceptScore W4311000596C81363708 @default.
- W4311000596 hasConceptScore W4311000596C8642999 @default.
- W4311000596 hasLocation W43110005961 @default.
- W4311000596 hasLocation W43110005962 @default.
- W4311000596 hasOpenAccess W4311000596 @default.
- W4311000596 hasPrimaryLocation W43110005961 @default.
- W4311000596 hasRelatedWork W1504381128 @default.
- W4311000596 hasRelatedWork W1985711950 @default.
- W4311000596 hasRelatedWork W2051058708 @default.
- W4311000596 hasRelatedWork W2121922170 @default.
- W4311000596 hasRelatedWork W2127982566 @default.
- W4311000596 hasRelatedWork W2153731865 @default.
- W4311000596 hasRelatedWork W2375684291 @default.
- W4311000596 hasRelatedWork W4210838092 @default.
- W4311000596 hasRelatedWork W4250124476 @default.
- W4311000596 hasRelatedWork W2471196694 @default.
- W4311000596 isParatext "false" @default.
- W4311000596 isRetracted "false" @default.
- W4311000596 workType "article" @default.
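
The abstract above describes mSAM as averaging the updates produced by adversarial perturbations over several disjoint shards of a mini-batch. A minimal sketch of one such step follows, assuming a toy least-squares loss and plain NumPy. The names `grad_mse` and `msam_step` and the values of `m`, `rho`, and `lr` are illustrative assumptions, not taken from the paper or its implementation.

```python
import numpy as np

def grad_mse(w, X, y):
    """Gradient of 0.5 * mean((X @ w - y)**2) with respect to w (toy loss, an assumption)."""
    return X.T @ (X @ w - y) / len(y)

def msam_step(w, X, y, m=4, rho=0.05, lr=0.1, grad_fn=grad_mse):
    """One mSAM-style update: average SAM gradients over m disjoint shards.

    m, rho (perturbation radius) and lr (step size) are hypothetical defaults.
    """
    shard_grads = []
    # Split the mini-batch indices into m disjoint shards.
    for idx in np.array_split(np.arange(len(y)), m):
        Xs, ys = X[idx], y[idx]
        g = grad_fn(w, Xs, ys)
        # Adversarial perturbation: a step of length rho along the shard gradient.
        eps = rho * g / (np.linalg.norm(g) + 1e-12)
        # Gradient of the shard loss evaluated at the perturbed weights.
        shard_grads.append(grad_fn(w + eps, Xs, ys))
    # Average the per-shard gradients and take a descent step.
    return w - lr * np.mean(shard_grads, axis=0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(64, 3))
    y = X @ np.array([1.0, -2.0, 0.5])
    w = np.zeros(3)
    for _ in range(200):
        w = msam_step(w, X, y)
    print(w)
```

With `m=1` the sketch reduces to a single SAM-style step on the whole mini-batch, which makes the two variants easy to compare on the same data.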