Matches in SemOpenAlex for { <https://semopenalex.org/work/W2954097612> ?p ?o ?g. }
- W2954097612 abstract "We investigate how the final parameters found by stochastic gradient descent are influenced by over-parameterization. We generate families of models by increasing the number of channels in a base network, and then perform a large hyper-parameter search to study how the test error depends on learning rate, batch size, and network width. We find that the optimal SGD hyper-parameters are determined by a normalized noise scale, which is a function of the batch size, learning rate, and initialization conditions. In the absence of batch normalization, the optimal normalized noise scale is directly proportional to width. Wider networks, with their higher optimal noise scale, also achieve higher test accuracy. These observations hold for MLPs, ConvNets, and ResNets, and for two different parameterization schemes (Standard and NTK). We observe a similar trend with batch normalization for ResNets. Surprisingly, since the largest stable learning rate is bounded, the largest batch size consistent with the optimal normalized noise scale decreases as the width increases." @default.
- W2954097612 created "2019-07-12" @default.
- W2954097612 creator A5002784136 @default.
- W2954097612 creator A5007069562 @default.
- W2954097612 creator A5064986566 @default.
- W2954097612 creator A5088551093 @default.
- W2954097612 date "2019-05-09" @default.
- W2954097612 modified "2023-09-27" @default.
- W2954097612 title "The Effect of Network Width on Stochastic Gradient Descent and Generalization: an Empirical Study" @default.
- W2954097612 cites W1533861849 @default.
- W2954097612 cites W1677182931 @default.
- W2954097612 cites W2112796928 @default.
- W2954097612 cites W2194775991 @default.
- W2954097612 cites W2523060838 @default.
- W2954097612 cites W2604117713 @default.
- W2954097612 cites W2622263826 @default.
- W2954097612 cites W2626017178 @default.
- W2954097612 cites W2626325961 @default.
- W2954097612 cites W2750384547 @default.
- W2954097612 cites W2765393161 @default.
- W2954097612 cites W2768267830 @default.
- W2954097612 cites W2806504075 @default.
- W2954097612 cites W2807299122 @default.
- W2954097612 cites W2809090039 @default.
- W2954097612 cites W2900167092 @default.
- W2954097612 cites W2903697572 @default.
- W2954097612 cites W2903995489 @default.
- W2954097612 cites W2904367110 @default.
- W2954097612 cites W2911742574 @default.
- W2954097612 cites W2949117887 @default.
- W2954097612 cites W2949146054 @default.
- W2954097612 cites W2952892739 @default.
- W2954097612 cites W2962760235 @default.
- W2954097612 cites W2962915600 @default.
- W2954097612 cites W2963000508 @default.
- W2954097612 cites W2963069632 @default.
- W2954097612 cites W2963208657 @default.
- W2954097612 cites W2963285844 @default.
- W2954097612 cites W2963702144 @default.
- W2954097612 cites W2964052793 @default.
- W2954097612 cites W2964313743 @default.
- W2954097612 cites W3118608800 @default.
- W2954097612 cites W3137695714 @default.
- W2954097612 hasPublicationYear "2019" @default.
- W2954097612 type Work @default.
- W2954097612 sameAs 2954097612 @default.
- W2954097612 citedByCount "15" @default.
- W2954097612 countsByYear W29540976122019 @default.
- W2954097612 countsByYear W29540976122020 @default.
- W2954097612 countsByYear W29540976122021 @default.
- W2954097612 crossrefType "posted-content" @default.
- W2954097612 hasAuthorship W2954097612A5002784136 @default.
- W2954097612 hasAuthorship W2954097612A5007069562 @default.
- W2954097612 hasAuthorship W2954097612A5064986566 @default.
- W2954097612 hasAuthorship W2954097612A5088551093 @default.
- W2954097612 hasConcept C11413529 @default.
- W2954097612 hasConcept C114466953 @default.
- W2954097612 hasConcept C115961682 @default.
- W2954097612 hasConcept C121332964 @default.
- W2954097612 hasConcept C126255220 @default.
- W2954097612 hasConcept C134306372 @default.
- W2954097612 hasConcept C136886441 @default.
- W2954097612 hasConcept C14036430 @default.
- W2954097612 hasConcept C144024400 @default.
- W2954097612 hasConcept C153258448 @default.
- W2954097612 hasConcept C154945302 @default.
- W2954097612 hasConcept C177148314 @default.
- W2954097612 hasConcept C19165224 @default.
- W2954097612 hasConcept C199360897 @default.
- W2954097612 hasConcept C206688291 @default.
- W2954097612 hasConcept C2778755073 @default.
- W2954097612 hasConcept C28826006 @default.
- W2954097612 hasConcept C33923547 @default.
- W2954097612 hasConcept C34388435 @default.
- W2954097612 hasConcept C41008148 @default.
- W2954097612 hasConcept C50644808 @default.
- W2954097612 hasConcept C62520636 @default.
- W2954097612 hasConcept C78458016 @default.
- W2954097612 hasConcept C86803240 @default.
- W2954097612 hasConcept C99498987 @default.
- W2954097612 hasConceptScore W2954097612C11413529 @default.
- W2954097612 hasConceptScore W2954097612C114466953 @default.
- W2954097612 hasConceptScore W2954097612C115961682 @default.
- W2954097612 hasConceptScore W2954097612C121332964 @default.
- W2954097612 hasConceptScore W2954097612C126255220 @default.
- W2954097612 hasConceptScore W2954097612C134306372 @default.
- W2954097612 hasConceptScore W2954097612C136886441 @default.
- W2954097612 hasConceptScore W2954097612C14036430 @default.
- W2954097612 hasConceptScore W2954097612C144024400 @default.
- W2954097612 hasConceptScore W2954097612C153258448 @default.
- W2954097612 hasConceptScore W2954097612C154945302 @default.
- W2954097612 hasConceptScore W2954097612C177148314 @default.
- W2954097612 hasConceptScore W2954097612C19165224 @default.
- W2954097612 hasConceptScore W2954097612C199360897 @default.
- W2954097612 hasConceptScore W2954097612C206688291 @default.
- W2954097612 hasConceptScore W2954097612C2778755073 @default.
- W2954097612 hasConceptScore W2954097612C28826006 @default.
- W2954097612 hasConceptScore W2954097612C33923547 @default.
- W2954097612 hasConceptScore W2954097612C34388435 @default.
- W2954097612 hasConceptScore W2954097612C41008148 @default.