Matches in SemOpenAlex for { <https://semopenalex.org/work/W3130602232> ?p ?o ?g. }
- W3130602232 abstract "Innovations in neural architectures have fostered significant breakthroughs in language modeling and computer vision. Unfortunately, novel architectures often result in challenging hyper-parameter choices and training instability if the network parameters are not properly initialized. A number of architecture-specific initialization schemes have been proposed, but these schemes are not always portable to new architectures. This paper presents GradInit, an automated and architecture-agnostic method for initializing neural networks. GradInit is based on a simple heuristic: the norm of each network layer is adjusted so that a single step of SGD or Adam with prescribed hyperparameters results in the smallest possible loss value. This adjustment is done by introducing a scalar multiplier variable in front of each parameter block, and then optimizing these variables using a simple numerical scheme. GradInit accelerates convergence and improves test performance for many convolutional architectures, both with and without skip connections, and even without normalization layers. It also improves the stability of the original Transformer architecture for machine translation, allowing it to be trained without learning rate warmup using either Adam or SGD under a wide range of learning rates and momentum coefficients. Code is available at https://github.com/zhuchen03/gradinit." @default.
- W3130602232 created "2021-03-01" @default.
- W3130602232 creator A5006471838 @default.
- W3130602232 creator A5032642601 @default.
- W3130602232 creator A5037237895 @default.
- W3130602232 creator A5060687985 @default.
- W3130602232 creator A5082943001 @default.
- W3130602232 creator A5083225492 @default.
- W3130602232 date "2021-02-16" @default.
- W3130602232 modified "2023-09-25" @default.
- W3130602232 title "GradInit: Learning to Initialize Neural Networks for Stable and Efficient Training" @default.
- W3130602232 cites W1533861849 @default.
- W3130602232 cites W1677182931 @default.
- W3130602232 cites W1686810756 @default.
- W3130602232 cites W1836465849 @default.
- W3130602232 cites W2108598243 @default.
- W3130602232 cites W2194775991 @default.
- W3130602232 cites W2401231614 @default.
- W3130602232 cites W2518108298 @default.
- W3130602232 cites W2622263826 @default.
- W3130602232 cites W2746314669 @default.
- W3130602232 cites W2767286248 @default.
- W3130602232 cites W2785940809 @default.
- W3130602232 cites W2907762094 @default.
- W3130602232 cites W2908510526 @default.
- W3130602232 cites W2933138175 @default.
- W3130602232 cites W2962761235 @default.
- W3130602232 cites W2963000090 @default.
- W3130602232 cites W2963037478 @default.
- W3130602232 cites W2963399829 @default.
- W3130602232 cites W2963403868 @default.
- W3130602232 cites W2963446712 @default.
- W3130602232 cites W2963504252 @default.
- W3130602232 cites W2964121744 @default.
- W3130602232 cites W2965373594 @default.
- W3130602232 cites W2970674435 @default.
- W3130602232 cites W2994689640 @default.
- W3130602232 cites W2997347790 @default.
- W3130602232 cites W3010768098 @default.
- W3130602232 cites W3016635207 @default.
- W3130602232 cites W3035548130 @default.
- W3130602232 cites W3035618017 @default.
- W3130602232 cites W3041866211 @default.
- W3130602232 cites W3098903812 @default.
- W3130602232 cites W3100971012 @default.
- W3130602232 cites W3118608800 @default.
- W3130602232 cites W3121334524 @default.
- W3130602232 cites W3128633047 @default.
- W3130602232 cites W3157805269 @default.
- W3130602232 cites W3202983332 @default.
- W3130602232 hasPublicationYear "2021" @default.
- W3130602232 type Work @default.
- W3130602232 sameAs 3130602232 @default.
- W3130602232 citedByCount "7" @default.
- W3130602232 countsByYear W31306022322021 @default.
- W3130602232 crossrefType "posted-content" @default.
- W3130602232 hasAuthorship W3130602232A5006471838 @default.
- W3130602232 hasAuthorship W3130602232A5032642601 @default.
- W3130602232 hasAuthorship W3130602232A5037237895 @default.
- W3130602232 hasAuthorship W3130602232A5060687985 @default.
- W3130602232 hasAuthorship W3130602232A5082943001 @default.
- W3130602232 hasAuthorship W3130602232A5083225492 @default.
- W3130602232 hasBestOaLocation W31306022321 @default.
- W3130602232 hasConcept C11413529 @default.
- W3130602232 hasConcept C114466953 @default.
- W3130602232 hasConcept C119857082 @default.
- W3130602232 hasConcept C123657996 @default.
- W3130602232 hasConcept C142362112 @default.
- W3130602232 hasConcept C153349607 @default.
- W3130602232 hasConcept C154945302 @default.
- W3130602232 hasConcept C193415008 @default.
- W3130602232 hasConcept C199360897 @default.
- W3130602232 hasConcept C38652104 @default.
- W3130602232 hasConcept C41008148 @default.
- W3130602232 hasConcept C50644808 @default.
- W3130602232 hasConcept C81363708 @default.
- W3130602232 hasConcept C8642999 @default.
- W3130602232 hasConceptScore W3130602232C11413529 @default.
- W3130602232 hasConceptScore W3130602232C114466953 @default.
- W3130602232 hasConceptScore W3130602232C119857082 @default.
- W3130602232 hasConceptScore W3130602232C123657996 @default.
- W3130602232 hasConceptScore W3130602232C142362112 @default.
- W3130602232 hasConceptScore W3130602232C153349607 @default.
- W3130602232 hasConceptScore W3130602232C154945302 @default.
- W3130602232 hasConceptScore W3130602232C193415008 @default.
- W3130602232 hasConceptScore W3130602232C199360897 @default.
- W3130602232 hasConceptScore W3130602232C38652104 @default.
- W3130602232 hasConceptScore W3130602232C41008148 @default.
- W3130602232 hasConceptScore W3130602232C50644808 @default.
- W3130602232 hasConceptScore W3130602232C81363708 @default.
- W3130602232 hasConceptScore W3130602232C8642999 @default.
- W3130602232 hasLocation W31306022321 @default.
- W3130602232 hasOpenAccess W3130602232 @default.
- W3130602232 hasPrimaryLocation W31306022321 @default.
- W3130602232 hasRelatedWork W10809924 @default.
- W3130602232 hasRelatedWork W11546141 @default.
- W3130602232 hasRelatedWork W11937450 @default.
- W3130602232 hasRelatedWork W12251780 @default.
- W3130602232 hasRelatedWork W12428677 @default.
- W3130602232 hasRelatedWork W2433769 @default.
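
The heuristic summarized in the abstract (one scalar multiplier per parameter block, tuned so that a single optimizer step yields the lowest loss) can be illustrated with a toy sketch. This is not the authors' implementation; their code is at the repository linked in the abstract. Everything here is an assumption made for illustration: the two-weight model `y = w2 * w1 * x`, the function names `post_step_loss` and `fit_multipliers`, plain SGD as the inner step, and finite-difference descent with step halving standing in for the "simple numerical scheme", which the abstract does not specify.

```python
# Toy sketch of the heuristic from the abstract; all names are hypothetical.

def loss(w1, w2, x, target):
    """Squared error of the toy model y = w2 * w1 * x."""
    return (w2 * w1 * x - target) ** 2


def grads(w1, w2, x, target):
    """Analytic gradient of the loss with respect to (w1, w2)."""
    err = 2.0 * (w2 * w1 * x - target)
    return err * w2 * x, err * w1 * x


def post_step_loss(m, w, x, target, lr):
    """Loss after ONE plain SGD step taken from the rescaled weights m[i] * w[i]."""
    s1, s2 = m[0] * w[0], m[1] * w[1]
    g1, g2 = grads(s1, s2, x, target)
    return loss(s1 - lr * g1, s2 - lr * g2, x, target)


def fit_multipliers(w, x, target, lr, steps=200, meta_lr=0.05, eps=1e-5, floor=1e-3):
    """Tune one positive multiplier per weight to lower the post-step loss.

    Assumption: central finite differences replace automatic differentiation,
    and a candidate update is accepted only if it lowers the objective.
    """
    m = [1.0, 1.0]
    best = post_step_loss(m, w, x, target, lr)
    for _ in range(steps):
        d = []
        for i in range(2):
            hi, lo = list(m), list(m)
            hi[i] += eps
            lo[i] -= eps
            d.append((post_step_loss(hi, w, x, target, lr)
                      - post_step_loss(lo, w, x, target, lr)) / (2 * eps))
        # Descent step on the multipliers, clamped so they stay positive.
        cand = [max(floor, m[i] - meta_lr * d[i]) for i in range(2)]
        cand_loss = post_step_loss(cand, w, x, target, lr)
        if cand_loss < best:
            m, best = cand, cand_loss
        else:
            meta_lr *= 0.5  # rejected: shrink the step and try again
    return m, best


# Deliberately oversized initial weights, so the first SGD step overshoots.
w = (3.0, 3.0)
before = post_step_loss([1.0, 1.0], w, x=1.0, target=1.0, lr=0.1)
m, after = fit_multipliers(w, x=1.0, target=1.0, lr=0.1)
print("post-step loss with multipliers fixed at 1:", before)
print("post-step loss with fitted multipliers:", after, "multipliers:", m)
```

Because updates are accepted only when they lower the objective, the fitted multipliers can never do worse than leaving every multiplier at 1. A realistic version would operate on whole parameter blocks, differentiate through the optimizer step automatically, and evaluate on minibatches.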