Matches in SemOpenAlex for { <https://semopenalex.org/work/W3123638774> ?p ?o ?g. }
- W3123638774 abstract "The largely successful method of training neural networks is to learn their weights using some variant of stochastic gradient descent (SGD). Here, we show that the solutions found by SGD can be further improved by ensembling a subset of the weights in late stages of learning. At the end of learning, we obtain back a single model by taking a spatial average in weight space. To avoid incurring increased computational costs, we investigate a family of low-dimensional late-phase weight models which interact multiplicatively with the remaining parameters. Our results show that augmenting standard models with late-phase weights improves generalization in established benchmarks such as CIFAR-10/100, ImageNet and enwik8. These findings are complemented with a theoretical analysis of a noisy quadratic problem which provides a simplified picture of the late phases of neural network learning." @default.
- W3123638774 created "2021-02-01" @default.
- W3123638774 creator A5006011235 @default.
- W3123638774 creator A5030804320 @default.
- W3123638774 creator A5030838773 @default.
- W3123638774 creator A5033354550 @default.
- W3123638774 creator A5048262902 @default.
- W3123638774 creator A5090542476 @default.
- W3123638774 date "2021-05-03" @default.
- W3123638774 modified "2023-10-03" @default.
- W3123638774 title "Neural networks with late-phase weights" @default.
- W3123638774 cites W114517082 @default.
- W3123638774 cites W1525859397 @default.
- W3123638774 cites W1677182931 @default.
- W3123638774 cites W1836465849 @default.
- W3123638774 cites W2007700189 @default.
- W3123638774 cites W2043276874 @default.
- W3123638774 cites W2062546738 @default.
- W3123638774 cites W2064675550 @default.
- W3123638774 cites W2071048859 @default.
- W3123638774 cites W2086161653 @default.
- W3123638774 cites W2089217417 @default.
- W3123638774 cites W2095705004 @default.
- W3123638774 cites W2111051539 @default.
- W3123638774 cites W2112796928 @default.
- W3123638774 cites W2117539524 @default.
- W3123638774 cites W2120420045 @default.
- W3123638774 cites W2167433878 @default.
- W3123638774 cites W2172734211 @default.
- W3123638774 cites W2194775991 @default.
- W3123638774 cites W2335728318 @default.
- W3123638774 cites W2519766976 @default.
- W3123638774 cites W2535697732 @default.
- W3123638774 cites W2552194003 @default.
- W3123638774 cites W2604763608 @default.
- W3123638774 cites W2605372163 @default.
- W3123638774 cites W2612983688 @default.
- W3123638774 cites W2626325961 @default.
- W3123638774 cites W2732547613 @default.
- W3123638774 cites W2746314669 @default.
- W3123638774 cites W2795900505 @default.
- W3123638774 cites W2867167548 @default.
- W3123638774 cites W2903105043 @default.
- W3123638774 cites W2940195430 @default.
- W3123638774 cites W2962971773 @default.
- W3123638774 cites W2963000508 @default.
- W3123638774 cites W2963026770 @default.
- W3123638774 cites W2963060032 @default.
- W3123638774 cites W2963125457 @default.
- W3123638774 cites W2963173418 @default.
- W3123638774 cites W2963177640 @default.
- W3123638774 cites W2963211188 @default.
- W3123638774 cites W2963238274 @default.
- W3123638774 cites W2963263347 @default.
- W3123638774 cites W2963376662 @default.
- W3123638774 cites W2963384892 @default.
- W3123638774 cites W2963446712 @default.
- W3123638774 cites W2963748792 @default.
- W3123638774 cites W2963804082 @default.
- W3123638774 cites W2963842222 @default.
- W3123638774 cites W2963921132 @default.
- W3123638774 cites W2963939958 @default.
- W3123638774 cites W2963959597 @default.
- W3123638774 cites W2964059111 @default.
- W3123638774 cites W2964121744 @default.
- W3123638774 cites W2964137095 @default.
- W3123638774 cites W2964212410 @default.
- W3123638774 cites W2970247852 @default.
- W3123638774 cites W2971130081 @default.
- W3123638774 cites W2992525328 @default.
- W3123638774 cites W2994797252 @default.
- W3123638774 cites W2994821360 @default.
- W3123638774 cites W2994848047 @default.
- W3123638774 cites W2995892679 @default.
- W3123638774 cites W2996144997 @default.
- W3123638774 cites W2998438489 @default.
- W3123638774 cites W3007700590 @default.
- W3123638774 cites W3118608800 @default.
- W3123638774 cites W3127844431 @default.
- W3123638774 cites W3141595720 @default.
- W3123638774 cites W61316658 @default.
- W3123638774 cites W967544008 @default.
- W3123638774 hasPublicationYear "2021" @default.
- W3123638774 type Work @default.
- W3123638774 sameAs 3123638774 @default.
- W3123638774 citedByCount "5" @default.
- W3123638774 countsByYear W31236387742021 @default.
- W3123638774 crossrefType "proceedings-article" @default.
- W3123638774 hasAuthorship W3123638774A5006011235 @default.
- W3123638774 hasAuthorship W3123638774A5030804320 @default.
- W3123638774 hasAuthorship W3123638774A5030838773 @default.
- W3123638774 hasAuthorship W3123638774A5033354550 @default.
- W3123638774 hasAuthorship W3123638774A5048262902 @default.
- W3123638774 hasAuthorship W3123638774A5090542476 @default.
- W3123638774 hasConcept C11413529 @default.
- W3123638774 hasConcept C119857082 @default.
- W3123638774 hasConcept C129844170 @default.
- W3123638774 hasConcept C134306372 @default.
- W3123638774 hasConcept C153258448 @default.
- W3123638774 hasConcept C154945302 @default.