Matches in SemOpenAlex for { <https://semopenalex.org/work/W2912556620> ?p ?o ?g. }
- W2912556620 abstract "Many popular first-order optimization methods (e.g., Momentum, AdaGrad, Adam) accelerate the convergence rate of deep learning models. However, these algorithms require auxiliary parameters, which cost additional memory proportional to the number of parameters in the model. The problem is becoming more severe as deep learning models continue to grow larger in order to learn from complex, large-scale datasets. Our proposed solution is to maintain a linear sketch to compress the auxiliary variables. We demonstrate that our technique has the same performance as the full-sized baseline, while using significantly less space for the auxiliary variables. Theoretically, we prove that count-sketch optimization maintains the SGD convergence rate, while gracefully reducing memory usage for large models. On the large-scale 1-Billion Word dataset, we save 25% of the memory used during training (8.6 GB instead of 11.7 GB) by compressing the Adam optimizer in the Embedding and Softmax layers with negligible accuracy and performance loss. For an Amazon extreme classification task with over 49.5 million classes, we also reduce the training time by 38%, by increasing the mini-batch size 3.5x using our count-sketch optimizer." @default.
- W2912556620 created "2019-02-21" @default.
- W2912556620 creator A5013062640 @default.
- W2912556620 creator A5024280658 @default.
- W2912556620 creator A5024993683 @default.
- W2912556620 creator A5074551561 @default.
- W2912556620 date "2019-02-01" @default.
- W2912556620 modified "2023-09-25" @default.
- W2912556620 title "Compressing Gradient Optimizers via Count-Sketches" @default.
- W2912556620 cites W104184427 @default.
- W2912556620 cites W1493892051 @default.
- W2912556620 cites W1723811852 @default.
- W2912556620 cites W1988720110 @default.
- W2912556620 cites W2080234606 @default.
- W2912556620 cites W2100664567 @default.
- W2912556620 cites W2119144962 @default.
- W2912556620 cites W2146502635 @default.
- W2912556620 cites W2259472270 @default.
- W2912556620 cites W2279057335 @default.
- W2912556620 cites W2338908902 @default.
- W2912556620 cites W2525246036 @default.
- W2912556620 cites W2525332836 @default.
- W2912556620 cites W2622263826 @default.
- W2912556620 cites W2754526845 @default.
- W2912556620 cites W2795767639 @default.
- W2912556620 cites W2804456057 @default.
- W2912556620 cites W2891952073 @default.
- W2912556620 cites W2897031196 @default.
- W2912556620 cites W2951714314 @default.
- W2912556620 cites W2952339051 @default.
- W2912556620 cites W2952754453 @default.
- W2912556620 cites W2962911098 @default.
- W2912556620 cites W2962943936 @default.
- W2912556620 cites W2963056065 @default.
- W2912556620 cites W2963112338 @default.
- W2912556620 cites W2963341956 @default.
- W2912556620 cites W2963403868 @default.
- W2912556620 cites W2963537482 @default.
- W2912556620 cites W2963807318 @default.
- W2912556620 cites W2964121744 @default.
- W2912556620 hasPublicationYear "2019" @default.
- W2912556620 type Work @default.
- W2912556620 sameAs 2912556620 @default.
- W2912556620 citedByCount "0" @default.
- W2912556620 crossrefType "posted-content" @default.
- W2912556620 hasAuthorship W2912556620A5013062640 @default.
- W2912556620 hasAuthorship W2912556620A5024280658 @default.
- W2912556620 hasAuthorship W2912556620A5024993683 @default.
- W2912556620 hasAuthorship W2912556620A5074551561 @default.
- W2912556620 hasConcept C108583219 @default.
- W2912556620 hasConcept C11413529 @default.
- W2912556620 hasConcept C121332964 @default.
- W2912556620 hasConcept C154945302 @default.
- W2912556620 hasConcept C162324750 @default.
- W2912556620 hasConcept C187736073 @default.
- W2912556620 hasConcept C188441871 @default.
- W2912556620 hasConcept C2524010 @default.
- W2912556620 hasConcept C26517878 @default.
- W2912556620 hasConcept C2777303404 @default.
- W2912556620 hasConcept C2778755073 @default.
- W2912556620 hasConcept C2779231336 @default.
- W2912556620 hasConcept C2780451532 @default.
- W2912556620 hasConcept C33923547 @default.
- W2912556620 hasConcept C38652104 @default.
- W2912556620 hasConcept C41008148 @default.
- W2912556620 hasConcept C41608201 @default.
- W2912556620 hasConcept C50522688 @default.
- W2912556620 hasConcept C57869625 @default.
- W2912556620 hasConcept C62520636 @default.
- W2912556620 hasConcept C90805587 @default.
- W2912556620 hasConceptScore W2912556620C108583219 @default.
- W2912556620 hasConceptScore W2912556620C11413529 @default.
- W2912556620 hasConceptScore W2912556620C121332964 @default.
- W2912556620 hasConceptScore W2912556620C154945302 @default.
- W2912556620 hasConceptScore W2912556620C162324750 @default.
- W2912556620 hasConceptScore W2912556620C187736073 @default.
- W2912556620 hasConceptScore W2912556620C188441871 @default.
- W2912556620 hasConceptScore W2912556620C2524010 @default.
- W2912556620 hasConceptScore W2912556620C26517878 @default.
- W2912556620 hasConceptScore W2912556620C2777303404 @default.
- W2912556620 hasConceptScore W2912556620C2778755073 @default.
- W2912556620 hasConceptScore W2912556620C2779231336 @default.
- W2912556620 hasConceptScore W2912556620C2780451532 @default.
- W2912556620 hasConceptScore W2912556620C33923547 @default.
- W2912556620 hasConceptScore W2912556620C38652104 @default.
- W2912556620 hasConceptScore W2912556620C41008148 @default.
- W2912556620 hasConceptScore W2912556620C41608201 @default.
- W2912556620 hasConceptScore W2912556620C50522688 @default.
- W2912556620 hasConceptScore W2912556620C57869625 @default.
- W2912556620 hasConceptScore W2912556620C62520636 @default.
- W2912556620 hasConceptScore W2912556620C90805587 @default.
- W2912556620 hasLocation W29125566201 @default.
- W2912556620 hasOpenAccess W2912556620 @default.
- W2912556620 hasPrimaryLocation W29125566201 @default.
- W2912556620 hasRelatedWork W1970576786 @default.
- W2912556620 hasRelatedWork W2737100304 @default.
- W2912556620 hasRelatedWork W2750933313 @default.
- W2912556620 hasRelatedWork W2891491149 @default.
- W2912556620 hasRelatedWork W2896386620 @default.
- W2912556620 hasRelatedWork W2914261901 @default.
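
The abstract of W2912556620 describes keeping optimizer auxiliary variables in a linear sketch rather than in a full-sized array. As a rough illustration only, a generic count-sketch might look like the following. The class name `CountSketch`, its methods, the hash construction, and the default sizes are all assumptions made for this sketch and are not taken from the paper or from any library.

```python
import random
import statistics

_PRIME = (1 << 31) - 1  # Mersenne prime used for simple universal hashing


class CountSketch:
    """Hypothetical count-sketch: depth rows of width signed counters."""

    def __init__(self, depth=3, width=16, seed=0):
        rng = random.Random(seed)
        self.width = width
        self.table = [[0.0] * width for _ in range(depth)]
        # Per row: (a, b) parameterise the bucket hash, (c, d) the sign hash.
        self.params = [
            tuple(rng.randrange(1, _PRIME) for _ in range(4))
            for _ in range(depth)
        ]

    def _bucket_and_sign(self, row, index):
        a, b, c, d = self.params[row]
        bucket = ((a * index + b) % _PRIME) % self.width
        sign = 1.0 if ((c * index + d) % _PRIME) % 2 == 0 else -1.0
        return bucket, sign

    def update(self, index, delta):
        # Add a signed copy of delta to one counter in every row.
        for row in range(len(self.table)):
            bucket, sign = self._bucket_and_sign(row, index)
            self.table[row][bucket] += sign * delta

    def scale(self, factor):
        # The sketch is linear, so scaling every counter scales every estimate.
        for row in self.table:
            for j in range(len(row)):
                row[j] *= factor

    def query(self, index):
        # Undo the sign in each row and take the median of the row estimates.
        estimates = []
        for row in range(len(self.table)):
            bucket, sign = self._bucket_and_sign(row, index)
            estimates.append(sign * self.table[row][bucket])
        return statistics.median(estimates)
```

Because the structure is linear, a momentum-style update of the form m = beta * m + (1 - beta) * g could, under these assumptions, be applied as `sketch.scale(beta)` followed by `sketch.update(i, (1 - beta) * g_i)` for each nonzero gradient entry. Estimates are exact only when an index does not collide with others; in general they are approximate.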