Matches in SemOpenAlex for { <https://semopenalex.org/work/W4353113381> ?p ?o ?g. }
- W4353113381 abstract "Pruning schemes have been widely used in practice to reduce the complexity of trained models with a massive number of parameters. In fact, several practical studies have shown that if a pruned model is fine-tuned with some gradient-based updates it generalizes well to new samples. Although the above pipeline, which we refer to as pruning + fine-tuning, has been extremely successful in lowering the complexity of trained models, there is very little known about the theory behind this success. In this paper, we address this issue by investigating the pruning + fine-tuning framework on the overparameterized matrix sensing problem with the ground truth $U_\star \in \mathbb{R}^{d \times r}$ and the overparameterized model $U \in \mathbb{R}^{d \times k}$ with $k \gg r$. We study the approximate local minima of the mean square error, augmented with a smooth version of a group Lasso regularizer, $\sum_{i=1}^k \| U e_i \|_2$. In particular, we provably show that pruning all the columns below a certain explicit $\ell_2$-norm threshold results in a solution $U_{\text{prune}}$ which has the minimum number of columns $r$, yet close to the ground truth in training loss. Moreover, in the subsequent fine-tuning phase, gradient descent initialized at $U_{\text{prune}}$ converges at a linear rate to its limit. While our analysis provides insights into the role of regularization in pruning, we also show that running gradient descent in the absence of regularization results in models which are not suitable for greedy pruning, i.e., many columns could have their $\ell_2$ norm comparable to that of the maximum. To the best of our knowledge, our results provide the first rigorous insights on why greedy pruning + fine-tuning leads to smaller models which also generalize well." @default.
- W4353113381 created "2023-03-23" @default.
- W4353113381 creator A5008417632 @default.
- W4353113381 creator A5030620564 @default.
- W4353113381 creator A5062114138 @default.
- W4353113381 creator A5064641699 @default.
- W4353113381 date "2023-03-20" @default.
- W4353113381 modified "2023-10-16" @default.
- W4353113381 title "Greedy Pruning with Group Lasso Provably Generalizes for Matrix Sensing" @default.
- W4353113381 doi "https://doi.org/10.48550/arxiv.2303.11453" @default.
- W4353113381 hasPublicationYear "2023" @default.
- W4353113381 type Work @default.
- W4353113381 citedByCount "0" @default.
- W4353113381 crossrefType "posted-content" @default.
- W4353113381 hasAuthorship W4353113381A5008417632 @default.
- W4353113381 hasAuthorship W4353113381A5030620564 @default.
- W4353113381 hasAuthorship W4353113381A5062114138 @default.
- W4353113381 hasAuthorship W4353113381A5064641699 @default.
- W4353113381 hasBestOaLocation W43531133811 @default.
- W4353113381 hasConcept C106487976 @default.
- W4353113381 hasConcept C108010975 @default.
- W4353113381 hasConcept C11413529 @default.
- W4353113381 hasConcept C114614502 @default.
- W4353113381 hasConcept C121332964 @default.
- W4353113381 hasConcept C134306372 @default.
- W4353113381 hasConcept C151201525 @default.
- W4353113381 hasConcept C153258448 @default.
- W4353113381 hasConcept C154945302 @default.
- W4353113381 hasConcept C159985019 @default.
- W4353113381 hasConcept C163716315 @default.
- W4353113381 hasConcept C186633575 @default.
- W4353113381 hasConcept C192562407 @default.
- W4353113381 hasConcept C2776135515 @default.
- W4353113381 hasConcept C2778459887 @default.
- W4353113381 hasConcept C28826006 @default.
- W4353113381 hasConcept C33923547 @default.
- W4353113381 hasConcept C41008148 @default.
- W4353113381 hasConcept C50644808 @default.
- W4353113381 hasConcept C62520636 @default.
- W4353113381 hasConcept C6557445 @default.
- W4353113381 hasConcept C86803240 @default.
- W4353113381 hasConceptScore W4353113381C106487976 @default.
- W4353113381 hasConceptScore W4353113381C108010975 @default.
- W4353113381 hasConceptScore W4353113381C11413529 @default.
- W4353113381 hasConceptScore W4353113381C114614502 @default.
- W4353113381 hasConceptScore W4353113381C121332964 @default.
- W4353113381 hasConceptScore W4353113381C134306372 @default.
- W4353113381 hasConceptScore W4353113381C151201525 @default.
- W4353113381 hasConceptScore W4353113381C153258448 @default.
- W4353113381 hasConceptScore W4353113381C154945302 @default.
- W4353113381 hasConceptScore W4353113381C159985019 @default.
- W4353113381 hasConceptScore W4353113381C163716315 @default.
- W4353113381 hasConceptScore W4353113381C186633575 @default.
- W4353113381 hasConceptScore W4353113381C192562407 @default.
- W4353113381 hasConceptScore W4353113381C2776135515 @default.
- W4353113381 hasConceptScore W4353113381C2778459887 @default.
- W4353113381 hasConceptScore W4353113381C28826006 @default.
- W4353113381 hasConceptScore W4353113381C33923547 @default.
- W4353113381 hasConceptScore W4353113381C41008148 @default.
- W4353113381 hasConceptScore W4353113381C50644808 @default.
- W4353113381 hasConceptScore W4353113381C62520636 @default.
- W4353113381 hasConceptScore W4353113381C6557445 @default.
- W4353113381 hasConceptScore W4353113381C86803240 @default.
- W4353113381 hasLocation W43531133811 @default.
- W4353113381 hasOpenAccess W4353113381 @default.
- W4353113381 hasPrimaryLocation W43531133811 @default.
- W4353113381 hasRelatedWork W2082482750 @default.
- W4353113381 hasRelatedWork W2113517874 @default.
- W4353113381 hasRelatedWork W2364230301 @default.
- W4353113381 hasRelatedWork W2373152179 @default.
- W4353113381 hasRelatedWork W2380451229 @default.
- W4353113381 hasRelatedWork W2791240763 @default.
- W4353113381 hasRelatedWork W2797540493 @default.
- W4353113381 hasRelatedWork W3081084973 @default.
- W4353113381 hasRelatedWork W3185486575 @default.
- W4353113381 hasRelatedWork W4298059767 @default.
- W4353113381 isParatext "false" @default.
- W4353113381 isRetracted "false" @default.
- W4353113381 workType "article" @default.