Matches in SemOpenAlex for { <https://semopenalex.org/work/W4387389928> ?p ?o ?g. }
Showing items 1 to 73 of 73 with 100 items per page.
- W4387389928 abstract "Large Language Models (LLMs) have recently gained popularity due to their impressive few-shot performance across various downstream tasks. However, fine-tuning all parameters and storing a unique model for each downstream task or domain becomes impractical because of the massive size of checkpoints (e.g., 350GB in GPT-3). Current literature, such as LoRA, showcases the potential of low-rank modifications to the original weights of an LLM, enabling efficient adaptation and storage for task-specific models. These methods can reduce the number of parameters needed to fine-tune an LLM by several orders of magnitude. Yet, these methods face two primary limitations: 1) the parameter reduction is lower-bounded by the rank one decomposition, and 2) the extent of reduction is heavily influenced by both the model architecture and the chosen rank. For instance, in larger models, even a rank one decomposition might exceed the number of parameters truly needed for adaptation. In this paper, we introduce NOLA, which overcomes the rank one lower bound present in LoRA. It achieves this by re-parameterizing the low-rank matrices in LoRA using linear combinations of randomly generated matrices (basis) and optimizing the linear mixture coefficients only. This approach allows us to decouple the number of trainable parameters from both the choice of rank and the network architecture. We present adaptation results using GPT-2 and ViT in natural language and computer vision tasks. NOLA performs as well as, or better than models with equivalent parameter counts. Furthermore, we demonstrate that we can halve the parameters in larger models compared to LoRA with rank one, without sacrificing performance." @default.
- W4387389928 created "2023-10-06" @default.
- W4387389928 creator A5031860916 @default.
- W4387389928 creator A5054529203 @default.
- W4387389928 creator A5068682350 @default.
- W4387389928 creator A5084289151 @default.
- W4387389928 creator A5089166523 @default.
- W4387389928 date "2023-10-03" @default.
- W4387389928 modified "2023-10-17" @default.
- W4387389928 title "NOLA: Networks as Linear Combination of Low Rank Random Basis" @default.
- W4387389928 doi "https://doi.org/10.48550/arxiv.2310.02556" @default.
- W4387389928 hasPublicationYear "2023" @default.
- W4387389928 type Work @default.
- W4387389928 citedByCount "0" @default.
- W4387389928 crossrefType "posted-content" @default.
- W4387389928 hasAuthorship W4387389928A5031860916 @default.
- W4387389928 hasAuthorship W4387389928A5054529203 @default.
- W4387389928 hasAuthorship W4387389928A5068682350 @default.
- W4387389928 hasAuthorship W4387389928A5084289151 @default.
- W4387389928 hasAuthorship W4387389928A5089166523 @default.
- W4387389928 hasBestOaLocation W43873899281 @default.
- W4387389928 hasConcept C111335779 @default.
- W4387389928 hasConcept C11413529 @default.
- W4387389928 hasConcept C114614502 @default.
- W4387389928 hasConcept C12426560 @default.
- W4387389928 hasConcept C124681953 @default.
- W4387389928 hasConcept C15744967 @default.
- W4387389928 hasConcept C162324750 @default.
- W4387389928 hasConcept C164226766 @default.
- W4387389928 hasConcept C187736073 @default.
- W4387389928 hasConcept C18903297 @default.
- W4387389928 hasConcept C2524010 @default.
- W4387389928 hasConcept C2780451532 @default.
- W4387389928 hasConcept C2780586970 @default.
- W4387389928 hasConcept C33923547 @default.
- W4387389928 hasConcept C41008148 @default.
- W4387389928 hasConcept C77805123 @default.
- W4387389928 hasConcept C80444323 @default.
- W4387389928 hasConcept C86803240 @default.
- W4387389928 hasConceptScore W4387389928C111335779 @default.
- W4387389928 hasConceptScore W4387389928C11413529 @default.
- W4387389928 hasConceptScore W4387389928C114614502 @default.
- W4387389928 hasConceptScore W4387389928C12426560 @default.
- W4387389928 hasConceptScore W4387389928C124681953 @default.
- W4387389928 hasConceptScore W4387389928C15744967 @default.
- W4387389928 hasConceptScore W4387389928C162324750 @default.
- W4387389928 hasConceptScore W4387389928C164226766 @default.
- W4387389928 hasConceptScore W4387389928C187736073 @default.
- W4387389928 hasConceptScore W4387389928C18903297 @default.
- W4387389928 hasConceptScore W4387389928C2524010 @default.
- W4387389928 hasConceptScore W4387389928C2780451532 @default.
- W4387389928 hasConceptScore W4387389928C2780586970 @default.
- W4387389928 hasConceptScore W4387389928C33923547 @default.
- W4387389928 hasConceptScore W4387389928C41008148 @default.
- W4387389928 hasConceptScore W4387389928C77805123 @default.
- W4387389928 hasConceptScore W4387389928C80444323 @default.
- W4387389928 hasConceptScore W4387389928C86803240 @default.
- W4387389928 hasLocation W43873899281 @default.
- W4387389928 hasOpenAccess W4387389928 @default.
- W4387389928 hasPrimaryLocation W43873899281 @default.
- W4387389928 hasRelatedWork W2142306706 @default.
- W4387389928 hasRelatedWork W2296657975 @default.
- W4387389928 hasRelatedWork W2348524959 @default.
- W4387389928 hasRelatedWork W2368049389 @default.
- W4387389928 hasRelatedWork W2368605798 @default.
- W4387389928 hasRelatedWork W2384861574 @default.
- W4387389928 hasRelatedWork W2518037665 @default.
- W4387389928 hasRelatedWork W2952704802 @default.
- W4387389928 hasRelatedWork W3028700241 @default.
- W4387389928 hasRelatedWork W4294565801 @default.
- W4387389928 isParatext "false" @default.
- W4387389928 isRetracted "false" @default.
- W4387389928 workType "article" @default.
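
The abstract above describes NOLA as re-parameterizing LoRA's low-rank matrices as linear combinations of randomly generated basis matrices, training only the mixture coefficients. The following is a minimal NumPy sketch of that idea, not the authors' implementation. The function name `nola_delta`, the standard-normal basis, the seed-based regeneration, and all dimensions are assumptions chosen for illustration; the abstract does not specify them.

```python
import numpy as np

def nola_delta(alpha, beta, seed, d_out, d_in, rank):
    """Rebuild a weight update from mixture coefficients and a seed.

    Hypothetical helper: the basis distribution and shapes are assumptions.
    """
    rng = np.random.default_rng(seed)
    # Frozen random bases, regenerated from the seed instead of being stored.
    a_basis = rng.standard_normal((alpha.size, d_out, rank))
    b_basis = rng.standard_normal((beta.size, rank, d_in))
    # Each low-rank factor is a linear combination of its basis matrices.
    a = np.tensordot(alpha, a_basis, axes=1)  # shape (d_out, rank)
    b = np.tensordot(beta, b_basis, axes=1)   # shape (rank, d_in)
    return a @ b                              # shape (d_out, d_in)

# Illustrative sizes only (assumed, not taken from the source).
d_out, d_in, rank, k = 768, 768, 4, 16
alpha = np.full(k, 0.1)  # trainable mixture coefficients for the first factor
beta = np.full(k, 0.1)   # trainable mixture coefficients for the second factor

delta = nola_delta(alpha, beta, seed=0, d_out=d_out, d_in=d_in, rank=rank)

# Trainable parameter counts for this one layer.
lora_rank_one_params = d_out + d_in   # rank-one LoRA: one column plus one row
nola_params = alpha.size + beta.size  # independent of rank and layer shape
```

In this sketch the trainable count is `2 * k` regardless of `rank`, `d_out`, or `d_in`, which is the decoupling the abstract refers to; a rank-one LoRA update for the same layer needs `d_out + d_in` parameters.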