Matches in SemOpenAlex for { <https://semopenalex.org/work/W4386081878> ?p ?o ?g. }
Showing items 1 to 61 of 61 with 100 items per page.
- W4386081878 abstract "Modern large language models (LLMs) that rely on attention mechanisms are typically trained with fixed context lengths which enforce upper limits on the length of input sequences that they can handle at evaluation time. To use these models on sequences longer than the train-time context length, one might employ techniques from the growing family of context length extrapolation methods -- most of which focus on modifying the system of positional encodings used in the attention mechanism to indicate where tokens or activations are located in the input sequence. We conduct a wide survey of existing methods of context length extrapolation on a base LLaMA or LLaMA 2 model, and introduce some of our own design as well -- in particular, a new truncation strategy for modifying the basis for the position encoding. We test these methods using three new evaluation tasks (FreeFormQA, AlteredNumericQA, and LongChat-Lines) as well as perplexity, which we find to be less fine-grained as a measure of long context performance of LLMs. We release the three tasks publicly as datasets on HuggingFace. We discover that linear scaling is the best method for extending context length, and show that further gains can be achieved by using longer scales at evaluation time. We also discover promising extrapolation capabilities in the truncated basis. To support further research in this area, we release three new 13B parameter long-context models which we call Giraffe: 4k and 16k context models trained from base LLaMA-13B, and a 32k context model trained from base LLaMA2-13B. We also release the code to replicate our results." @default.
- W4386081878 created "2023-08-23" @default.
- W4386081878 creator A5018487743 @default.
- W4386081878 creator A5041830853 @default.
- W4386081878 creator A5045970638 @default.
- W4386081878 creator A5047772500 @default.
- W4386081878 creator A5053328053 @default.
- W4386081878 creator A5063355813 @default.
- W4386081878 date "2023-08-21" @default.
- W4386081878 modified "2023-09-27" @default.
- W4386081878 title "Giraffe: Adventures in Expanding Context Lengths in LLMs" @default.
- W4386081878 doi "https://doi.org/10.48550/arxiv.2308.10882" @default.
- W4386081878 hasPublicationYear "2023" @default.
- W4386081878 type Work @default.
- W4386081878 citedByCount "0" @default.
- W4386081878 crossrefType "posted-content" @default.
- W4386081878 hasAuthorship W4386081878A5018487743 @default.
- W4386081878 hasAuthorship W4386081878A5041830853 @default.
- W4386081878 hasAuthorship W4386081878A5045970638 @default.
- W4386081878 hasAuthorship W4386081878A5047772500 @default.
- W4386081878 hasAuthorship W4386081878A5053328053 @default.
- W4386081878 hasAuthorship W4386081878A5063355813 @default.
- W4386081878 hasBestOaLocation W43860818781 @default.
- W4386081878 hasConcept C105795698 @default.
- W4386081878 hasConcept C119857082 @default.
- W4386081878 hasConcept C132459708 @default.
- W4386081878 hasConcept C154945302 @default.
- W4386081878 hasConcept C166957645 @default.
- W4386081878 hasConcept C183322885 @default.
- W4386081878 hasConcept C205649164 @default.
- W4386081878 hasConcept C2779343474 @default.
- W4386081878 hasConcept C2781238097 @default.
- W4386081878 hasConcept C33923547 @default.
- W4386081878 hasConcept C41008148 @default.
- W4386081878 hasConceptScore W4386081878C105795698 @default.
- W4386081878 hasConceptScore W4386081878C119857082 @default.
- W4386081878 hasConceptScore W4386081878C132459708 @default.
- W4386081878 hasConceptScore W4386081878C154945302 @default.
- W4386081878 hasConceptScore W4386081878C166957645 @default.
- W4386081878 hasConceptScore W4386081878C183322885 @default.
- W4386081878 hasConceptScore W4386081878C205649164 @default.
- W4386081878 hasConceptScore W4386081878C2779343474 @default.
- W4386081878 hasConceptScore W4386081878C2781238097 @default.
- W4386081878 hasConceptScore W4386081878C33923547 @default.
- W4386081878 hasConceptScore W4386081878C41008148 @default.
- W4386081878 hasLocation W43860818781 @default.
- W4386081878 hasOpenAccess W4386081878 @default.
- W4386081878 hasPrimaryLocation W43860818781 @default.
- W4386081878 hasRelatedWork W2961085424 @default.
- W4386081878 hasRelatedWork W3046775127 @default.
- W4386081878 hasRelatedWork W3170094116 @default.
- W4386081878 hasRelatedWork W4205958290 @default.
- W4386081878 hasRelatedWork W4285260836 @default.
- W4386081878 hasRelatedWork W4286629047 @default.
- W4386081878 hasRelatedWork W4306321456 @default.
- W4386081878 hasRelatedWork W4306674287 @default.
- W4386081878 hasRelatedWork W4386462264 @default.
- W4386081878 hasRelatedWork W4224009465 @default.
- W4386081878 isParatext "false" @default.
- W4386081878 isRetracted "false" @default.
- W4386081878 workType "article" @default.