Matches in SemOpenAlex for { <https://semopenalex.org/work/W4307934016> ?p ?o ?g. }
Showing items 1 to 55 of 55, with 100 items per page.
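The pattern in the header is a quad lookup: every predicate `?p`, object `?o`, and graph `?g` attached to the work IRI. A minimal sketch of building the same query for the public SemOpenAlex SPARQL endpoint, using only the standard library (the endpoint URL, the `format=json` parameter, and the `GRAPH` form of the fourth term are assumptions, not taken from this listing):

```python
from urllib.parse import urlencode

# Assumed public endpoint for SemOpenAlex (not stated in the listing above).
ENDPOINT = "https://semopenalex.org/sparql"

# One way to express the { <work> ?p ?o ?g } quad pattern in SPARQL 1.1:
# bind the named graph explicitly with GRAPH ?g { ... }.
query = """
SELECT ?p ?o ?g WHERE {
  GRAPH ?g {
    <https://semopenalex.org/work/W4307934016> ?p ?o .
  }
}
"""

# Percent-encode the query into a GET request URL; fetching it with
# urllib.request.urlopen(request_url) would return SPARQL JSON results
# (result parsing is omitted in this sketch).
params = urlencode({"query": query, "format": "json"})
request_url = f"{ENDPOINT}?{params}"
```

The GET form is convenient for ad-hoc lookups; long queries are typically sent as a POST body instead, per the SPARQL 1.1 Protocol.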
- W4307934016 abstract "Generative Pre-trained Transformer models, known as GPT or OPT, set themselves apart through breakthrough performance across complex language modelling tasks, but also by their extremely high computational and storage costs. Specifically, due to their massive size, even inference for large, highly-accurate GPT models may require multiple performant GPUs, which limits the usability of such models. While there is emerging work on relieving this pressure via model compression, the applicability and performance of existing compression techniques is limited by the scale and complexity of GPT models. In this paper, we address this challenge, and propose GPTQ, a new one-shot weight quantization method based on approximate second-order information, that is both highly-accurate and highly-efficient. Specifically, GPTQ can quantize GPT models with 175 billion parameters in approximately four GPU hours, reducing the bitwidth down to 3 or 4 bits per weight, with negligible accuracy degradation relative to the uncompressed baseline. Our method more than doubles the compression gains relative to previously-proposed one-shot quantization methods, preserving accuracy, allowing us for the first time to execute a 175 billion-parameter model inside a single GPU for generative inference. Moreover, we also show that our method can still provide reasonable accuracy in the extreme quantization regime, in which weights are quantized to 2-bit or even ternary quantization levels. We show experimentally that these improvements can be leveraged for end-to-end inference speedups over FP16, of around 3.25x when using high-end GPUs (NVIDIA A100) and 4.5x when using more cost-effective ones (NVIDIA A6000). The implementation is available at https://github.com/IST-DASLab/gptq." @default.
- W4307934016 created "2022-11-06" @default.
- W4307934016 creator A5026990786 @default.
- W4307934016 creator A5060660774 @default.
- W4307934016 creator A5083822059 @default.
- W4307934016 creator A5086409515 @default.
- W4307934016 date "2022-10-31" @default.
- W4307934016 modified "2023-10-16" @default.
- W4307934016 title "GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers" @default.
- W4307934016 doi "https://doi.org/10.48550/arxiv.2210.17323" @default.
- W4307934016 hasPublicationYear "2022" @default.
- W4307934016 type Work @default.
- W4307934016 citedByCount "0" @default.
- W4307934016 crossrefType "posted-content" @default.
- W4307934016 hasAuthorship W4307934016A5026990786 @default.
- W4307934016 hasAuthorship W4307934016A5060660774 @default.
- W4307934016 hasAuthorship W4307934016A5083822059 @default.
- W4307934016 hasAuthorship W4307934016A5086409515 @default.
- W4307934016 hasBestOaLocation W43079340161 @default.
- W4307934016 hasConcept C113775141 @default.
- W4307934016 hasConcept C11413529 @default.
- W4307934016 hasConcept C121332964 @default.
- W4307934016 hasConcept C154945302 @default.
- W4307934016 hasConcept C165801399 @default.
- W4307934016 hasConcept C2776214188 @default.
- W4307934016 hasConcept C28855332 @default.
- W4307934016 hasConcept C41008148 @default.
- W4307934016 hasConcept C62520636 @default.
- W4307934016 hasConcept C66322947 @default.
- W4307934016 hasConceptScore W4307934016C113775141 @default.
- W4307934016 hasConceptScore W4307934016C11413529 @default.
- W4307934016 hasConceptScore W4307934016C121332964 @default.
- W4307934016 hasConceptScore W4307934016C154945302 @default.
- W4307934016 hasConceptScore W4307934016C165801399 @default.
- W4307934016 hasConceptScore W4307934016C2776214188 @default.
- W4307934016 hasConceptScore W4307934016C28855332 @default.
- W4307934016 hasConceptScore W4307934016C41008148 @default.
- W4307934016 hasConceptScore W4307934016C62520636 @default.
- W4307934016 hasConceptScore W4307934016C66322947 @default.
- W4307934016 hasLocation W43079340161 @default.
- W4307934016 hasOpenAccess W4307934016 @default.
- W4307934016 hasPrimaryLocation W43079340161 @default.
- W4307934016 hasRelatedWork W2777406049 @default.
- W4307934016 hasRelatedWork W2803935332 @default.
- W4307934016 hasRelatedWork W2963122961 @default.
- W4307934016 hasRelatedWork W2979314664 @default.
- W4307934016 hasRelatedWork W3099092507 @default.
- W4307934016 hasRelatedWork W3146091044 @default.
- W4307934016 hasRelatedWork W3177265267 @default.
- W4307934016 hasRelatedWork W3196579076 @default.
- W4307934016 hasRelatedWork W4287241953 @default.
- W4307934016 hasRelatedWork W4310823283 @default.
- W4307934016 isParatext "false" @default.
- W4307934016 isRetracted "false" @default.
- W4307934016 workType "article" @default.
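The abstract above contrasts GPTQ with simpler one-shot quantization schemes. As background only, and NOT the GPTQ algorithm itself (GPTQ additionally uses approximate second-order information to compensate quantization error), the naive round-to-nearest (RTN) uniform quantizer it improves on can be sketched in pure Python (the function name and sample values are illustrative):

```python
def rtn_quantize(weights, bits=4):
    """Round-to-nearest uniform quantization of a list of float weights.

    Naive baseline only: each weight is independently snapped to the
    nearest of 2**bits evenly spaced levels spanning [min, max].
    """
    levels = 2 ** bits
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / (levels - 1) or 1.0  # guard against all-equal weights
    # Integer code per weight, then dequantize back to the grid point.
    codes = [round((w - lo) / scale) for w in weights]
    return [lo + c * scale for c in codes]

# Example: 3-bit quantization leaves each weight within scale/2 of the original.
deq = rtn_quantize([0.31, -1.2, 0.05, 0.9], bits=3)
```

Per-weight RTN like this degrades sharply below 4 bits at large scale, which is the gap the paper's second-order, column-by-column error compensation is designed to close.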