Matches in SemOpenAlex for { <https://semopenalex.org/work/W4387688051> ?p ?o ?g. }
Showing items 1 to 83 of 83 with 100 items per page.
- W4387688051 abstract "We show that the majority of the inference computations for large generative models such as LLaMA and OPT can be performed with both weights and activations being cast to 4 bits, in a way that leads to practical speedups while at the same time maintaining good accuracy. We achieve this via a hybrid quantization strategy called QUIK, which compresses most of the weights and activations to 4-bit, while keeping some outlier weights and activations in higher-precision. Crucially, our scheme is designed with computational efficiency in mind: we provide GPU kernels with highly-efficient layer-wise runtimes, which lead to practical end-to-end throughput improvements of up to 3.1x relative to FP16 execution. Code and models are provided at https://github.com/IST-DASLab/QUIK." @default.
- W4387688051 created "2023-10-17" @default.
- W4387688051 creator A5026990786 @default.
- W4387688051 creator A5032438787 @default.
- W4387688051 creator A5033825187 @default.
- W4387688051 creator A5051243807 @default.
- W4387688051 creator A5060660774 @default.
- W4387688051 creator A5077292051 @default.
- W4387688051 creator A5083822059 @default.
- W4387688051 creator A5086409515 @default.
- W4387688051 date "2023-10-13" @default.
- W4387688051 modified "2023-10-18" @default.
- W4387688051 title "Towards End-to-end 4-Bit Inference on Generative Large Language Models" @default.
- W4387688051 doi "https://doi.org/10.48550/arxiv.2310.09259" @default.
- W4387688051 hasPublicationYear "2023" @default.
- W4387688051 type Work @default.
- W4387688051 citedByCount "0" @default.
- W4387688051 crossrefType "posted-content" @default.
- W4387688051 hasAuthorship W4387688051A5026990786 @default.
- W4387688051 hasAuthorship W4387688051A5032438787 @default.
- W4387688051 hasAuthorship W4387688051A5033825187 @default.
- W4387688051 hasAuthorship W4387688051A5051243807 @default.
- W4387688051 hasAuthorship W4387688051A5060660774 @default.
- W4387688051 hasAuthorship W4387688051A5077292051 @default.
- W4387688051 hasAuthorship W4387688051A5083822059 @default.
- W4387688051 hasAuthorship W4387688051A5086409515 @default.
- W4387688051 hasBestOaLocation W43876880511 @default.
- W4387688051 hasConcept C113775141 @default.
- W4387688051 hasConcept C11413529 @default.
- W4387688051 hasConcept C134306372 @default.
- W4387688051 hasConcept C154945302 @default.
- W4387688051 hasConcept C157764524 @default.
- W4387688051 hasConcept C173608175 @default.
- W4387688051 hasConcept C177264268 @default.
- W4387688051 hasConcept C199360897 @default.
- W4387688051 hasConcept C2776214188 @default.
- W4387688051 hasConcept C2776760102 @default.
- W4387688051 hasConcept C28855332 @default.
- W4387688051 hasConcept C33923547 @default.
- W4387688051 hasConcept C39890363 @default.
- W4387688051 hasConcept C41008148 @default.
- W4387688051 hasConcept C45374587 @default.
- W4387688051 hasConcept C555944384 @default.
- W4387688051 hasConcept C74296488 @default.
- W4387688051 hasConcept C76155785 @default.
- W4387688051 hasConcept C77618280 @default.
- W4387688051 hasConcept C79337645 @default.
- W4387688051 hasConceptScore W4387688051C113775141 @default.
- W4387688051 hasConceptScore W4387688051C11413529 @default.
- W4387688051 hasConceptScore W4387688051C134306372 @default.
- W4387688051 hasConceptScore W4387688051C154945302 @default.
- W4387688051 hasConceptScore W4387688051C157764524 @default.
- W4387688051 hasConceptScore W4387688051C173608175 @default.
- W4387688051 hasConceptScore W4387688051C177264268 @default.
- W4387688051 hasConceptScore W4387688051C199360897 @default.
- W4387688051 hasConceptScore W4387688051C2776214188 @default.
- W4387688051 hasConceptScore W4387688051C2776760102 @default.
- W4387688051 hasConceptScore W4387688051C28855332 @default.
- W4387688051 hasConceptScore W4387688051C33923547 @default.
- W4387688051 hasConceptScore W4387688051C39890363 @default.
- W4387688051 hasConceptScore W4387688051C41008148 @default.
- W4387688051 hasConceptScore W4387688051C45374587 @default.
- W4387688051 hasConceptScore W4387688051C555944384 @default.
- W4387688051 hasConceptScore W4387688051C74296488 @default.
- W4387688051 hasConceptScore W4387688051C76155785 @default.
- W4387688051 hasConceptScore W4387688051C77618280 @default.
- W4387688051 hasConceptScore W4387688051C79337645 @default.
- W4387688051 hasLocation W43876880511 @default.
- W4387688051 hasOpenAccess W4387688051 @default.
- W4387688051 hasPrimaryLocation W43876880511 @default.
- W4387688051 hasRelatedWork W2380075625 @default.
- W4387688051 hasRelatedWork W2917767146 @default.
- W4387688051 hasRelatedWork W3006513224 @default.
- W4387688051 hasRelatedWork W3179968364 @default.
- W4387688051 hasRelatedWork W3183118997 @default.
- W4387688051 hasRelatedWork W3196421258 @default.
- W4387688051 hasRelatedWork W3204296682 @default.
- W4387688051 hasRelatedWork W3204400881 @default.
- W4387688051 hasRelatedWork W3214410901 @default.
- W4387688051 hasRelatedWork W4285609037 @default.
- W4387688051 isParatext "false" @default.
- W4387688051 isRetracted "false" @default.
- W4387688051 workType "article" @default.
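
The abstract in the record above describes a hybrid strategy: most weights and activations are compressed to 4 bits while a few outliers stay in higher precision. The following is a minimal pure-Python sketch of that general idea, not the QUIK implementation. The function names (`quantize_4bit`, `hybrid_quantize`, `max_error`), the magnitude-based outlier selection, and the sample values are all assumptions made for illustration; the actual method and GPU kernels are in the repository cited in the abstract.

```python
def quantize_4bit(values):
    """Symmetric 4-bit quantization: map floats to integer codes in [-8, 7]."""
    max_abs = max((abs(v) for v in values), default=0.0)
    scale = max_abs / 7 if max_abs > 0 else 1.0
    codes = [max(-8, min(7, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate float values."""
    return [c * scale for c in codes]


def hybrid_quantize(values, num_outliers):
    """Keep the largest-magnitude entries exact; quantize the rest to 4 bits.

    Assumption: outliers are picked by absolute value. This is a toy
    selection rule, not the one used by the paper.
    """
    order = sorted(range(len(values)), key=lambda i: abs(values[i]), reverse=True)
    outlier_idx = set(order[:num_outliers])
    base = [v for i, v in enumerate(values) if i not in outlier_idx]
    codes, scale = quantize_4bit(base)
    restored = iter(dequantize(codes, scale))
    return [values[i] if i in outlier_idx else next(restored)
            for i in range(len(values))]


def max_error(original, approx):
    """Largest absolute reconstruction error."""
    return max(abs(a - b) for a, b in zip(original, approx))


# Hypothetical activations with one large outlier (index 3).
activations = [0.12, -0.30, 0.05, 9.50, 0.22, -0.18, 0.41, -0.07]

plain = hybrid_quantize(activations, num_outliers=0)   # everything in 4 bits
hybrid = hybrid_quantize(activations, num_outliers=1)  # outlier kept exact

print("4-bit only max error:", max_error(activations, plain))
print("hybrid max error:    ", max_error(activations, hybrid))
```

With a single outlier in the data, quantizing everything forces a coarse scale that rounds the small values to zero; holding the outlier out lets the remaining values share a much finer 4-bit grid, which is the intuition behind keeping outliers in higher precision.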