Matches in SemOpenAlex for { <https://semopenalex.org/work/W4380136239> ?p ?o ?g. }
Showing items 1 to 75 of 75 with 100 items per page.
- W4380136239 abstract "Efficiently serving neural network models with low latency is becoming more challenging due to increasing model complexity and parameter count. Model quantization offers a solution which simultaneously reduces memory footprint and compute requirements. However, aggressive quantization may lead to an unacceptable loss in model accuracy owing to differences in sensitivity to numerical imperfection across different layers in the model. To address this challenge, we propose a mixed-precision post-training quantization (PTQ) approach that assigns different numerical precisions to tensors in a network based on their specific needs, for a reduced memory footprint and improved latency while preserving model accuracy. Previous works rely on layer-wise Hessian information to determine numerical precision, but as we demonstrate, Hessian estimation is typically insufficient in determining an effective ordering of layer sensitivities. We address this by augmenting the estimated Hessian with additional information to capture inter-layer dependencies. We demonstrate that this consistently improves PTQ performance along the accuracy-latency Pareto frontier across multiple models. Our method combines second-order information and inter-layer dependencies to guide a bisection search, finding quantization configurations within a user-configurable model accuracy degradation range. We evaluate the effectiveness of our method on the ResNet50, MobileNetV2, and BERT models. Our experiments demonstrate latency reductions compared to a 16-bit baseline of $25.48\%$, $21.69\%$, and $33.28\%$ respectively, while maintaining model accuracy to within $99.99\%$ of the baseline model." @default.
- W4380136239 created "2023-06-10" @default.
- W4380136239 creator A5005732762 @default.
- W4380136239 creator A5008678513 @default.
- W4380136239 creator A5021481018 @default.
- W4380136239 creator A5036593034 @default.
- W4380136239 creator A5049624374 @default.
- W4380136239 creator A5049989522 @default.
- W4380136239 creator A5052883326 @default.
- W4380136239 creator A5062849115 @default.
- W4380136239 creator A5075197561 @default.
- W4380136239 creator A5088463916 @default.
- W4380136239 date "2023-06-07" @default.
- W4380136239 modified "2023-09-23" @default.
- W4380136239 title "Augmenting Hessians with Inter-Layer Dependencies for Mixed-Precision Post-Training Quantization" @default.
- W4380136239 doi "https://doi.org/10.48550/arxiv.2306.04879" @default.
- W4380136239 hasPublicationYear "2023" @default.
- W4380136239 type Work @default.
- W4380136239 citedByCount "0" @default.
- W4380136239 crossrefType "posted-content" @default.
- W4380136239 hasAuthorship W4380136239A5005732762 @default.
- W4380136239 hasAuthorship W4380136239A5008678513 @default.
- W4380136239 hasAuthorship W4380136239A5021481018 @default.
- W4380136239 hasAuthorship W4380136239A5036593034 @default.
- W4380136239 hasAuthorship W4380136239A5049624374 @default.
- W4380136239 hasAuthorship W4380136239A5049989522 @default.
- W4380136239 hasAuthorship W4380136239A5052883326 @default.
- W4380136239 hasAuthorship W4380136239A5062849115 @default.
- W4380136239 hasAuthorship W4380136239A5075197561 @default.
- W4380136239 hasAuthorship W4380136239A5088463916 @default.
- W4380136239 hasBestOaLocation W43801362391 @default.
- W4380136239 hasConcept C111919701 @default.
- W4380136239 hasConcept C11413529 @default.
- W4380136239 hasConcept C126255220 @default.
- W4380136239 hasConcept C134306372 @default.
- W4380136239 hasConcept C203616005 @default.
- W4380136239 hasConcept C28826006 @default.
- W4380136239 hasConcept C28855332 @default.
- W4380136239 hasConcept C33923547 @default.
- W4380136239 hasConcept C39927690 @default.
- W4380136239 hasConcept C41008148 @default.
- W4380136239 hasConcept C74912251 @default.
- W4380136239 hasConcept C76155785 @default.
- W4380136239 hasConcept C82876162 @default.
- W4380136239 hasConcept C8642999 @default.
- W4380136239 hasConceptScore W4380136239C111919701 @default.
- W4380136239 hasConceptScore W4380136239C11413529 @default.
- W4380136239 hasConceptScore W4380136239C126255220 @default.
- W4380136239 hasConceptScore W4380136239C134306372 @default.
- W4380136239 hasConceptScore W4380136239C203616005 @default.
- W4380136239 hasConceptScore W4380136239C28826006 @default.
- W4380136239 hasConceptScore W4380136239C28855332 @default.
- W4380136239 hasConceptScore W4380136239C33923547 @default.
- W4380136239 hasConceptScore W4380136239C39927690 @default.
- W4380136239 hasConceptScore W4380136239C41008148 @default.
- W4380136239 hasConceptScore W4380136239C74912251 @default.
- W4380136239 hasConceptScore W4380136239C76155785 @default.
- W4380136239 hasConceptScore W4380136239C82876162 @default.
- W4380136239 hasConceptScore W4380136239C8642999 @default.
- W4380136239 hasLocation W43801362391 @default.
- W4380136239 hasOpenAccess W4380136239 @default.
- W4380136239 hasPrimaryLocation W43801362391 @default.
- W4380136239 hasRelatedWork W1987753576 @default.
- W4380136239 hasRelatedWork W2073082060 @default.
- W4380136239 hasRelatedWork W2983785293 @default.
- W4380136239 hasRelatedWork W3107198815 @default.
- W4380136239 hasRelatedWork W3141806940 @default.
- W4380136239 hasRelatedWork W4226539834 @default.
- W4380136239 hasRelatedWork W4287593291 @default.
- W4380136239 hasRelatedWork W4352981362 @default.
- W4380136239 hasRelatedWork W2779562428 @default.
- W4380136239 hasRelatedWork W4225852813 @default.
- W4380136239 isParatext "false" @default.
- W4380136239 isRetracted "false" @default.
- W4380136239 workType "article" @default.
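
The abstract in this record describes a bisection search over quantization configurations, guided by an ordering of layer sensitivities and bounded by a user-configurable accuracy degradation. The following is a minimal sketch of that general idea only, not the authors' implementation. It assumes the layers are already sorted from least to most sensitive and that accuracy does not increase as more layers are quantized; `bisect_quantization`, `evaluate`, `toy_evaluate`, and the layer names are hypothetical and do not come from the source.

```python
from typing import Callable, List, Sequence


def bisect_quantization(
    layers_by_sensitivity: Sequence[str],
    evaluate: Callable[[Sequence[str]], float],
    baseline_accuracy: float,
    max_relative_drop: float,
) -> List[str]:
    """Return the longest prefix of layers that can be quantized.

    Assumptions (hypothetical, for illustration): the input is sorted from
    least to most sensitive, and accuracy is non-increasing as the quantized
    prefix grows, so a bisection over the prefix length is valid.
    """
    threshold = baseline_accuracy * (1.0 - max_relative_drop)
    lo, hi = 0, len(layers_by_sensitivity)  # a prefix of length lo is acceptable
    while lo < hi:
        mid = (lo + hi + 1) // 2  # round up so the loop always makes progress
        if evaluate(layers_by_sensitivity[:mid]) >= threshold:
            lo = mid       # quantizing the first `mid` layers is acceptable
        else:
            hi = mid - 1   # too much accuracy lost; try a shorter prefix
    return list(layers_by_sensitivity[:lo])


def toy_evaluate(quantized: Sequence[str]) -> float:
    # Hypothetical stand-in for a real evaluation run:
    # each quantized layer subtracts a fixed accuracy penalty.
    penalty = {"conv1": 0.001, "conv2": 0.002, "fc": 0.02}
    return 0.80 - sum(penalty[name] for name in quantized)


chosen = bisect_quantization(["conv1", "conv2", "fc"], toy_evaluate, 0.80, 0.01)
print(chosen)
```

With a 1% allowed relative drop, the toy example keeps quantizing until the most sensitive layer would push accuracy below the threshold. A real pipeline would replace `toy_evaluate` with an actual validation pass, and the number of evaluations grows only logarithmically with the number of layers.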