Matches in SemOpenAlex for { <https://semopenalex.org/work/W4315706512> ?p ?o ?g. }
Showing items 1 to 77 of 77 with 100 items per page.
- W4315706512 abstract "The increasing interest in TinyML, i.e., near-sensor machine learning on power budgets of a few tens of mW, is currently pushing toward enabling TinyML-class training as opposed to inference only. Current training algorithms, based on various forms of error and gradient backpropagation, rely on floating-point matrix operations to meet the precision and dynamic range requirements. So far, the energy and power cost of these operations has been considered too high for TinyML scenarios. This paper addresses the open challenge of near-sensor training on a few mW power budget and presents RedMulE - Reduced-Precision Matrix Multiplication Engine, a low-power specialized accelerator conceived for multi-precision floating-point General Matrix-Matrix Operations (GEMM-Ops) acceleration, supporting FP16, as well as hybrid FP8 formats, with {sign, exponent, mantissa}=({1,4,3}, {1,5,2}). We integrate RedMule into a Parallel Ultra-Low-Power (PULP) cluster containing eight energy-efficient RISC-V cores sharing a tightly-coupled data memory and implement the resulting system in a 22 nm technology. At its best efficiency point (@ 470 MHz, 0.65 V), the RedMulE-augmented PULP cluster achieves 755 GFLOPS/W and 920 GFLOPS/W during regular General Matrix-Matrix Multiplication (GEMM), and up to 1.19 TFLOPS/W and 1.67 TFLOPS/W when executing GEMM-Ops, respectively, for FP16 and FP8 input/output tensors. In its best performance point (@ 613 MHz, 0.8 V), RedMulE achieves up to 58.5 GFLOPS and 117 GFLOPS for FP16 and FP8, respectively, with 99.4% utilization of the array of Computing Elements and consuming less than 60 mW on average, thus enabling on-device training of deep learning models in TinyML application scenarios while retaining the flexibility to tackle other classes of common linear algebra problems efficiently." @default.
- W4315706512 created "2023-01-12" @default.
- W4315706512 creator A5015790909 @default.
- W4315706512 creator A5017427307 @default.
- W4315706512 creator A5038717922 @default.
- W4315706512 creator A5043408422 @default.
- W4315706512 creator A5079484467 @default.
- W4315706512 date "2023-01-10" @default.
- W4315706512 modified "2023-09-27" @default.
- W4315706512 title "RedMule: A Mixed-Precision Matrix-Matrix Operation Engine for Flexible and Energy-Efficient On-Chip Linear Algebra and TinyML Training Acceleration" @default.
- W4315706512 doi "https://doi.org/10.48550/arxiv.2301.03904" @default.
- W4315706512 hasPublicationYear "2023" @default.
- W4315706512 type Work @default.
- W4315706512 citedByCount "0" @default.
- W4315706512 crossrefType "posted-content" @default.
- W4315706512 hasAuthorship W4315706512A5015790909 @default.
- W4315706512 hasAuthorship W4315706512A5017427307 @default.
- W4315706512 hasAuthorship W4315706512A5038717922 @default.
- W4315706512 hasAuthorship W4315706512A5043408422 @default.
- W4315706512 hasAuthorship W4315706512A5079484467 @default.
- W4315706512 hasBestOaLocation W43157065121 @default.
- W4315706512 hasConcept C106487976 @default.
- W4315706512 hasConcept C11413529 @default.
- W4315706512 hasConcept C117896860 @default.
- W4315706512 hasConcept C121332964 @default.
- W4315706512 hasConcept C134306372 @default.
- W4315706512 hasConcept C159985019 @default.
- W4315706512 hasConcept C163716315 @default.
- W4315706512 hasConcept C17349429 @default.
- W4315706512 hasConcept C173608175 @default.
- W4315706512 hasConcept C186633575 @default.
- W4315706512 hasConcept C192562407 @default.
- W4315706512 hasConcept C33923547 @default.
- W4315706512 hasConcept C3826847 @default.
- W4315706512 hasConcept C41008148 @default.
- W4315706512 hasConcept C459310 @default.
- W4315706512 hasConcept C56372850 @default.
- W4315706512 hasConcept C62520636 @default.
- W4315706512 hasConcept C74650414 @default.
- W4315706512 hasConcept C84114770 @default.
- W4315706512 hasConcept C84211073 @default.
- W4315706512 hasConceptScore W4315706512C106487976 @default.
- W4315706512 hasConceptScore W4315706512C11413529 @default.
- W4315706512 hasConceptScore W4315706512C117896860 @default.
- W4315706512 hasConceptScore W4315706512C121332964 @default.
- W4315706512 hasConceptScore W4315706512C134306372 @default.
- W4315706512 hasConceptScore W4315706512C159985019 @default.
- W4315706512 hasConceptScore W4315706512C163716315 @default.
- W4315706512 hasConceptScore W4315706512C17349429 @default.
- W4315706512 hasConceptScore W4315706512C173608175 @default.
- W4315706512 hasConceptScore W4315706512C186633575 @default.
- W4315706512 hasConceptScore W4315706512C192562407 @default.
- W4315706512 hasConceptScore W4315706512C33923547 @default.
- W4315706512 hasConceptScore W4315706512C3826847 @default.
- W4315706512 hasConceptScore W4315706512C41008148 @default.
- W4315706512 hasConceptScore W4315706512C459310 @default.
- W4315706512 hasConceptScore W4315706512C56372850 @default.
- W4315706512 hasConceptScore W4315706512C62520636 @default.
- W4315706512 hasConceptScore W4315706512C74650414 @default.
- W4315706512 hasConceptScore W4315706512C84114770 @default.
- W4315706512 hasConceptScore W4315706512C84211073 @default.
- W4315706512 hasLocation W43157065121 @default.
- W4315706512 hasOpenAccess W4315706512 @default.
- W4315706512 hasPrimaryLocation W43157065121 @default.
- W4315706512 hasRelatedWork W1970548269 @default.
- W4315706512 hasRelatedWork W2020015841 @default.
- W4315706512 hasRelatedWork W2031460602 @default.
- W4315706512 hasRelatedWork W2057136263 @default.
- W4315706512 hasRelatedWork W2086392083 @default.
- W4315706512 hasRelatedWork W2113921339 @default.
- W4315706512 hasRelatedWork W2273809747 @default.
- W4315706512 hasRelatedWork W2391308973 @default.
- W4315706512 hasRelatedWork W4290784209 @default.
- W4315706512 hasRelatedWork W3215286539 @default.
- W4315706512 isParatext "false" @default.
- W4315706512 isRetracted "false" @default.
- W4315706512 workType "article" @default.
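
The best-performance figures quoted in the abstract above (58.5 GFLOPS for FP16 and 117 GFLOPS for FP8 at 613 MHz with 99.4% utilization) can be sanity-checked with a little arithmetic. The sketch below is illustrative only and is not part of the SemOpenAlex record: the function names are hypothetical, and the convention that one fused multiply-add counts as 2 FLOPs is an assumption the record does not state.

```python
# Figures quoted in the abstract (best-performance point: 613 MHz, 0.8 V).
FREQ_HZ = 613e6
UTILIZATION = 0.994
PEAK_GFLOPS = {"FP16": 58.5, "FP8": 117.0}

# Assumption (not stated in the record): one fused multiply-add = 2 FLOPs.
FLOPS_PER_FMA = 2


def flops_per_cycle(gflops: float, freq_hz: float = FREQ_HZ,
                    utilization: float = UTILIZATION) -> float:
    """Peak FLOPs per clock cycle implied by a sustained throughput figure."""
    return gflops * 1e9 / freq_hz / utilization


def implied_fma_per_cycle(gflops: float, freq_hz: float = FREQ_HZ,
                          utilization: float = UTILIZATION) -> int:
    """Fused multiply-adds per cycle, rounded, under the 2-FLOPs-per-FMA assumption."""
    return round(flops_per_cycle(gflops, freq_hz, utilization) / FLOPS_PER_FMA)


if __name__ == "__main__":
    for fmt, gflops in PEAK_GFLOPS.items():
        print(fmt,
              f"{flops_per_cycle(gflops):.1f} FLOP/cycle,",
              implied_fma_per_cycle(gflops), "FMA/cycle")
```

Under these assumptions the quoted numbers work out to roughly 96 FLOPs per cycle for FP16 and 192 for FP8, which is consistent with the abstract's claim that FP8 doubles throughput at the same clock frequency.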