Matches in SemOpenAlex for { <https://semopenalex.org/work/W3048263222> ?p ?o ?g. }
Showing items 1 to 90 of 90 with 100 items per page.
- W3048263222 abstract "With the continuous demand for higher accuracy of deep neural networks, the model size has increased significantly. Quantization is one of the most widely used model compression methods, which can effectively reduce the model size without severe accuracy loss. Modern processors such as ARM CPU and NVIDIA GPU have already provided the support of low-bit arithmetic instructions. However, there lack efficient and practical optimizations for convolution computation towards extremely low-bit on ARM CPU (e.g., 2 ∼ 8-bit) and NVIDIA GPU (e.g., 4-bit and 8-bit). This paper explores the performance optimization methods of extremely low-bit convolution on diverse architectures. On ARM CPU, we propose two instruction schemes for 2 ∼ 3-bit and 4 ∼ 8-bit convolution with corresponding register allocation methods. In addition, we re-design the GEMM computation with data padding and packing optimizations. We also implement winograd algorithm for convolution with some specific bit width (e.g., 4 ∼ 6-bit) to achieve higher performance. On NVIDIA GPU, we propose a data partition mechanism and multi-level memory access optimizations, to better adapt the computation to GPU thread and memory hierarchy. We also propose quantization fusion to eliminate unnecessary data access. The experiment results demonstrate our implementations achieve better performance of extremely low-bit convolution compared to the state-of-the-art frameworks and libraries such as ncnn and cuDNN. To the best of our knowledge, this is the first work that provides efficient implementations of extremely low-bit convolutions covering 2 ∼ 8-bit on ARM CPU and 4-bit/8-bit on NVIDIA GPU." @default.
- W3048263222 created "2020-08-13" @default.
- W3048263222 creator A5011523353 @default.
- W3048263222 creator A5012792934 @default.
- W3048263222 creator A5014528965 @default.
- W3048263222 creator A5018705589 @default.
- W3048263222 creator A5026340829 @default.
- W3048263222 creator A5048414655 @default.
- W3048263222 creator A5070812231 @default.
- W3048263222 creator A5074183877 @default.
- W3048263222 creator A5079362609 @default.
- W3048263222 creator A5082233466 @default.
- W3048263222 creator A5088518659 @default.
- W3048263222 date "2020-08-17" @default.
- W3048263222 modified "2023-10-01" @default.
- W3048263222 title "Extremely Low-bit Convolution Optimization for Quantized Neural Network on Modern Computer Architectures" @default.
- W3048263222 cites W2002257715 @default.
- W3048263222 cites W2108598243 @default.
- W3048263222 cites W2172654076 @default.
- W3048263222 cites W2194775991 @default.
- W3048263222 cites W2565516711 @default.
- W3048263222 cites W2736230459 @default.
- W3048263222 cites W2766205627 @default.
- W3048263222 cites W2963446712 @default.
- W3048263222 cites W2980186997 @default.
- W3048263222 cites W3004061291 @default.
- W3048263222 cites W3008102594 @default.
- W3048263222 cites W3101543398 @default.
- W3048263222 doi "https://doi.org/10.1145/3404397.3404407" @default.
- W3048263222 hasPublicationYear "2020" @default.
- W3048263222 type Work @default.
- W3048263222 sameAs 3048263222 @default.
- W3048263222 citedByCount "10" @default.
- W3048263222 countsByYear W30482632222021 @default.
- W3048263222 countsByYear W30482632222022 @default.
- W3048263222 countsByYear W30482632222023 @default.
- W3048263222 crossrefType "proceedings-article" @default.
- W3048263222 hasAuthorship W3048263222A5011523353 @default.
- W3048263222 hasAuthorship W3048263222A5012792934 @default.
- W3048263222 hasAuthorship W3048263222A5014528965 @default.
- W3048263222 hasAuthorship W3048263222A5018705589 @default.
- W3048263222 hasAuthorship W3048263222A5026340829 @default.
- W3048263222 hasAuthorship W3048263222A5048414655 @default.
- W3048263222 hasAuthorship W3048263222A5070812231 @default.
- W3048263222 hasAuthorship W3048263222A5074183877 @default.
- W3048263222 hasAuthorship W3048263222A5079362609 @default.
- W3048263222 hasAuthorship W3048263222A5082233466 @default.
- W3048263222 hasAuthorship W3048263222A5088518659 @default.
- W3048263222 hasConcept C113775141 @default.
- W3048263222 hasConcept C11413529 @default.
- W3048263222 hasConcept C115537543 @default.
- W3048263222 hasConcept C154945302 @default.
- W3048263222 hasConcept C173608175 @default.
- W3048263222 hasConcept C2778100165 @default.
- W3048263222 hasConcept C28855332 @default.
- W3048263222 hasConcept C41008148 @default.
- W3048263222 hasConcept C45347329 @default.
- W3048263222 hasConcept C45374587 @default.
- W3048263222 hasConcept C50644808 @default.
- W3048263222 hasConcept C9390403 @default.
- W3048263222 hasConceptScore W3048263222C113775141 @default.
- W3048263222 hasConceptScore W3048263222C11413529 @default.
- W3048263222 hasConceptScore W3048263222C115537543 @default.
- W3048263222 hasConceptScore W3048263222C154945302 @default.
- W3048263222 hasConceptScore W3048263222C173608175 @default.
- W3048263222 hasConceptScore W3048263222C2778100165 @default.
- W3048263222 hasConceptScore W3048263222C28855332 @default.
- W3048263222 hasConceptScore W3048263222C41008148 @default.
- W3048263222 hasConceptScore W3048263222C45347329 @default.
- W3048263222 hasConceptScore W3048263222C45374587 @default.
- W3048263222 hasConceptScore W3048263222C50644808 @default.
- W3048263222 hasConceptScore W3048263222C9390403 @default.
- W3048263222 hasFunder F4320327720 @default.
- W3048263222 hasLocation W30482632221 @default.
- W3048263222 hasOpenAccess W3048263222 @default.
- W3048263222 hasPrimaryLocation W30482632221 @default.
- W3048263222 hasRelatedWork W1509211761 @default.
- W3048263222 hasRelatedWork W1534227216 @default.
- W3048263222 hasRelatedWork W1572523360 @default.
- W3048263222 hasRelatedWork W1586422308 @default.
- W3048263222 hasRelatedWork W2027101952 @default.
- W3048263222 hasRelatedWork W2070391474 @default.
- W3048263222 hasRelatedWork W2187652483 @default.
- W3048263222 hasRelatedWork W2388892594 @default.
- W3048263222 hasRelatedWork W2906973896 @default.
- W3048263222 hasRelatedWork W3037767301 @default.
- W3048263222 isParatext "false" @default.
- W3048263222 isRetracted "false" @default.
- W3048263222 magId "3048263222" @default.
- W3048263222 workType "article" @default.