Matches in SemOpenAlex for { <https://semopenalex.org/work/W2273809747> ?p ?o ?g. }
- W2273809747 endingPage "22" @default.
- W2273809747 startingPage "12" @default.
- W2273809747 abstract "Matrix multiplication is a fundamental linear algebra routine ubiquitous in all areas of science and engineering. Highly optimised BLAS libraries (cuBLAS and clBLAS on GPUs) are the most popular choices for an implementation of the General Matrix Multiply (GEMM) in software. In this paper we present GiMMiK, a generator of bespoke matrix multiplication kernels for the CUDA and OpenCL platforms. GiMMiK exploits prior knowledge of the operator matrix to generate highly performant code. The performance of GiMMiK’s kernels is particularly apparent in a block-by-panel type of matrix multiplication, where the block matrix is typically small (e.g. dimensions of 96×64). Such operations are characteristic of our motivating application in PyFR, an implementation of Flux Reconstruction schemes for high-order fluid flow simulations on mixed unstructured meshes. GiMMiK fully unrolls the matrix–vector product and embeds matrix entries directly in the code to benefit from the use of the constant cache and compiler optimisations. Further, it reduces the number of floating-point operations by removing multiplications by zeros. Because our kernels also avoid the poorly optimised cleanup code executed by library GEMM, we are able to outperform cuBLAS on two NVIDIA GPUs: GTX 780 Ti and Tesla K40c. We observe speedups of our kernels over cuBLAS GEMM of up to 9.98 and 63.30 times for a 294×1029 99% sparse PyFR matrix in double precision on the Tesla K40c and GTX 780 Ti respectively. In single precision, observed speedups reach 12.20 and 13.07 times for a 4×8 50% sparse PyFR matrix on the two aforementioned cards. Using GiMMiK as the matrix multiplication kernel provider allows us to achieve a speedup of up to 1.70 (2.19) for a simulation of an unsteady flow over a cylinder executed with PyFR in double (single) precision on the Tesla K40c. All results were generated with GiMMiK version 1.0." @default.
- W2273809747 created "2016-06-24" @default.
- W2273809747 creator A5000319133 @default.
- W2273809747 creator A5044945209 @default.
- W2273809747 creator A5067736188 @default.
- W2273809747 creator A5067822985 @default.
- W2273809747 creator A5069977488 @default.
- W2273809747 date "2016-05-01" @default.
- W2273809747 modified "2023-10-17" @default.
- W2273809747 title "GiMMiK—Generating bespoke matrix multiplication kernels for accelerators: Application to high-order Computational Fluid Dynamics" @default.
- W2273809747 cites W1964031104 @default.
- W2273809747 cites W2073061372 @default.
- W2273809747 cites W2091371833 @default.
- W2273809747 cites W2099021415 @default.
- W2273809747 cites W2148897203 @default.
- W2273809747 cites W2315982789 @default.
- W2273809747 cites W2332227794 @default.
- W2273809747 doi "https://doi.org/10.1016/j.cpc.2015.12.012" @default.
- W2273809747 hasPublicationYear "2016" @default.
- W2273809747 type Work @default.
- W2273809747 sameAs 2273809747 @default.
- W2273809747 citedByCount "25" @default.
- W2273809747 countsByYear W22738097472015 @default.
- W2273809747 countsByYear W22738097472016 @default.
- W2273809747 countsByYear W22738097472017 @default.
- W2273809747 countsByYear W22738097472018 @default.
- W2273809747 countsByYear W22738097472020 @default.
- W2273809747 countsByYear W22738097472021 @default.
- W2273809747 countsByYear W22738097472022 @default.
- W2273809747 countsByYear W22738097472023 @default.
- W2273809747 crossrefType "journal-article" @default.
- W2273809747 hasAuthorship W2273809747A5000319133 @default.
- W2273809747 hasAuthorship W2273809747A5044945209 @default.
- W2273809747 hasAuthorship W2273809747A5067736188 @default.
- W2273809747 hasAuthorship W2273809747A5067822985 @default.
- W2273809747 hasAuthorship W2273809747A5069977488 @default.
- W2273809747 hasBestOaLocation W22738097471 @default.
- W2273809747 hasConcept C106487976 @default.
- W2273809747 hasConcept C11413529 @default.
- W2273809747 hasConcept C114614502 @default.
- W2273809747 hasConcept C121332964 @default.
- W2273809747 hasConcept C133162039 @default.
- W2273809747 hasConcept C139352143 @default.
- W2273809747 hasConcept C159985019 @default.
- W2273809747 hasConcept C163716315 @default.
- W2273809747 hasConcept C169590947 @default.
- W2273809747 hasConcept C17349429 @default.
- W2273809747 hasConcept C173608175 @default.
- W2273809747 hasConcept C192562407 @default.
- W2273809747 hasConcept C199360897 @default.
- W2273809747 hasConcept C2524010 @default.
- W2273809747 hasConcept C26517878 @default.
- W2273809747 hasConcept C2777210771 @default.
- W2273809747 hasConcept C2778119891 @default.
- W2273809747 hasConcept C2780595030 @default.
- W2273809747 hasConcept C33923547 @default.
- W2273809747 hasConcept C38652104 @default.
- W2273809747 hasConcept C41008148 @default.
- W2273809747 hasConcept C459310 @default.
- W2273809747 hasConcept C56372850 @default.
- W2273809747 hasConcept C62520636 @default.
- W2273809747 hasConcept C84114770 @default.
- W2273809747 hasConcept C84211073 @default.
- W2273809747 hasConceptScore W2273809747C106487976 @default.
- W2273809747 hasConceptScore W2273809747C11413529 @default.
- W2273809747 hasConceptScore W2273809747C114614502 @default.
- W2273809747 hasConceptScore W2273809747C121332964 @default.
- W2273809747 hasConceptScore W2273809747C133162039 @default.
- W2273809747 hasConceptScore W2273809747C139352143 @default.
- W2273809747 hasConceptScore W2273809747C159985019 @default.
- W2273809747 hasConceptScore W2273809747C163716315 @default.
- W2273809747 hasConceptScore W2273809747C169590947 @default.
- W2273809747 hasConceptScore W2273809747C17349429 @default.
- W2273809747 hasConceptScore W2273809747C173608175 @default.
- W2273809747 hasConceptScore W2273809747C192562407 @default.
- W2273809747 hasConceptScore W2273809747C199360897 @default.
- W2273809747 hasConceptScore W2273809747C2524010 @default.
- W2273809747 hasConceptScore W2273809747C26517878 @default.
- W2273809747 hasConceptScore W2273809747C2777210771 @default.
- W2273809747 hasConceptScore W2273809747C2778119891 @default.
- W2273809747 hasConceptScore W2273809747C2780595030 @default.
- W2273809747 hasConceptScore W2273809747C33923547 @default.
- W2273809747 hasConceptScore W2273809747C38652104 @default.
- W2273809747 hasConceptScore W2273809747C41008148 @default.
- W2273809747 hasConceptScore W2273809747C459310 @default.
- W2273809747 hasConceptScore W2273809747C56372850 @default.
- W2273809747 hasConceptScore W2273809747C62520636 @default.
- W2273809747 hasConceptScore W2273809747C84114770 @default.
- W2273809747 hasConceptScore W2273809747C84211073 @default.
- W2273809747 hasFunder F4320334627 @default.
- W2273809747 hasLocation W22738097471 @default.
- W2273809747 hasLocation W22738097472 @default.
- W2273809747 hasOpenAccess W2273809747 @default.
- W2273809747 hasPrimaryLocation W22738097471 @default.
- W2273809747 hasRelatedWork W1508286210 @default.
- W2273809747 hasRelatedWork W1978647314 @default.
- W2273809747 hasRelatedWork W1981557297 @default.
- W2273809747 hasRelatedWork W1993704253 @default.
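
The abstract above describes three ideas: fully unrolling the product, embedding matrix entries as literals in the generated code, and dropping multiplications by zero. The following is a minimal sketch of those ideas for a plain matrix-vector product. It is not GiMMiK's actual generator or API; the function name `generate_unrolled_kernel`, the kernel name `gimmik_like_mv`, and the C-like output format are all assumptions made for illustration.

```python
def generate_unrolled_kernel(matrix, name="gimmik_like_mv"):
    """Return C-like source computing y = A x for a fixed, known matrix A.

    Hypothetical helper, not part of GiMMiK: each row becomes one fully
    unrolled statement, non-zero entries are embedded as literals, and
    zero entries produce no multiplication at all.
    """
    lines = [f"void {name}(const double* x, double* y) {{"]
    for i, row in enumerate(matrix):
        # Keep only the non-zero entries of this row.
        terms = [
            f"{float(a)!r} * x[{j}]"
            for j, a in enumerate(row)
            if a != 0
        ]
        # An all-zero row still has to write its output element.
        rhs = " + ".join(terms) if terms else "0.0"
        lines.append(f"    y[{i}] = {rhs};")
    lines.append("}")
    return "\n".join(lines)


# A small operator matrix with 6 of its 9 entries equal to zero.
A = [
    [1.0, 0.0, 2.0],
    [0.0, 0.0, 0.0],
    [0.0, -3.5, 0.0],
]
source = generate_unrolled_kernel(A)
print(source)
```

For this 3×3 matrix the generated source contains three multiplications instead of the nine a dense product would perform. A real generator for the block-by-panel case described in the abstract would also have to handle the panel dimension, memory layout, and the target platform (CUDA or OpenCL), none of which this sketch attempts.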