Matches in SemOpenAlex for { <https://semopenalex.org/work/W2891890103> ?p ?o ?g. }
- W2891890103 abstract "With the unprecedented development of compute capability and extension of memory bandwidth on modern GPUs, parallel communication and synchronization soon becomes a major concern for continuous performance scaling. This is especially the case for emerging big-data applications. Instead of relying on a few heavily-loaded CTAs that may expose opportunities for intra-CTA data reuse, current technology and design trends suggest the performance potential of allocating more lightweighted CTAs for processing individual tasks more independently, as the overheads from synchronization, communication and cooperation may greatly outweigh the benefits from exploiting limited data reuse in heavily-loaded CTAs. This paper proceeds this trend and proposes a novel execution model for modern GPUs that hides the CTA execution hierarchy from the classic GPU execution model; meanwhile exposes the originally hidden warp-level execution. Specifically, it relies on individual warps to undertake the original CTAs' tasks. The major observation is that by replacing traditional inter-warp communication (e.g., via shared memory), cooperation (e.g., via bar primitives) and synchronizations (e.g., via CTA barriers), with more efficient intra-warp communication (e.g., via register shuffling), cooperation (e.g., via warp voting) and synchronizations (naturally lockstep execution) across the SIMD-lanes within a warp, significant performance gain can be achieved. We analyze the pros and cons for this design and propose corresponding solutions to counter potential negative effects. Experimental results on a diverse group of thirty-two representative applications show that our proposed Warp-Consolidation execution model can achieve an average speedup of 1.7x, 2.3x, 1.5x and 1.2x (up to 6.3x, 31x, 6.4x and 3.8x) on NVIDIA Kepler (Tesla-K80), Maxwell (Tesla-M40), Pascal (Tesla-P100) and Volta (Tesla-V100) GPUs, respectively, demonstrating its applicability and portability. Our approach can be directly employed to either transform legacy codes or write new algorithms on modern commodity GPUs." @default.
- W2891890103 created "2018-09-27" @default.
- W2891890103 creator A5010181097 @default.
- W2891890103 creator A5038062469 @default.
- W2891890103 creator A5043209884 @default.
- W2891890103 creator A5059605240 @default.
- W2891890103 creator A5078691114 @default.
- W2891890103 date "2018-06-12" @default.
- W2891890103 modified "2023-10-17" @default.
- W2891890103 title "Warp-Consolidation" @default.
- W2891890103 cites W1967701350 @default.
- W2891890103 cites W1989562524 @default.
- W2891890103 cites W2016563136 @default.
- W2891890103 cites W2041356909 @default.
- W2891890103 cites W2049875313 @default.
- W2891890103 cites W2059301531 @default.
- W2891890103 cites W2067479799 @default.
- W2891890103 cites W2077143534 @default.
- W2891890103 cites W2080592089 @default.
- W2891890103 cites W2090584832 @default.
- W2891890103 cites W2091599698 @default.
- W2891890103 cites W2098274770 @default.
- W2891890103 cites W2098290747 @default.
- W2891890103 cites W2109473404 @default.
- W2891890103 cites W2126830109 @default.
- W2891890103 cites W2130749431 @default.
- W2891890103 cites W2135947393 @default.
- W2891890103 cites W2149234156 @default.
- W2891890103 cites W2152956697 @default.
- W2891890103 cites W2160406723 @default.
- W2891890103 cites W2171399035 @default.
- W2891890103 cites W2232645663 @default.
- W2891890103 cites W2409690919 @default.
- W2891890103 cites W2415973476 @default.
- W2891890103 cites W2469975815 @default.
- W2891890103 cites W2536535850 @default.
- W2891890103 cites W2583110315 @default.
- W2891890103 cites W2605251767 @default.
- W2891890103 cites W2741907101 @default.
- W2891890103 cites W2767068499 @default.
- W2891890103 cites W2768065515 @default.
- W2891890103 cites W2789572737 @default.
- W2891890103 doi "https://doi.org/10.1145/3205289.3205294" @default.
- W2891890103 hasPublicationYear "2018" @default.
- W2891890103 type Work @default.
- W2891890103 sameAs 2891890103 @default.
- W2891890103 citedByCount "26" @default.
- W2891890103 countsByYear W28918901032018 @default.
- W2891890103 countsByYear W28918901032019 @default.
- W2891890103 countsByYear W28918901032020 @default.
- W2891890103 countsByYear W28918901032021 @default.
- W2891890103 countsByYear W28918901032022 @default.
- W2891890103 countsByYear W28918901032023 @default.
- W2891890103 crossrefType "proceedings-article" @default.
- W2891890103 hasAuthorship W2891890103A5010181097 @default.
- W2891890103 hasAuthorship W2891890103A5038062469 @default.
- W2891890103 hasAuthorship W2891890103A5043209884 @default.
- W2891890103 hasAuthorship W2891890103A5059605240 @default.
- W2891890103 hasAuthorship W2891890103A5078691114 @default.
- W2891890103 hasBestOaLocation W28918901031 @default.
- W2891890103 hasConcept C101468663 @default.
- W2891890103 hasConcept C111919701 @default.
- W2891890103 hasConcept C118524514 @default.
- W2891890103 hasConcept C120314980 @default.
- W2891890103 hasConcept C127162648 @default.
- W2891890103 hasConcept C173608175 @default.
- W2891890103 hasConcept C188045654 @default.
- W2891890103 hasConcept C18903297 @default.
- W2891890103 hasConcept C206588197 @default.
- W2891890103 hasConcept C2776257435 @default.
- W2891890103 hasConcept C2776834041 @default.
- W2891890103 hasConcept C2778562939 @default.
- W2891890103 hasConcept C31258907 @default.
- W2891890103 hasConcept C41008148 @default.
- W2891890103 hasConcept C68339613 @default.
- W2891890103 hasConcept C86803240 @default.
- W2891890103 hasConceptScore W2891890103C101468663 @default.
- W2891890103 hasConceptScore W2891890103C111919701 @default.
- W2891890103 hasConceptScore W2891890103C118524514 @default.
- W2891890103 hasConceptScore W2891890103C120314980 @default.
- W2891890103 hasConceptScore W2891890103C127162648 @default.
- W2891890103 hasConceptScore W2891890103C173608175 @default.
- W2891890103 hasConceptScore W2891890103C188045654 @default.
- W2891890103 hasConceptScore W2891890103C18903297 @default.
- W2891890103 hasConceptScore W2891890103C206588197 @default.
- W2891890103 hasConceptScore W2891890103C2776257435 @default.
- W2891890103 hasConceptScore W2891890103C2776834041 @default.
- W2891890103 hasConceptScore W2891890103C2778562939 @default.
- W2891890103 hasConceptScore W2891890103C31258907 @default.
- W2891890103 hasConceptScore W2891890103C41008148 @default.
- W2891890103 hasConceptScore W2891890103C68339613 @default.
- W2891890103 hasConceptScore W2891890103C86803240 @default.
- W2891890103 hasFunder F4320335254 @default.
- W2891890103 hasFunder F4320337506 @default.
- W2891890103 hasLocation W28918901031 @default.
- W2891890103 hasOpenAccess W2891890103 @default.
- W2891890103 hasPrimaryLocation W28918901031 @default.
- W2891890103 hasRelatedWork W1509211761 @default.
- W2891890103 hasRelatedWork W1531488649 @default.
- W2891890103 hasRelatedWork W1585350690 @default.