Matches in SemOpenAlex for { <https://semopenalex.org/work/W2034135205> ?p ?o ?g. }
- W2034135205 endingPage "19" @default.
- W2034135205 startingPage "3" @default.
- W2034135205 abstract "Parallel programs consist of a series of code sections with different thread-level parallelism (TLP). As a result, it is rather common that a thread in a parallel program, such as a GPU kernel in CUDA programs, still contains both sequential code and parallel loops. In order to leverage such parallel loops, the latest NVIDIA Kepler architecture introduces dynamic parallelism, which allows a GPU thread to start another GPU kernel, thereby reducing the overhead of launching kernels from a CPU. However, with dynamic parallelism, a parent thread can only communicate with its child threads through global memory and the overhead of launching GPU kernels is non-trivial even within GPUs. In this paper, we first study a set of GPGPU benchmarks that contain parallel loops, and highlight that these benchmarks do not have a very high loop count or high degree of TLP. Consequently, the benefits of leveraging such parallel loops using dynamic parallelism are too limited to offset its overhead. We then present our proposed solution to exploit nested parallelism in CUDA, referred to as CUDA-NP. With CUDA-NP, we initially enable a high number of threads when a GPU program starts, and use control flow to activate different numbers of threads for different code sections. We implement our proposed CUDA-NP framework using a directive-based compiler approach. For a GPU kernel, an application developer only needs to add OpenMP-like pragmas for parallelizable code sections. Then, our CUDA-NP compiler automatically generates the optimized GPU kernels. It supports both the reduction and the scan primitives, explores different ways to distribute parallel loop iterations into threads, and efficiently manages on-chip resources. Our experiments show that for a set of GPGPU benchmarks, which have already been optimized and contain nested parallelism, our proposed CUDA-NP framework further improves the performance by up to 6.69 times and 2.01 times on average." @default.
- W2034135205 created "2016-06-24" @default.
- W2034135205 creator A5003753200 @default.
- W2034135205 creator A5066072288 @default.
- W2034135205 creator A5072755725 @default.
- W2034135205 date "2015-01-01" @default.
- W2034135205 modified "2023-10-16" @default.
- W2034135205 title "CUDA-NP: Realizing Nested Thread-Level Parallelism in GPGPU Applications" @default.
- W2034135205 cites W1485193134 @default.
- W2034135205 cites W1493206251 @default.
- W2034135205 cites W1517684295 @default.
- W2034135205 cites W1534721345 @default.
- W2034135205 cites W1537323515 @default.
- W2034135205 cites W1589136629 @default.
- W2034135205 cites W1972343800 @default.
- W2034135205 cites W1979527452 @default.
- W2034135205 cites W1988888548 @default.
- W2034135205 cites W1992851788 @default.
- W2034135205 cites W2013051005 @default.
- W2034135205 cites W2015839815 @default.
- W2034135205 cites W2029940394 @default.
- W2034135205 cites W2039304376 @default.
- W2034135205 cites W2047060659 @default.
- W2034135205 cites W2072680607 @default.
- W2034135205 cites W2080592089 @default.
- W2034135205 cites W2083056254 @default.
- W2034135205 cites W2086500526 @default.
- W2034135205 cites W2090584832 @default.
- W2034135205 cites W2108801243 @default.
- W2034135205 cites W2109473404 @default.
- W2034135205 cites W2112185810 @default.
- W2034135205 cites W2115148068 @default.
- W2034135205 cites W2118880662 @default.
- W2034135205 cites W2128329055 @default.
- W2034135205 cites W2129232868 @default.
- W2034135205 cites W2129817042 @default.
- W2034135205 cites W2130749431 @default.
- W2034135205 cites W2132600716 @default.
- W2034135205 cites W2142769604 @default.
- W2034135205 cites W2147193503 @default.
- W2034135205 cites W2152812278 @default.
- W2034135205 cites W2153492376 @default.
- W2034135205 cites W2161190431 @default.
- W2034135205 cites W2162726111 @default.
- W2034135205 cites W2167101788 @default.
- W2034135205 cites W2167334577 @default.
- W2034135205 cites W2170634604 @default.
- W2034135205 cites W3141650078 @default.
- W2034135205 cites W3147878143 @default.
- W2034135205 cites W4245867598 @default.
- W2034135205 doi "https://doi.org/10.1007/s11390-015-1500-y" @default.
- W2034135205 hasPublicationYear "2015" @default.
- W2034135205 type Work @default.
- W2034135205 sameAs 2034135205 @default.
- W2034135205 citedByCount "3" @default.
- W2034135205 countsByYear W20341352052015 @default.
- W2034135205 countsByYear W20341352052017 @default.
- W2034135205 countsByYear W20341352052022 @default.
- W2034135205 crossrefType "journal-article" @default.
- W2034135205 hasAuthorship W2034135205A5003753200 @default.
- W2034135205 hasAuthorship W2034135205A5066072288 @default.
- W2034135205 hasAuthorship W2034135205A5072755725 @default.
- W2034135205 hasBestOaLocation W20341352052 @default.
- W2034135205 hasConcept C111919701 @default.
- W2034135205 hasConcept C121684516 @default.
- W2034135205 hasConcept C138101251 @default.
- W2034135205 hasConcept C173608175 @default.
- W2034135205 hasConcept C21442007 @default.
- W2034135205 hasConcept C2778119891 @default.
- W2034135205 hasConcept C2781172179 @default.
- W2034135205 hasConcept C41008148 @default.
- W2034135205 hasConcept C42992933 @default.
- W2034135205 hasConcept C50630238 @default.
- W2034135205 hasConceptScore W2034135205C111919701 @default.
- W2034135205 hasConceptScore W2034135205C121684516 @default.
- W2034135205 hasConceptScore W2034135205C138101251 @default.
- W2034135205 hasConceptScore W2034135205C173608175 @default.
- W2034135205 hasConceptScore W2034135205C21442007 @default.
- W2034135205 hasConceptScore W2034135205C2778119891 @default.
- W2034135205 hasConceptScore W2034135205C2781172179 @default.
- W2034135205 hasConceptScore W2034135205C41008148 @default.
- W2034135205 hasConceptScore W2034135205C42992933 @default.
- W2034135205 hasConceptScore W2034135205C50630238 @default.
- W2034135205 hasIssue "1" @default.
- W2034135205 hasLocation W20341352051 @default.
- W2034135205 hasLocation W20341352052 @default.
- W2034135205 hasOpenAccess W2034135205 @default.
- W2034135205 hasPrimaryLocation W20341352051 @default.
- W2034135205 hasRelatedWork W1567437828 @default.
- W2034135205 hasRelatedWork W2011822474 @default.
- W2034135205 hasRelatedWork W2034135205 @default.
- W2034135205 hasRelatedWork W2035779775 @default.
- W2034135205 hasRelatedWork W2187267005 @default.
- W2034135205 hasRelatedWork W2203549461 @default.
- W2034135205 hasRelatedWork W2306641587 @default.
- W2034135205 hasRelatedWork W2473470984 @default.
- W2034135205 hasRelatedWork W3037545369 @default.
- W2034135205 hasRelatedWork W3175298944 @default.
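
The abstract above describes launching extra threads up front and using control flow to decide how many are active in each code section. The following is a minimal CPU-side sketch of that idea in plain Python, not the authors' compiler output and not GPU code. The function name `emulate_nested_kernel`, the master/slave terminology, the per-master `scale` value, and the cyclic distribution of loop iterations are all assumptions made for illustration.

```python
from typing import List


def emulate_nested_kernel(rows: List[List[int]], slaves: int) -> List[int]:
    """Emulate one launch in which every original ("master") thread is
    given `slaves` helper threads that are created at kernel start.

    Hypothetical workload: master m computes scale = m + 1 sequentially,
    then sums scale * x over its row in a parallel loop.
    """
    if slaves < 1:
        raise ValueError("each master needs at least one thread")

    results = []
    for master_id, row in enumerate(rows):
        # Sequential section: only the thread with slave_id == 0 is active;
        # the others are masked off by control flow.
        scale = None
        for slave_id in range(slaves):
            if slave_id == 0:
                scale = master_id + 1

        # The value produced by the master is then made visible to all
        # slaves (on a GPU this would go through on-chip storage; here it
        # is just a shared Python variable).

        # Parallel loop section: all slaves are active. Iterations are
        # distributed cyclically, so slave s handles i = s, s + slaves, ...
        partial = [0] * slaves
        for slave_id in range(slaves):
            for i in range(slave_id, len(row), slaves):
                partial[slave_id] += scale * row[i]

        # Reduction: combine the per-slave partial sums; the master keeps
        # the final value for the code that follows.
        results.append(sum(partial))
    return results


if __name__ == "__main__":
    print(emulate_nested_kernel([[1, 2, 3, 4, 5], [10, 20, 30]], slaves=4))
```

Changing `slaves` alters only how the loop iterations are split, not the result, which is the property a real implementation would also need to preserve.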