Matches in SemOpenAlex for { <https://semopenalex.org/work/W2802893805> ?p ?o ?g. }
- W2802893805 abstract "Author(s): Ashkiani, Saman | Abstract: Graphics Processing Units (GPUs) are massively parallel processors with thousands of active threads, originally designed for throughput-oriented tasks. In order to get as much performance as possible given the hardware characteristics of GPUs, it is extremely important for programmers not only to design an efficient algorithm with good enough asymptotic complexity, but also to take into account the hardware limitations and preferences. In this work, we focus our design on two high-level abstractions: work assignment and processing. The former denotes the task assigned by the programmer to each thread or group of threads. The latter encapsulates the actual execution of assigned tasks. Previous work conflates work assignment and processing into similar granularities. The most traditional way is to have per-thread work assignment followed by per-thread processing of that assigned work. Each thread sequentially processes a part of the input and then the results are combined appropriately. In this work, we use this approach in implementing various algorithms for the string matching problem (finding all instances of a pattern within a larger text). Another effective but less popular idea is per-warp work assignment followed by per-warp processing of that work. It usually requires efficient intra-warp communication to process input data that is now distributed among all threads within that warp. With the emergence of warp-wide voting and shuffle instructions, this approach has gained more potential for solving particular problems efficiently, with some benefits compared to per-thread assignment and processing. 
In this work, we use this approach to implement a series of parallel algorithms: histogram, multisplit, and radix sort. An advantage of using similar granularities for work assignment and processing appears in problems with uniform per-thread or per-warp workloads, where it is quite easy to adapt warp-synchronous ideas and achieve high performance. However, with non-uniform, irregular workloads, different threads might finish their processing at different times, which can cause sub-par performance. This is mainly because the whole warp remains resident on the device until all its threads are finished. With these irregular tasks in mind, we propose to use different granularities for our work assignment and processing. We use a per-thread work assignment followed by a per-warp processing; each thread is still responsible for an independent task, but now all threads within a warp cooperate with each other to perform all these tasks together, one at a time, until all are successfully processed. Using this strategy, we design a dynamic hash table for the GPU, the slab hash, which is a totally concurrent data structure supporting asynchronous updates and search queries: threads may have different operations to perform and each might require an unknown amount of time to be fulfilled. By following our warp-cooperative strategy, all threads help each other perform these operations together, resulting in a much higher warp efficiency compared to traditional conflated work assignment and processing schemes." @default.
- W2802893805 created "2018-05-17" @default.
- W2802893805 creator A5025222722 @default.
- W2802893805 date "2017-12-01" @default.
- W2802893805 modified "2023-09-26" @default.
- W2802893805 title "Parallel Algorithms and Dynamic Data Structures on the Graphics Processing Unit: a warp-centric approach" @default.
- W2802893805 cites W1480958225 @default.
- W2802893805 cites W1482680420 @default.
- W2802893805 cites W1840931607 @default.
- W2802893805 cites W1842317227 @default.
- W2802893805 cites W1972418517 @default.
- W2802893805 cites W1981585303 @default.
- W2802893805 cites W1983343295 @default.
- W2802893805 cites W1983424264 @default.
- W2802893805 cites W1983540755 @default.
- W2802893805 cites W1983788629 @default.
- W2802893805 cites W1985108724 @default.
- W2802893805 cites W2007742815 @default.
- W2802893805 cites W2016706026 @default.
- W2802893805 cites W2017086619 @default.
- W2802893805 cites W2024794876 @default.
- W2802893805 cites W2028499920 @default.
- W2802893805 cites W2032309817 @default.
- W2802893805 cites W2032995245 @default.
- W2802893805 cites W2035080386 @default.
- W2802893805 cites W2039820867 @default.
- W2802893805 cites W2041470524 @default.
- W2802893805 cites W2048187990 @default.
- W2802893805 cites W2050182684 @default.
- W2802893805 cites W2050277572 @default.
- W2802893805 cites W2050513283 @default.
- W2802893805 cites W2057112598 @default.
- W2802893805 cites W2069976245 @default.
- W2802893805 cites W2074982700 @default.
- W2802893805 cites W2085830238 @default.
- W2802893805 cites W2087507944 @default.
- W2802893805 cites W2100608884 @default.
- W2802893805 cites W2107173440 @default.
- W2802893805 cites W2116079772 @default.
- W2802893805 cites W2117098610 @default.
- W2802893805 cites W2119547137 @default.
- W2802893805 cites W2131565410 @default.
- W2802893805 cites W2134158653 @default.
- W2802893805 cites W2134427337 @default.
- W2802893805 cites W2134826720 @default.
- W2802893805 cites W2135525032 @default.
- W2802893805 cites W2145455679 @default.
- W2802893805 cites W2148016496 @default.
- W2802893805 cites W2149150056 @default.
- W2802893805 cites W2153226019 @default.
- W2802893805 cites W2159392969 @default.
- W2802893805 cites W2161694911 @default.
- W2802893805 cites W2169528473 @default.
- W2802893805 cites W2260780546 @default.
- W2802893805 cites W2313241142 @default.
- W2802893805 cites W2336892223 @default.
- W2802893805 cites W2403521492 @default.
- W2802893805 cites W2506118120 @default.
- W2802893805 cites W2557279889 @default.
- W2802893805 cites W2577971637 @default.
- W2802893805 cites W2581744095 @default.
- W2802893805 cites W2597537647 @default.
- W2802893805 cites W2620346917 @default.
- W2802893805 cites W2739266613 @default.
- W2802893805 cites W2901608006 @default.
- W2802893805 cites W2964064024 @default.
- W2802893805 cites W3004540582 @default.
- W2802893805 cites W3145128584 @default.
- W2802893805 cites W2963040295 @default.
- W2802893805 hasPublicationYear "2017" @default.
- W2802893805 type Work @default.
- W2802893805 sameAs 2802893805 @default.
- W2802893805 citedByCount "0" @default.
- W2802893805 crossrefType "journal-article" @default.
- W2802893805 hasAuthorship W2802893805A5025222722 @default.
- W2802893805 hasConcept C120314980 @default.
- W2802893805 hasConcept C121684516 @default.
- W2802893805 hasConcept C138101251 @default.
- W2802893805 hasConcept C173608175 @default.
- W2802893805 hasConcept C199360897 @default.
- W2802893805 hasConcept C21442007 @default.
- W2802893805 hasConcept C2778514511 @default.
- W2802893805 hasConcept C2779851693 @default.
- W2802893805 hasConcept C41008148 @default.
- W2802893805 hasConcept C80444323 @default.
- W2802893805 hasConceptScore W2802893805C120314980 @default.
- W2802893805 hasConceptScore W2802893805C121684516 @default.
- W2802893805 hasConceptScore W2802893805C138101251 @default.
- W2802893805 hasConceptScore W2802893805C173608175 @default.
- W2802893805 hasConceptScore W2802893805C199360897 @default.
- W2802893805 hasConceptScore W2802893805C21442007 @default.
- W2802893805 hasConceptScore W2802893805C2778514511 @default.
- W2802893805 hasConceptScore W2802893805C2779851693 @default.
- W2802893805 hasConceptScore W2802893805C41008148 @default.
- W2802893805 hasConceptScore W2802893805C80444323 @default.
- W2802893805 hasLocation W28028938051 @default.
- W2802893805 hasOpenAccess W2802893805 @default.
- W2802893805 hasPrimaryLocation W28028938051 @default.
- W2802893805 hasRelatedWork W1511870327 @default.
- W2802893805 hasRelatedWork W1522735082 @default.
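
The abstract above describes per-thread work assignment followed by per-warp processing: each thread holds its own task, and the warp serves those tasks one at a time with all lanes cooperating. The following is a minimal CPU-side sketch of that idea in Python, not the author's GPU implementation. The names `ballot`, `lowest_set_lane`, and `warp_cooperative_search`, the 32-lane warp size, and the layout of the data as a list of "slabs" are all assumptions made for illustration; on a real GPU the vote and broadcast steps would map to warp-wide voting and shuffle instructions.

```python
WARP_SIZE = 32  # assumed warp width, for illustration only


def ballot(flags):
    """Pack one boolean per lane into a bitmask (bit i = lane i), like a warp vote."""
    mask = 0
    for lane, flag in enumerate(flags):
        if flag:
            mask |= 1 << lane
    return mask


def lowest_set_lane(mask):
    """Return the index of the lowest set bit; mask must be non-zero."""
    return (mask & -mask).bit_length() - 1


def warp_cooperative_search(queries, slabs):
    """Simulate one warp serving per-lane search tasks cooperatively.

    queries: list of WARP_SIZE entries, a key per lane or None for "no task".
    slabs:   hypothetical storage, a list of lists of at most WARP_SIZE keys.
    Returns (results, rounds): results[lane] is (slab_index, slot) or None.
    """
    assert len(queries) == WARP_SIZE
    pending = [q is not None for q in queries]
    results = [None] * WARP_SIZE
    rounds = 0
    while True:
        work_queue = ballot(pending)          # which lanes still hold a task?
        if work_queue == 0:
            break                             # the whole warp is done
        src = lowest_set_lane(work_queue)     # pick one task for this round
        key = queries[src]                    # "broadcast" that lane's key
        found = None
        for slab_index, slab in enumerate(slabs):
            # Every lane compares the broadcast key against one slot.
            hits = ballot(
                slot < len(slab) and slab[slot] == key
                for slot in range(WARP_SIZE)
            )
            if hits:
                found = (slab_index, lowest_set_lane(hits))
                break
        results[src] = found                  # only the source lane keeps it
        pending[src] = False                  # that task leaves the queue
        rounds += 1
    return results, rounds
```

The loop runs once per pending task rather than once per lane, so lanes with no task cost nothing, and no lane waits on a neighbor's longer task while doing nothing useful itself.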