Matches in SemOpenAlex for { <https://semopenalex.org/work/W2912509813> ?p ?o ?g. }
Showing items 1 to 63 of 63 with 100 items per page.
- W2912509813 abstract "Heterogeneous CPU-GPU systems have emerged as a power-efficient platform for high-performance parallelization of applications. However, effectively exploiting these architectures faces a number of challenges, including differences between the programming models of the CPU (MIMD) and the GPU (SIMD), GPU memory constraints, and comparatively low communication bandwidth between the CPU and GPU. As a consequence, high-performance execution of applications on these platforms requires designing new adaptive parallelizing methods. In this thesis, we first explore embarrassingly parallel applications, where tasks have no inter-dependencies. Although the massive processing power of GPUs provides an attractive opportunity for high-performance execution of embarrassingly parallel tasks on CPU-GPU systems, minimal execution time can only be obtained by optimally distributing the tasks between the processors. In contemporary CPU-GPU systems, the scheduler cannot determine the appropriate distribution ratio on its own; hence, considerable programming effort is required to divide the tasks among the processors manually. Herein, we design and implement a new dynamic scheduling heuristic to minimize the execution time of embarrassingly parallel applications on a heterogeneous CPU-GPU system. The scheduler is integrated into a scheduling framework that provides pre-implemented, automated scheduling modules, liberating the user from the complexities of scheduling details. The experimental results show that our scheduling approach achieves better or similar performance compared to some of the scheduling algorithms proposed for CPU-GPU systems. We then investigate task-dependent applications, where the tasks have data dependencies. The computational tasks and their communication patterns are expressed by a task interaction graph. Scheduling of the task interaction graph on a cluster can be done by first partitioning the graph into a set of computationally balanced partitions in such a way that the communication cost among the partitions is minimized, and subsequently mapping the partitions onto physical processors. Aside from scheduling, graph partitioning is a common computation phase in many application domains, including social network analysis, data mining, and VLSI design. However, irregular and data-dependent graph partitioning sub-tasks pose multiple challenges for efficient GPU utilization, which favors regularity. We design and implement a multilevel graph partitioner on a heterogeneous CPU-GPU system that takes advantage of the high parallel processing power of GPUs by executing the computation-intensive parts of the partitioning sub-tasks on the GPU and assigning the parts with less parallelism to the CPU. Our partitioner aims to overcome some of the challenges arising from the irregular nature of the algorithm and the memory constraints of GPUs. We present a lock-free scheme, since fine-grained synchronization among thousands of GPU threads imposes too high a performance overhead. Experimental results demonstrate that our partitioner outperforms serial and parallel MPI-based partitioners, and performs similarly to a shared-memory CPU-based parallel graph partitioner. To optimize the graph partitioner's performance, we describe an effective and methodical approach to enable GPU-based multilevel graph partitioning that is tailored specifically to the SIMD architecture. Our solution avoids thread divergence and balances the load over GPU threads by dynamically assigning an appropriate number of threads to process the graph vertices and their irregularly sized neighbor lists. Our optimized design is autonomous, as all steps are carried out on the GPU with minimal CPU intervention. We show that this design outperforms a CPU-based parallel graph partitioner. Finally, we apply some of our partitioning techniques to another graph processing algorithm, minimum spanning tree (MST), which exhibits load-imbalance characteristics. We show that extending these techniques helps achieve a high-performance implementation of MST on the GPU." @default.
- W2912509813 created "2019-02-21" @default.
- W2912509813 creator A5016385511 @default.
- W2912509813 date "2018-07-09" @default.
- W2912509813 modified "2023-09-24" @default.
- W2912509813 title "Efficient Scheduling and High-Performance Graph Partitioning on Heterogeneous CPU-GPU Systems" @default.
- W2912509813 hasPublicationYear "2018" @default.
- W2912509813 type Work @default.
- W2912509813 sameAs 2912509813 @default.
- W2912509813 citedByCount "0" @default.
- W2912509813 crossrefType "dissertation" @default.
- W2912509813 hasAuthorship W2912509813A5016385511 @default.
- W2912509813 hasConcept C111919701 @default.
- W2912509813 hasConcept C120314980 @default.
- W2912509813 hasConcept C120373497 @default.
- W2912509813 hasConcept C126909462 @default.
- W2912509813 hasConcept C162324750 @default.
- W2912509813 hasConcept C173608175 @default.
- W2912509813 hasConcept C180613757 @default.
- W2912509813 hasConcept C206729178 @default.
- W2912509813 hasConcept C21032095 @default.
- W2912509813 hasConcept C21547014 @default.
- W2912509813 hasConcept C41008148 @default.
- W2912509813 hasConcept C49154492 @default.
- W2912509813 hasConceptScore W2912509813C111919701 @default.
- W2912509813 hasConceptScore W2912509813C120314980 @default.
- W2912509813 hasConceptScore W2912509813C120373497 @default.
- W2912509813 hasConceptScore W2912509813C126909462 @default.
- W2912509813 hasConceptScore W2912509813C162324750 @default.
- W2912509813 hasConceptScore W2912509813C173608175 @default.
- W2912509813 hasConceptScore W2912509813C180613757 @default.
- W2912509813 hasConceptScore W2912509813C206729178 @default.
- W2912509813 hasConceptScore W2912509813C21032095 @default.
- W2912509813 hasConceptScore W2912509813C21547014 @default.
- W2912509813 hasConceptScore W2912509813C41008148 @default.
- W2912509813 hasConceptScore W2912509813C49154492 @default.
- W2912509813 hasLocation W29125098131 @default.
- W2912509813 hasOpenAccess W2912509813 @default.
- W2912509813 hasPrimaryLocation W29125098131 @default.
- W2912509813 hasRelatedWork W16565456 @default.
- W2912509813 hasRelatedWork W1765637227 @default.
- W2912509813 hasRelatedWork W1922576887 @default.
- W2912509813 hasRelatedWork W2049972134 @default.
- W2912509813 hasRelatedWork W2090958089 @default.
- W2912509813 hasRelatedWork W2135496605 @default.
- W2912509813 hasRelatedWork W2156519507 @default.
- W2912509813 hasRelatedWork W2159911284 @default.
- W2912509813 hasRelatedWork W2430693926 @default.
- W2912509813 hasRelatedWork W2533980075 @default.
- W2912509813 hasRelatedWork W2534671040 @default.
- W2912509813 hasRelatedWork W2564407003 @default.
- W2912509813 hasRelatedWork W2762485337 @default.
- W2912509813 hasRelatedWork W2885486526 @default.
- W2912509813 hasRelatedWork W3090452365 @default.
- W2912509813 hasRelatedWork W3116115872 @default.
- W2912509813 hasRelatedWork W3153589783 @default.
- W2912509813 hasRelatedWork W3164669718 @default.
- W2912509813 hasRelatedWork W3165602921 @default.
- W2912509813 hasRelatedWork W3176561082 @default.
- W2912509813 isParatext "false" @default.
- W2912509813 isRetracted "false" @default.
- W2912509813 magId "2912509813" @default.
- W2912509813 workType "dissertation" @default.
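The abstract above outlines a dynamic scheduling heuristic that distributes independent (embarrassingly parallel) tasks between the CPU and the GPU so that the faster device automatically takes a larger share of the work. The record contains no code, so the following is only a minimal illustrative sketch of that general idea, written as CUDA host code: the shared atomic work counter, the chunk sizes, and the `fake_work` body are assumptions made for this example, not the author's implementation (which would launch GPU kernels for the GPU worker's chunks).

```cuda
// Minimal sketch (not the thesis implementation): chunk-based self-scheduling
// of independent tasks between a CPU worker and a GPU worker. The faster
// device claims more chunks and thus processes more tasks automatically.
#include <algorithm>
#include <atomic>
#include <cstdio>
#include <thread>
#include <vector>

constexpr int kNumTasks = 1'000'000;
constexpr int kCpuChunk = 256;      // small chunks keep the CPU responsive
constexpr int kGpuChunk = 65'536;   // large chunks amortize kernel-launch cost

static std::atomic<int> next_task{0};  // shared work counter (self-scheduling)

// Placeholder for the real per-task computation.
static double fake_work(int i) { return 0.5 * static_cast<double>(i); }

// Each worker repeatedly claims the next chunk until the pool is exhausted.
// In a real CPU-GPU scheduler the GPU worker would copy its chunk to the
// device, launch a kernel, and copy the results back.
static void worker(int chunk, double* out, const char* name) {
    long done = 0;
    for (;;) {
        int begin = next_task.fetch_add(chunk);
        if (begin >= kNumTasks) break;
        int end = std::min(begin + chunk, kNumTasks);
        for (int i = begin; i < end; ++i) out[i] = fake_work(i);
        done += end - begin;
    }
    std::printf("%s processed %ld tasks\n", name, done);
}

int main() {
    std::vector<double> results(kNumTasks);
    std::thread cpu(worker, kCpuChunk, results.data(), "CPU worker");
    std::thread gpu(worker, kGpuChunk, results.data(), "GPU worker");
    cpu.join();
    gpu.join();
    return 0;
}
```

The property this kind of self-scheduling relies on is that neither device idles while work remains; more elaborate schemes can additionally adapt the chunk sizes to the observed throughput of each device.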
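The abstract also describes avoiding thread divergence by assigning a variable number of GPU threads to each vertex, so that vertices with long, irregularly sized neighbor lists do not serialize on a single thread. A common way to express this idea on a CSR graph is warp-per-vertex neighbor iteration; the kernel below is a generic sketch of that pattern, where the `row_ptr`/`edge_w` arrays, the per-neighbor work, and the reduction are illustrative assumptions rather than the partitioner's actual kernels.

```cuda
// Generic sketch (not the thesis kernel): warp-per-vertex traversal of a CSR
// graph. All 32 lanes of a warp cooperate on one vertex's adjacency list, so
// high-degree vertices no longer create load imbalance within a block.
#include <cuda_runtime.h>

__global__ void sum_neighbor_weights(const int* row_ptr, const float* edge_w,
                                     float* vertex_out, int num_vertices) {
    const int lane   = threadIdx.x & 31;                              // lane within the warp
    const int vertex = (blockIdx.x * blockDim.x + threadIdx.x) >> 5;  // one warp per vertex
    if (vertex >= num_vertices) return;

    // Lanes stride over the vertex's edges together, so a long neighbor
    // list is split evenly across the 32 lanes instead of one thread.
    float sum = 0.0f;
    for (int e = row_ptr[vertex] + lane; e < row_ptr[vertex + 1]; e += 32)
        sum += edge_w[e];   // placeholder per-neighbor work

    // Warp-level reduction combines the 32 partial sums.
    for (int offset = 16; offset > 0; offset >>= 1)
        sum += __shfl_down_sync(0xffffffffu, sum, offset);

    if (lane == 0) vertex_out[vertex] = sum;   // lane 0 writes the vertex result
}
```

A launch such as `sum_neighbor_weights<<<(num_vertices * 32 + 255) / 256, 256>>>(...)` gives each vertex one warp; degree-aware schemes like the one sketched in the abstract typically go further and choose the group size per vertex (a single thread, a warp, or a whole block) according to its number of neighbors.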