Matches in SemOpenAlex for { <https://semopenalex.org/work/W23806513> ?p ?o ?g. }
- W23806513 abstract "We address the problem of building high-performance uniprocessor implementations of sparse triangular solve (SpTS) automatically. This computational kernel is often the bottleneck in a variety of scientific and engineering applications that require the direct solution of sparse linear systems. Performance tuning of SpTS—and sparse matrix kernels in general—is a tedious and time-consuming task, because performance depends on the complex interaction of many factors: the performance gap between processors and memory, the limits on the scope of compiler analyses and transformations, and the overhead of manipulating sparse data structures. Consequently, it is not unusual to see kernels such as SpTS run at under 10% of peak uniprocessor floating point performance. Our approach to automatic tuning of SpTS builds on prior experience with building tuning systems for sparse matrix-vector multiply (SpM×V) [21, 22, 40], and dense matrix kernels [8, 41]. In particular, we adopt the two-step methodology of previous approaches: (1) we identify and generate a set of reasonable candidate implementations, and (2) search this set for the fastest implementation by some combination of performance modeling and actually executing the implementations. In this paper, we consider the solution of the sparse lower triangular system Lx = y for a single dense vector x, given the lower triangular sparse matrix L and dense vector y. We refer to x as the solution vector and y as the right-hand side (RHS). Many of the lower triangular factors we have observed from sparse LU factorization have a large, dense triangle in the lower right-hand corner of the matrix; this trailing triangle can account for as much as 90% of the matrix non-zeros. Therefore, we consider both algorithmic and data structure reorganizations which partition the solve into a sparse phase and a dense phase. 
To the sparse phase, we adapt the register blocking optimization, previously proposed for sparse matrix-vector multiply (SpM×V) in the Sparsity system [21, 22], to the SpTS kernel; to the dense phase, we make judicious use of highly tuned BLAS routines by switching to a dense implementation (switch-to-dense optimization). We describe fully automatic hybrid off-line/on-line heuristics for selecting the key tuning parameters: the register block size and the point at which to use the dense algorithm. (See Section 2.) We then evaluate the performance of our optimized implementations relative to the fundamental limits on performance. Specifically, we first derive simple models of the upper bounds on the execution rate (Mflop/s) of our implementations. Using hardware counter data collected with the PAPI library [10], we then verify our models on three hardware platforms (Table 1) and a set of triangular factors from applications (Table 2). We observe that our optimized implementations can achieve 80% or more of these bounds; furthermore, we observe speedups of up to 1.8x when both register blocking and switch-to-dense optimizations are applied. We also present preliminary results confirming that our heuristics choose reasonable values for the tuning parameters. These results support our prior findings with SpM×V [40], suggesting two new directions for performance enhancements: (1) the use of higher-level matrix structures (e.g., matrix reordering and multiple register block sizes), and (2) optimizing kernels with more opportunities for data reuse (e.g., multiplication and solve with multiple vectors, multiplication of A^T A by a vector)." @default.
- W23806513 created "2016-06-24" @default.
- W23806513 creator A5003807368 @default.
- W23806513 creator A5005047790 @default.
- W23806513 creator A5054130257 @default.
- W23806513 creator A5076825233 @default.
- W23806513 creator A5091741596 @default.
- W23806513 date "2002-01-01" @default.
- W23806513 modified "2023-09-24" @default.
- W23806513 title "Automatic Performance Tuning and Analysis of Sparse Triangular Solve" @default.
- W23806513 cites W1488793182 @default.
- W23806513 cites W1509994495 @default.
- W23806513 cites W1518134045 @default.
- W23806513 cites W1518969538 @default.
- W23806513 cites W1523329529 @default.
- W23806513 cites W1564580340 @default.
- W23806513 cites W1575701986 @default.
- W23806513 cites W1581501197 @default.
- W23806513 cites W1583189038 @default.
- W23806513 cites W1595252915 @default.
- W23806513 cites W1608723498 @default.
- W23806513 cites W1640020431 @default.
- W23806513 cites W1653630692 @default.
- W23806513 cites W1948792255 @default.
- W23806513 cites W1964031104 @default.
- W23806513 cites W1972209410 @default.
- W23806513 cites W1985573397 @default.
- W23806513 cites W199075744 @default.
- W23806513 cites W1992107487 @default.
- W23806513 cites W2012715759 @default.
- W23806513 cites W2015656182 @default.
- W23806513 cites W2036603155 @default.
- W23806513 cites W2038205735 @default.
- W23806513 cites W2047656763 @default.
- W23806513 cites W2051086725 @default.
- W23806513 cites W2065705265 @default.
- W23806513 cites W2068863300 @default.
- W23806513 cites W2070299075 @default.
- W23806513 cites W2073017869 @default.
- W23806513 cites W2088160222 @default.
- W23806513 cites W2089592348 @default.
- W23806513 cites W2094171110 @default.
- W23806513 cites W2096070062 @default.
- W23806513 cites W2104120668 @default.
- W23806513 cites W2108315152 @default.
- W23806513 cites W2119609467 @default.
- W23806513 cites W2125467990 @default.
- W23806513 cites W2125955291 @default.
- W23806513 cites W2135653967 @default.
- W23806513 cites W2153024623 @default.
- W23806513 cites W2158737060 @default.
- W23806513 cites W2164139896 @default.
- W23806513 cites W2169282672 @default.
- W23806513 cites W39495715 @default.
- W23806513 cites W78547367 @default.
- W23806513 hasPublicationYear "2002" @default.
- W23806513 type Work @default.
- W23806513 sameAs 23806513 @default.
- W23806513 citedByCount "7" @default.
- W23806513 countsByYear W238065132020 @default.
- W23806513 countsByYear W238065132021 @default.
- W23806513 crossrefType "journal-article" @default.
- W23806513 hasAuthorship W23806513A5003807368 @default.
- W23806513 hasAuthorship W23806513A5005047790 @default.
- W23806513 hasAuthorship W23806513A5054130257 @default.
- W23806513 hasAuthorship W23806513A5076825233 @default.
- W23806513 hasAuthorship W23806513A5091741596 @default.
- W23806513 hasConcept C104528550 @default.
- W23806513 hasConcept C106487976 @default.
- W23806513 hasConcept C111919701 @default.
- W23806513 hasConcept C11413529 @default.
- W23806513 hasConcept C118615104 @default.
- W23806513 hasConcept C121332964 @default.
- W23806513 hasConcept C124066611 @default.
- W23806513 hasConcept C134978465 @default.
- W23806513 hasConcept C149635348 @default.
- W23806513 hasConcept C158693339 @default.
- W23806513 hasConcept C159985019 @default.
- W23806513 hasConcept C163716315 @default.
- W23806513 hasConcept C173608175 @default.
- W23806513 hasConcept C177264268 @default.
- W23806513 hasConcept C192562407 @default.
- W23806513 hasConcept C199360897 @default.
- W23806513 hasConcept C202444582 @default.
- W23806513 hasConcept C2779960059 @default.
- W23806513 hasConcept C2780513914 @default.
- W23806513 hasConcept C33923547 @default.
- W23806513 hasConcept C34727166 @default.
- W23806513 hasConcept C41008148 @default.
- W23806513 hasConcept C42355184 @default.
- W23806513 hasConcept C4822641 @default.
- W23806513 hasConcept C56372850 @default.
- W23806513 hasConcept C62520636 @default.
- W23806513 hasConcept C74193536 @default.
- W23806513 hasConcept C79189994 @default.
- W23806513 hasConcept C96442724 @default.
- W23806513 hasConceptScore W23806513C104528550 @default.
- W23806513 hasConceptScore W23806513C106487976 @default.
- W23806513 hasConceptScore W23806513C111919701 @default.
- W23806513 hasConceptScore W23806513C11413529 @default.