Matches in SemOpenAlex for { <https://semopenalex.org/work/W2137816013> ?p ?o ?g. }
- W2137816013 endingPage "86" @default.
- W2137816013 startingPage "72" @default.
- W2137816013 abstract "In this paper, we research, analyze and develop optimization solutions for the parallel reduction function using graphics processing units (GPUs) that implement the Compute Unified Device Architecture (CUDA), a modern and novel approach for improving the software performance of data processing applications and algorithms. Many of these applications and algorithms make use of the reduction function in their computational steps. After designing the function and its algorithmic steps in CUDA, we progressively developed and implemented optimization solutions for the reduction function. To confirm, test and evaluate the solutions' efficiency, we developed a custom-tailored benchmark suite. We analyzed the experimental results with regard to: the comparison of execution time and bandwidth between graphics processing units covering the main CUDA architectures (Tesla GT200, Fermi GF100, Kepler GK104) and a central processing unit; the influence of the data type; and the influence of the binary operator. Keywords: GPU, CUDA, Kepler Architecture, Parallel Reduction, Thread Blocks. Introduction. Initially, graphics processing units (GPUs) were designed solely for graphics-specific rendering purposes, but later, by the end of the 1990s, these processors became programmable at the hardware level. In November 2006, NVIDIA released the GeForce 8800 GTX, the first GPU to support the new CUDA (Compute Unified Device Architecture), which unifies software and hardware components [1]. This new parallel programming model uses the huge parallel computational processing power of the GPU to solve complex processing tasks much more efficiently than traditional processing methods based on central processing units (CPUs).
This novel architecture offers several new components, specifically designed to alleviate the limitations of previous GPU architectures and to ease the processing of general-purpose computations on graphics processing units. Unlike previous GPU hardware architectures, the Compute Unified Device Architecture employs a unified implementation that makes it possible for the GPU to perform general-purpose computations. In this context, developing optimization solutions for high-performance basic functional blocks (like the parallel reduction algorithmic function) leads to a tremendous improvement in parallel data processing. In the scientific literature, this type of research is of great interest, with many researchers studying the potential to optimize algorithmic functions using the CUDA architecture [2], [3], [4], [5], [6]. To the best of our knowledge, none of the works so far has studied optimization solutions that scale in terms of resource allocation and performance on all the available CUDA architectures, especially on the latest Kepler CUDA architecture. The latest three CUDA-enabled graphics cards are the GTX 280 from the Tesla GT200 architecture, the GTX 480 from the Fermi GF100 architecture and the GTX 680 from the Kepler GK104 architecture [7]. The GTX 280 graphics processor, launched on 16 June 2008, is based on 65 nm fabrication technology and has 240 CUDA cores, 30 streaming multiprocessors and 1.4 billion transistors; the processor clock runs at 1296 MHz and the graphics clock at 602 MHz. It comes with 1024 MB of memory in the standard configuration, having an effective clock of 1107 MHz, a 512-bit GDDR3 memory interface width and 141.7 GB/sec memory bandwidth.
It has a maximum board power (TDP) of 236 Watts, a texture fill rate of 48.2 billion/sec, 80 texture units and 32 ROP units. The GTX 480 graphics processor, launched on 26 March 2010, is based on 40 nm fabrication technology and has 480 CUDA cores, 15 streaming multiprocessors and 3.2 billion transistors; the processor clock runs at 1401 MHz and the graphics clock at 700 MHz. It comes with 1536 MB of memory in the standard configuration, having an effective clock of 3700 MHz, a 384-bit GDDR5 memory interface width and 177. …" @default.
- W2137816013 created "2016-06-24" @default.
- W2137816013 creator A5072163832 @default.
- W2137816013 creator A5073879581 @default.
- W2137816013 creator A5088191183 @default.
- W2137816013 date "2012-07-01" @default.
- W2137816013 modified "2023-10-18" @default.
- W2137816013 title "Optimization Solutions for Improving the Performance of the Parallel Reduction Algorithm Using Graphics Processing Units" @default.
- W2137816013 cites W1017718204 @default.
- W2137816013 cites W1595783387 @default.
- W2137816013 cites W164384110 @default.
- W2137816013 cites W2113841249 @default.
- W2137816013 cites W2118558147 @default.
- W2137816013 cites W2119547137 @default.
- W2137816013 cites W2168362280 @default.
- W2137816013 cites W2249637006 @default.
- W2137816013 cites W597455093 @default.
- W2137816013 hasPublicationYear "2012" @default.
- W2137816013 type Work @default.
- W2137816013 sameAs 2137816013 @default.
- W2137816013 citedByCount "2" @default.
- W2137816013 countsByYear W21378160132017 @default.
- W2137816013 crossrefType "posted-content" @default.
- W2137816013 hasAuthorship W2137816013A5072163832 @default.
- W2137816013 hasAuthorship W2137816013A5073879581 @default.
- W2137816013 hasAuthorship W2137816013A5088191183 @default.
- W2137816013 hasConcept C11413529 @default.
- W2137816013 hasConcept C121684516 @default.
- W2137816013 hasConcept C13280743 @default.
- W2137816013 hasConcept C138101251 @default.
- W2137816013 hasConcept C150846664 @default.
- W2137816013 hasConcept C173608175 @default.
- W2137816013 hasConcept C185798385 @default.
- W2137816013 hasConcept C199360897 @default.
- W2137816013 hasConcept C205649164 @default.
- W2137816013 hasConcept C205711294 @default.
- W2137816013 hasConcept C207963374 @default.
- W2137816013 hasConcept C21442007 @default.
- W2137816013 hasConcept C2777904410 @default.
- W2137816013 hasConcept C2778119891 @default.
- W2137816013 hasConcept C2779851693 @default.
- W2137816013 hasConcept C31972630 @default.
- W2137816013 hasConcept C41008148 @default.
- W2137816013 hasConcept C459310 @default.
- W2137816013 hasConcept C50630238 @default.
- W2137816013 hasConcept C86111242 @default.
- W2137816013 hasConceptScore W2137816013C11413529 @default.
- W2137816013 hasConceptScore W2137816013C121684516 @default.
- W2137816013 hasConceptScore W2137816013C13280743 @default.
- W2137816013 hasConceptScore W2137816013C138101251 @default.
- W2137816013 hasConceptScore W2137816013C150846664 @default.
- W2137816013 hasConceptScore W2137816013C173608175 @default.
- W2137816013 hasConceptScore W2137816013C185798385 @default.
- W2137816013 hasConceptScore W2137816013C199360897 @default.
- W2137816013 hasConceptScore W2137816013C205649164 @default.
- W2137816013 hasConceptScore W2137816013C205711294 @default.
- W2137816013 hasConceptScore W2137816013C207963374 @default.
- W2137816013 hasConceptScore W2137816013C21442007 @default.
- W2137816013 hasConceptScore W2137816013C2777904410 @default.
- W2137816013 hasConceptScore W2137816013C2778119891 @default.
- W2137816013 hasConceptScore W2137816013C2779851693 @default.
- W2137816013 hasConceptScore W2137816013C31972630 @default.
- W2137816013 hasConceptScore W2137816013C41008148 @default.
- W2137816013 hasConceptScore W2137816013C459310 @default.
- W2137816013 hasConceptScore W2137816013C50630238 @default.
- W2137816013 hasConceptScore W2137816013C86111242 @default.
- W2137816013 hasIssue "3" @default.
- W2137816013 hasLocation W21378160131 @default.
- W2137816013 hasOpenAccess W2137816013 @default.
- W2137816013 hasPrimaryLocation W21378160131 @default.
- W2137816013 hasRelatedWork W1264773896 @default.
- W2137816013 hasRelatedWork W1490353925 @default.
- W2137816013 hasRelatedWork W2014778103 @default.
- W2137816013 hasRelatedWork W2042187922 @default.
- W2137816013 hasRelatedWork W2076827372 @default.
- W2137816013 hasRelatedWork W2129347053 @default.
- W2137816013 hasRelatedWork W2129770968 @default.
- W2137816013 hasRelatedWork W2143257021 @default.
- W2137816013 hasRelatedWork W2163351858 @default.
- W2137816013 hasRelatedWork W2259146918 @default.
- W2137816013 hasRelatedWork W2265453641 @default.
- W2137816013 hasRelatedWork W2266981530 @default.
- W2137816013 hasRelatedWork W2267544373 @default.
- W2137816013 hasRelatedWork W2267964279 @default.
- W2137816013 hasRelatedWork W2270368596 @default.
- W2137816013 hasRelatedWork W2415012447 @default.
- W2137816013 hasRelatedWork W2612854686 @default.
- W2137816013 hasRelatedWork W2766001387 @default.
- W2137816013 hasRelatedWork W3101533771 @default.
- W2137816013 hasRelatedWork W848255087 @default.
- W2137816013 hasVolume "16" @default.
- W2137816013 isParatext "false" @default.
- W2137816013 isRetracted "false" @default.
- W2137816013 magId "2137816013" @default.
- W2137816013 workType "article" @default.