Matches in SemOpenAlex for { <https://semopenalex.org/work/W4311083612> ?p ?o ?g. }
Showing items 1 to 53 of 53 with 100 items per page.
- W4311083612 abstract "Abstract. Surface gravity waves play a critical role in several processes, including mixing, coastal inundation, and surface fluxes. Despite the growing literature on the importance of ocean surface waves, wind–wave processes have traditionally been excluded from Earth system models (ESMs) due to the high computational costs of running spectral wave models. The development of the Next Generation Ocean Model for the DOE's (Department of Energy) E3SM (Energy Exascale Earth System Model) Project partly focuses on the inclusion of a wave model, WAVEWATCH III (WW3), into E3SM. WW3, which was originally developed for operational wave forecasting, needs to be computationally less expensive before it can be integrated into ESMs. To accomplish this, we take advantage of heterogeneous architectures at DOE leadership computing facilities and the increasing computing power of general-purpose graphics processing units (GPUs). This paper identifies the wave action source terms, W3SRCEMD, as the most computationally intensive module in WW3 and then accelerates them via GPU. Our experiments on two computing platforms, Kodiak (P100 GPU and Intel(R) Xeon(R) central processing unit, CPU, E5-2695 v4) and Summit (V100 GPU and IBM POWER9 CPU) show respective average speedups of 2× and 4× when mapping one Message Passing Interface (MPI) per GPU. An average speedup of 1.4× was achieved using all 42 CPU cores and 6 GPUs on a Summit node (with 7 MPI ranks per GPU). However, the GPU speedup over the 42 CPU cores remains relatively unchanged (∼1.3×) even when using 4 MPI ranks per GPU (24 ranks in total) and 3 MPI ranks per GPU (18 ranks in total). This corresponds to a 35 %–40 % decrease in both simulation time and usage of resources.
Due to too many local scalars and arrays in the W3SRCEMD subroutine and the huge WW3 memory requirement, GPU performance is currently limited by the data transfer bandwidth between the CPU and the GPU. Ideally, OpenACC routine directives could be used to further improve performance. However, W3SRCEMD would require significant code refactoring to make this possible. We also discuss how the trade-off between the occupancy, register, and latency affects the GPU performance of WW3." @default.
- W4311083612 created "2022-12-23" @default.
- W4311083612 creator A5089774949 @default.
- W4311083612 date "2022-12-01" @default.
- W4311083612 modified "2023-10-18" @default.
- W4311083612 title "Reply on RC1" @default.
- W4311083612 doi "https://doi.org/10.5194/gmd-2022-141-ac1" @default.
- W4311083612 hasPublicationYear "2022" @default.
- W4311083612 type Work @default.
- W4311083612 citedByCount "0" @default.
- W4311083612 crossrefType "peer-review" @default.
- W4311083612 hasAuthorship W4311083612A5089774949 @default.
- W4311083612 hasBestOaLocation W43110836121 @default.
- W4311083612 hasConcept C111368507 @default.
- W4311083612 hasConcept C127313418 @default.
- W4311083612 hasConcept C165082838 @default.
- W4311083612 hasConcept C173608175 @default.
- W4311083612 hasConcept C177264268 @default.
- W4311083612 hasConcept C199360897 @default.
- W4311083612 hasConcept C2776760102 @default.
- W4311083612 hasConcept C2779851693 @default.
- W4311083612 hasConcept C41008148 @default.
- W4311083612 hasConcept C459310 @default.
- W4311083612 hasConcept C83283714 @default.
- W4311083612 hasConcept C96972482 @default.
- W4311083612 hasConceptScore W4311083612C111368507 @default.
- W4311083612 hasConceptScore W4311083612C127313418 @default.
- W4311083612 hasConceptScore W4311083612C165082838 @default.
- W4311083612 hasConceptScore W4311083612C173608175 @default.
- W4311083612 hasConceptScore W4311083612C177264268 @default.
- W4311083612 hasConceptScore W4311083612C199360897 @default.
- W4311083612 hasConceptScore W4311083612C2776760102 @default.
- W4311083612 hasConceptScore W4311083612C2779851693 @default.
- W4311083612 hasConceptScore W4311083612C41008148 @default.
- W4311083612 hasConceptScore W4311083612C459310 @default.
- W4311083612 hasConceptScore W4311083612C83283714 @default.
- W4311083612 hasConceptScore W4311083612C96972482 @default.
- W4311083612 hasLocation W43110836121 @default.
- W4311083612 hasOpenAccess W4311083612 @default.
- W4311083612 hasPrimaryLocation W43110836121 @default.
- W4311083612 hasRelatedWork W1445273741 @default.
- W4311083612 hasRelatedWork W1466612602 @default.
- W4311083612 hasRelatedWork W1985658314 @default.
- W4311083612 hasRelatedWork W2077672232 @default.
- W4311083612 hasRelatedWork W2205865923 @default.
- W4311083612 hasRelatedWork W2891492949 @default.
- W4311083612 hasRelatedWork W2892892150 @default.
- W4311083612 hasRelatedWork W2949197156 @default.
- W4311083612 hasRelatedWork W4231763094 @default.
- W4311083612 hasRelatedWork W4301021456 @default.
- W4311083612 isParatext "false" @default.
- W4311083612 isRetracted "false" @default.
- W4311083612 workType "peer-review" @default.