Matches in SemOpenAlex for { <https://semopenalex.org/work/W2891885066> ?p ?o ?g. }
Showing items 1 to 73 of 73 with 100 items per page.
- W2891885066 abstract "Memory systems are significant contributors to the overall power requirements, energy consumption, and the operational cost of large high-performance computing systems (HPC). Limitations of main memory systems in terms of latency, bandwidth and capacity, can significantly affect the performance of HPC applications, and can have strong negative impact on system scalability. In addition, errors in the main memory system can have a strong impact on the reliability, accessibility and serviceability of large-scale clusters. This thesis studies capacity and reliability issues in modern memory systems for high-performance computing. The choice of main memory capacity is an important aspect of high-performance computing memory system design. This choice becomes increasingly important now that 3D-stacked memories are entering the market. Compared with conventional DIMMs, 3D memory chiplets provide better performance and energy efficiency but lower memory capacities. Therefore the adoption of 3D-stacked memories in the HPC domain depends on whether we can find use cases that require much less memory than is available now. We analyze memory capacity requirements of important HPC benchmarks and applications. The study identifies the HPC applications and use cases with memory footprints that could be provided by 3D-stacked memory chiplets, making a first step towards the adoption of this novel technology in the HPC domain. For HPC domains where large memory capacities are required, we propose scaling-in of HPC applications to reduce energy consumption and the running time of a batch of jobs. We also propose upgrading the per-node memory capacity, which enables greater degree of scaling-in and additional energy savings. Memory system is one of the main causes of hardware failures. In each generation, the DRAM chip density and the amount of the memory in systems increase, while the DRAM technology process is constantly shrinking. Therefore, we could expect that the DRAM failures could have a serious impact on the future-systems reliability. This thesis studies DRAM errors observed on a production HPC system during a period of two years. We clearly distinguish between two different approaches for the DRAM error analysis: categorical analysis and the analysis of error rates. The first approach compares the errors at the DIMM level and partitions the DIMMs into various categories, e.g. based on whether they did or did not experience an error. The second approach is to analyze the error rates, i.e., to present the total number of errors relative to other statistics, typically the number of MB-hours or the duration of the observation period. We show that although DRAM error analysis may be performed with both approaches, they are not interchangeable and can lead to completely different conclusions. We further demonstrate the importance of providing statistical significance and presenting results that have practical value and real-life use. We show that various widely-accepted approaches for DRAM error analysis may provide data that appear to support an interesting conclusion, but are not statistically significant, meaning that they could merely be the result of chance. We hope the study of methods for DRAM error analysis presented in this thesis will become a standard for any future analysis of DRAM errors in the field." @default.
- W2891885066 created "2018-09-27" @default.
- W2891885066 creator A5064988156 @default.
- W2891885066 date "2018-01-01" @default.
- W2891885066 modified "2023-09-27" @default.
- W2891885066 title "Memory systems for high-performance computing: the capacity and reliability implications" @default.
- W2891885066 hasPublicationYear "2018" @default.
- W2891885066 type Work @default.
- W2891885066 sameAs 2891885066 @default.
- W2891885066 citedByCount "0" @default.
- W2891885066 crossrefType "journal-article" @default.
- W2891885066 hasAuthorship W2891885066A5064988156 @default.
- W2891885066 hasConcept C111919701 @default.
- W2891885066 hasConcept C119599485 @default.
- W2891885066 hasConcept C120314980 @default.
- W2891885066 hasConcept C127413603 @default.
- W2891885066 hasConcept C149635348 @default.
- W2891885066 hasConcept C152890283 @default.
- W2891885066 hasConcept C171675096 @default.
- W2891885066 hasConcept C173608175 @default.
- W2891885066 hasConcept C176649486 @default.
- W2891885066 hasConcept C188045654 @default.
- W2891885066 hasConcept C2780165032 @default.
- W2891885066 hasConcept C41008148 @default.
- W2891885066 hasConcept C48044578 @default.
- W2891885066 hasConcept C57863822 @default.
- W2891885066 hasConcept C63511323 @default.
- W2891885066 hasConcept C83283714 @default.
- W2891885066 hasConcept C98986596 @default.
- W2891885066 hasConceptScore W2891885066C111919701 @default.
- W2891885066 hasConceptScore W2891885066C119599485 @default.
- W2891885066 hasConceptScore W2891885066C120314980 @default.
- W2891885066 hasConceptScore W2891885066C127413603 @default.
- W2891885066 hasConceptScore W2891885066C149635348 @default.
- W2891885066 hasConceptScore W2891885066C152890283 @default.
- W2891885066 hasConceptScore W2891885066C171675096 @default.
- W2891885066 hasConceptScore W2891885066C173608175 @default.
- W2891885066 hasConceptScore W2891885066C176649486 @default.
- W2891885066 hasConceptScore W2891885066C188045654 @default.
- W2891885066 hasConceptScore W2891885066C2780165032 @default.
- W2891885066 hasConceptScore W2891885066C41008148 @default.
- W2891885066 hasConceptScore W2891885066C48044578 @default.
- W2891885066 hasConceptScore W2891885066C57863822 @default.
- W2891885066 hasConceptScore W2891885066C63511323 @default.
- W2891885066 hasConceptScore W2891885066C83283714 @default.
- W2891885066 hasConceptScore W2891885066C98986596 @default.
- W2891885066 hasLocation W28918850661 @default.
- W2891885066 hasOpenAccess W2891885066 @default.
- W2891885066 hasPrimaryLocation W28918850661 @default.
- W2891885066 hasRelatedWork W118577270 @default.
- W2891885066 hasRelatedWork W1985409225 @default.
- W2891885066 hasRelatedWork W1991238016 @default.
- W2891885066 hasRelatedWork W2009063162 @default.
- W2891885066 hasRelatedWork W2023014860 @default.
- W2891885066 hasRelatedWork W2056222607 @default.
- W2891885066 hasRelatedWork W2126658609 @default.
- W2891885066 hasRelatedWork W2262516749 @default.
- W2891885066 hasRelatedWork W2344773683 @default.
- W2891885066 hasRelatedWork W2532349170 @default.
- W2891885066 hasRelatedWork W2567223979 @default.
- W2891885066 hasRelatedWork W2571413934 @default.
- W2891885066 hasRelatedWork W2732938856 @default.
- W2891885066 hasRelatedWork W2892052002 @default.
- W2891885066 hasRelatedWork W2896135073 @default.
- W2891885066 hasRelatedWork W3009207130 @default.
- W2891885066 hasRelatedWork W3164329980 @default.
- W2891885066 hasRelatedWork W3193248473 @default.
- W2891885066 hasRelatedWork W3197810343 @default.
- W2891885066 hasRelatedWork W2313648919 @default.
- W2891885066 isParatext "false" @default.
- W2891885066 isRetracted "false" @default.
- W2891885066 magId "2891885066" @default.
- W2891885066 workType "article" @default.