Matches in SemOpenAlex for { <https://semopenalex.org/work/W3206590765> ?p ?o ?g. }
- W3206590765 abstract "Fault tolerance poses a major challenge for future large-scale systems. Current research on fault tolerance has been principally focused on mitigating the impact of uncorrectable errors: errors that corrupt the state of the machine and require a restart from a known good state. However, correctable errors occur much more frequently than uncorrectable errors and may be even more common on future systems. Although an application can safely continue to execute when correctable errors occur, recovery from a correctable error requires the error to be corrected and, in most cases, information about its occurrence to be logged. The potential performance impact of these recovery activities has not been extensively studied in HPC. In this paper, we use simulation to examine the relationship between recovery from correctable errors and application performance for several important extreme-scale workloads. Our paper contains what is, to the best of our knowledge, the first detailed analysis of the impact of correctable errors on application performance. Our study shows that correctable errors can have a significant impact on application performance for future systems. We also find that although attention to correctable errors has focused on reducing failure rates, reducing the time required to log individual errors may have a greater impact on overheads at scale. Finally, this study outlines the error frequency and duration targets needed to keep correctable-error overheads similar to those of today’s systems. This paper provides critical analysis and insight into the overheads of correctable errors and provides practical advice to systems administrators and hardware designers in an effort to fine-tune performance to application and system characteristics." @default.
- W3206590765 created "2021-10-25" @default.
- W3206590765 creator A5018865548 @default.
- W3206590765 creator A5020783484 @default.
- W3206590765 creator A5036921385 @default.
- W3206590765 creator A5056569157 @default.
- W3206590765 creator A5071239015 @default.
- W3206590765 date "2021-09-01" @default.
- W3206590765 modified "2023-10-16" @default.
- W3206590765 title "Understanding the Effects of DRAM Correctable Error Logging at Scale" @default.
- W3206590765 cites W153293336 @default.
- W3206590765 cites W1558516248 @default.
- W3206590765 cites W1559781097 @default.
- W3206590765 cites W2009063162 @default.
- W3206590765 cites W2019465613 @default.
- W3206590765 cites W2021234574 @default.
- W3206590765 cites W2025024269 @default.
- W3206590765 cites W2029023406 @default.
- W3206590765 cites W2046124011 @default.
- W3206590765 cites W2062011830 @default.
- W3206590765 cites W2067617155 @default.
- W3206590765 cites W2072072075 @default.
- W3206590765 cites W2094127360 @default.
- W3206590765 cites W2101221989 @default.
- W3206590765 cites W2102061396 @default.
- W3206590765 cites W2114462749 @default.
- W3206590765 cites W2115331279 @default.
- W3206590765 cites W2156182786 @default.
- W3206590765 cites W2510192911 @default.
- W3206590765 cites W2524688414 @default.
- W3206590765 cites W2621404490 @default.
- W3206590765 cites W2751243270 @default.
- W3206590765 cites W2767346922 @default.
- W3206590765 cites W2782944865 @default.
- W3206590765 cites W2803636851 @default.
- W3206590765 cites W2903241786 @default.
- W3206590765 cites W3140689773 @default.
- W3206590765 cites W4248895726 @default.
- W3206590765 doi "https://doi.org/10.1109/cluster48925.2021.00060" @default.
- W3206590765 hasPublicationYear "2021" @default.
- W3206590765 type Work @default.
- W3206590765 sameAs 3206590765 @default.
- W3206590765 citedByCount "1" @default.
- W3206590765 countsByYear W32065907652022 @default.
- W3206590765 crossrefType "proceedings-article" @default.
- W3206590765 hasAuthorship W3206590765A5018865548 @default.
- W3206590765 hasAuthorship W3206590765A5020783484 @default.
- W3206590765 hasAuthorship W3206590765A5036921385 @default.
- W3206590765 hasAuthorship W3206590765A5056569157 @default.
- W3206590765 hasAuthorship W3206590765A5071239015 @default.
- W3206590765 hasBestOaLocation W32065907652 @default.
- W3206590765 hasConcept C111919701 @default.
- W3206590765 hasConcept C11413529 @default.
- W3206590765 hasConcept C120314980 @default.
- W3206590765 hasConcept C120665830 @default.
- W3206590765 hasConcept C121332964 @default.
- W3206590765 hasConcept C127413603 @default.
- W3206590765 hasConcept C192209626 @default.
- W3206590765 hasConcept C200601418 @default.
- W3206590765 hasConcept C2775928411 @default.
- W3206590765 hasConcept C2777904410 @default.
- W3206590765 hasConcept C2778755073 @default.
- W3206590765 hasConcept C41008148 @default.
- W3206590765 hasConcept C48103436 @default.
- W3206590765 hasConcept C62520636 @default.
- W3206590765 hasConcept C63540848 @default.
- W3206590765 hasConcept C7366592 @default.
- W3206590765 hasConcept C9390403 @default.
- W3206590765 hasConceptScore W3206590765C111919701 @default.
- W3206590765 hasConceptScore W3206590765C11413529 @default.
- W3206590765 hasConceptScore W3206590765C120314980 @default.
- W3206590765 hasConceptScore W3206590765C120665830 @default.
- W3206590765 hasConceptScore W3206590765C121332964 @default.
- W3206590765 hasConceptScore W3206590765C127413603 @default.
- W3206590765 hasConceptScore W3206590765C192209626 @default.
- W3206590765 hasConceptScore W3206590765C200601418 @default.
- W3206590765 hasConceptScore W3206590765C2775928411 @default.
- W3206590765 hasConceptScore W3206590765C2777904410 @default.
- W3206590765 hasConceptScore W3206590765C2778755073 @default.
- W3206590765 hasConceptScore W3206590765C41008148 @default.
- W3206590765 hasConceptScore W3206590765C48103436 @default.
- W3206590765 hasConceptScore W3206590765C62520636 @default.
- W3206590765 hasConceptScore W3206590765C63540848 @default.
- W3206590765 hasConceptScore W3206590765C7366592 @default.
- W3206590765 hasConceptScore W3206590765C9390403 @default.
- W3206590765 hasLocation W32065907651 @default.
- W3206590765 hasLocation W32065907652 @default.
- W3206590765 hasOpenAccess W3206590765 @default.
- W3206590765 hasPrimaryLocation W32065907651 @default.
- W3206590765 hasRelatedWork W2027487876 @default.
- W3206590765 hasRelatedWork W2124870959 @default.
- W3206590765 hasRelatedWork W2147458933 @default.
- W3206590765 hasRelatedWork W2170821048 @default.
- W3206590765 hasRelatedWork W2270308778 @default.
- W3206590765 hasRelatedWork W2351378856 @default.
- W3206590765 hasRelatedWork W2570560494 @default.
- W3206590765 hasRelatedWork W2978510736 @default.
- W3206590765 hasRelatedWork W3102446781 @default.
- W3206590765 hasRelatedWork W2147034415 @default.
- W3206590765 isParatext "false" @default.