Matches in SemOpenAlex for { <https://semopenalex.org/work/W2896096784> ?p ?o ?g. }
Showing items 1 to 64 of 64 with 100 items per page.
- W2896096784 abstract "Large-scale scientific computing facilities usually operate expensive HPC (High Performance Computing) systems, whose computational and storage resources are shared among authorized users. On such shared-resource systems, continuous and stable operation is fundamental for providing the hardware resources that different users need, including for large-scale numerical simulations, which are the main targets of such facilities. For instance, the K computer installed at the R-CCS (RIKEN Center for Computational Science) in Kobe, Japan, enables users to continuously run large jobs with tens of thousands of nodes (a maximum of 36,864 computational nodes) for up to 24 h, and a huge job using the entire K computer system (82,944 computational nodes) for up to 8 h. Critical hardware failures can directly impact the affected job, and may also indirectly impact the subsequent scheduled jobs. To monitor the health condition of the K computer and its supporting facility, a large number of sensors have been providing a vast amount of measured data. Since it is almost impossible to analyze all of the data in real time, this information has been stored as log data files for post-hoc analysis. In this work, we propose a visual analytics system which uses these big log data files to identify the possible causes of critical hardware failures. We focused on the transfer entropy technique for quantifying the “causality” between a possible cause and a critical hardware failure. As a case study, we focused on critical CPU failures that required subsequent replacement, and utilized the log files corresponding to the measured temperatures of the cooling system, such as air and water. We evaluated the usability of our proposed system by conducting practical evaluations with a group of experts who work directly on the K computer system operation. The positive and negative feedback obtained from this evaluation will be considered for future enhancements." @default.
- W2896096784 created "2018-10-26" @default.
- W2896096784 creator A5004756417 @default.
- W2896096784 creator A5066167505 @default.
- W2896096784 creator A5078567249 @default.
- W2896096784 creator A5084533941 @default.
- W2896096784 date "2018-01-01" @default.
- W2896096784 modified "2023-10-06" @default.
- W2896096784 title "A Transfer Entropy Based Visual Analytics System for Identifying Causality of Critical Hardware Failures Case Study: CPU Failures in the K Computer" @default.
- W2896096784 cites W1578105119 @default.
- W2896096784 cites W1804694390 @default.
- W2896096784 cites W1998221613 @default.
- W2896096784 cites W2041782669 @default.
- W2896096784 cites W2142812297 @default.
- W2896096784 cites W2768664685 @default.
- W2896096784 doi "https://doi.org/10.1007/978-981-13-2853-4_44" @default.
- W2896096784 hasPublicationYear "2018" @default.
- W2896096784 type Work @default.
- W2896096784 sameAs 2896096784 @default.
- W2896096784 citedByCount "0" @default.
- W2896096784 crossrefType "book-chapter" @default.
- W2896096784 hasAuthorship W2896096784A5004756417 @default.
- W2896096784 hasAuthorship W2896096784A5066167505 @default.
- W2896096784 hasAuthorship W2896096784A5078567249 @default.
- W2896096784 hasAuthorship W2896096784A5084533941 @default.
- W2896096784 hasConcept C111919701 @default.
- W2896096784 hasConcept C120314980 @default.
- W2896096784 hasConcept C41008148 @default.
- W2896096784 hasConcept C77088390 @default.
- W2896096784 hasConcept C79158427 @default.
- W2896096784 hasConcept C83283714 @default.
- W2896096784 hasConceptScore W2896096784C111919701 @default.
- W2896096784 hasConceptScore W2896096784C120314980 @default.
- W2896096784 hasConceptScore W2896096784C41008148 @default.
- W2896096784 hasConceptScore W2896096784C77088390 @default.
- W2896096784 hasConceptScore W2896096784C79158427 @default.
- W2896096784 hasConceptScore W2896096784C83283714 @default.
- W2896096784 hasLocation W28960967841 @default.
- W2896096784 hasOpenAccess W2896096784 @default.
- W2896096784 hasPrimaryLocation W28960967841 @default.
- W2896096784 hasRelatedWork W16186195 @default.
- W2896096784 hasRelatedWork W1692937897 @default.
- W2896096784 hasRelatedWork W2106686319 @default.
- W2896096784 hasRelatedWork W2112895457 @default.
- W2896096784 hasRelatedWork W2260621657 @default.
- W2896096784 hasRelatedWork W2290691139 @default.
- W2896096784 hasRelatedWork W2337343734 @default.
- W2896096784 hasRelatedWork W2590539595 @default.
- W2896096784 hasRelatedWork W2619571263 @default.
- W2896096784 hasRelatedWork W2758081967 @default.
- W2896096784 hasRelatedWork W2768725756 @default.
- W2896096784 hasRelatedWork W2785346526 @default.
- W2896096784 hasRelatedWork W2888535617 @default.
- W2896096784 hasRelatedWork W2904654489 @default.
- W2896096784 hasRelatedWork W2952354644 @default.
- W2896096784 hasRelatedWork W2965985637 @default.
- W2896096784 hasRelatedWork W3098750140 @default.
- W2896096784 hasRelatedWork W3104373930 @default.
- W2896096784 hasRelatedWork W3137382871 @default.
- W2896096784 hasRelatedWork W3196172036 @default.
- W2896096784 isParatext "false" @default.
- W2896096784 isRetracted "false" @default.
- W2896096784 magId "2896096784" @default.
- W2896096784 workType "book-chapter" @default.