iDVL is funded by GRI to monitor HPCC via a visual interface
Dec 2, 2018
Monitoring Health Status of High Performance Computing Systems
Monitoring data centers is challenging due to their size, complexity, and dynamic nature. This project proposes a visual approach for situational awareness and health monitoring of high-performance computing systems. The visualization requirements span three dimensions: 1) the spatial layout of the high-performance computing system, 2) the temporal domain (historical vs. real-time tracking), and 3) system health services such as temperature, CPU load, memory usage, fan speed, and power consumption. We demonstrate the prototype on a medium-scale data center with 10 racks and 467 hosts.
The work was developed in collaboration with both industrial and academic domain experts:
- Dr. Yong Chen, Department of Computer Science, Texas Tech University.
- Dr. Alan Sill, Managing Director of HPCC; Co-Director, NSF CAC.