Energy Consumption Reduction (Green HPC) is the most critical challenge in modern supercomputing. The limiting factor for the next generation of supercomputers is not silicon technology; it is the electricity bill. An Exascale system can consume 20-30 megawatts, enough to power a small city. Reducing consumption is not just about "saving the planet"; it is about preventing the operational costs (OpEx) from exceeding the cost of the hardware itself. We tackle this via "Watts for Science": ensuring that electricity creates calculations, not waste heat.
Below is a detailed breakdown of the reduction strategies: Cooling, DVFS, and Hardware.
1. The Metric: PUE and FLOPS/Watt
You cannot manage what you do not measure. Two numbers dominate the discussion: PUE (Power Usage Effectiveness), the ratio of total facility power to the power actually delivered to the IT equipment, and FLOPS/Watt, the useful computation extracted from each Watt (the metric ranked by the Green500 list).
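As a minimal illustration, both metrics reduce to simple ratios; the figures in the sketch below are hypothetical, not measurements from any real system.

```python
def pue(total_facility_kw: float, it_equipment_kw: float) -> float:
    """Power Usage Effectiveness: 1.0 is ideal (every Watt reaches the IT gear)."""
    return total_facility_kw / it_equipment_kw

def flops_per_watt(sustained_flops: float, power_watts: float) -> float:
    """Computational efficiency, the quantity the Green500 list ranks."""
    return sustained_flops / power_watts

# Hypothetical example: a 25 MW facility feeding a 20 MW IT load
# that sustains 1 ExaFLOPS (1e18 floating-point operations per second).
print(f"PUE      = {pue(25_000, 20_000):.2f}")                      # 1.25
print(f"GFLOPS/W = {flops_per_watt(1e18, 20_000_000) / 1e9:.1f}")   # 50.0
```

The strategies below attack both numbers: cooling pushes PUE toward 1.0, while DVFS and accelerators push FLOPS/Watt up.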
2. Facility Strategy: Cooling Optimization
Cooling typically accounts for 30-40% of the energy bill in air-cooled data centers. Moving to Direct Liquid Cooling (DLC) or warm-water cooling removes most of that overhead and brings the facility's PUE much closer to 1.0; the back-of-the-envelope sketch below shows how much money is at stake.
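A rough cost estimate (all figures here are illustrative assumptions, not measured data) makes the case for cooling optimization concrete:

```python
IT_LOAD_MW = 20.0        # assumed IT load of an Exascale-class machine
HOURS_PER_YEAR = 8760
PRICE_PER_MWH = 100.0    # assumed electricity price per MWh

def annual_overhead_cost(pue: float) -> float:
    """Yearly cost of the non-IT power (cooling, power delivery) at a given PUE."""
    overhead_mw = IT_LOAD_MW * (pue - 1.0)
    return overhead_mw * HOURS_PER_YEAR * PRICE_PER_MWH

for pue in (1.5, 1.4, 1.2, 1.1):
    print(f"PUE {pue}: ~{annual_overhead_cost(pue) / 1e6:.1f} M per year in overhead")
```

Under these assumptions, dropping from an air-cooled PUE of roughly 1.4 to a liquid-cooled 1.1 saves several million per year in electricity alone.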
3. Software Strategy: Energy-Aware Scheduling (DVFS)
This is the "Low Hanging Fruit." Dynamic Voltage and Frequency Scaling (DVFS) lowers the CPU clock (and with it the voltage) during phases where the application is stalled on memory or I/O, cutting power draw with little or no impact on time-to-solution; a sketch of the control loop follows.
4. Hardware Strategy: Accelerators
For dense numerical workloads, GPUs and other accelerators deliver far more FLOPS per Watt than general-purpose CPUs, which is why the top entries of the Green500 list are accelerated systems.
5. Key Applications & Tools
| Category | Tool | Usage |
| --- | --- | --- |
| Control | Slurm Power Plugin | Allows the scheduler to cap the power usage of a job (e.g., "Run this job, but do not exceed 200 Watts"). |
| Optimization | EAR (Energy Aware Runtime) | An automated framework that tunes CPU frequency dynamically based on the application's real-time behavior. |
| Monitoring | IPMI / Redfish | Protocols used to read the power sensors inside the Power Supply Units (PSUs). |
| Hardware | CoolIT / Asetek | Leaders in Direct Liquid Cooling (DLC) hardware solutions. |
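To illustrate the Monitoring row: out-of-band power telemetry can be read over Redfish's standard Power resource. The sketch below is a minimal example; the BMC address, credentials, and chassis ID are placeholders, TLS verification is disabled only for brevity, and some BMCs expose the data through the newer PowerSubsystem resource instead.

```python
"""Read node power draw from a BMC via the DMTF Redfish Power resource."""
import requests

BMC = "https://bmc-node001.example"   # placeholder BMC address
AUTH = ("admin", "password")          # placeholder credentials
CHASSIS = "1"                         # chassis ID varies by vendor (e.g., "1", "Self")

def power_consumed_watts() -> float:
    """Return the instantaneous power draw reported by the chassis sensors."""
    url = f"{BMC}/redfish/v1/Chassis/{CHASSIS}/Power"
    resp = requests.get(url, auth=AUTH, verify=False, timeout=10)
    resp.raise_for_status()
    # PowerControl[0].PowerConsumedWatts is part of the standard Redfish schema,
    # although field availability differs between vendors and firmware versions.
    return resp.json()["PowerControl"][0]["PowerConsumedWatts"]

if __name__ == "__main__":
    print(f"Node power draw: {power_consumed_watts():.0f} W")
```

On older BMCs the same reading is typically available over IPMI (for example, `ipmitool dcmi power reading`), which is what many monitoring stacks poll cluster-wide.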