I/O Process Analysis & Optimization is the surgical removal of the most common bottleneck in modern science: the storage waiting game.

In HPC, compute power has outpaced storage speed by orders of magnitude. A simulation might calculate the weather in 10 minutes but take 30 minutes to save the result to disk. This is a waste of expensive compute cycles.
Optimization involves moving away from "Naive I/O" (standard C/Fortran write statements) to "Structured I/O" (Middleware) that understands the physics of the parallel file system.
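To make "Naive I/O" concrete, the sketch below (an invented example, not from any particular code) shows the pathology a profiler typically flags: one tiny raw write() per value instead of a single large request. Simple buffering is not the whole answer, but the contrast shows the access pattern that the structured approaches in the following sections are designed to eliminate.

```c
/* Illustration of "Naive I/O": one small unbuffered write() per value, which
 * an I/O profiler will report as a flood of sub-1 KB requests hitting the
 * file system. The second half stages the data in memory and issues a single
 * large write instead. File names and sizes are invented for the example. */
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

#define N 1000000

int main(void)
{
    double *field = malloc(N * sizeof(double));
    for (int i = 0; i < N; i++) field[i] = (double)i;

    /* Naive: N separate 8-byte write() calls. */
    int fd = open("field_naive.bin", O_CREAT | O_WRONLY | O_TRUNC, 0644);
    for (int i = 0; i < N; i++)
        write(fd, &field[i], sizeof(double));
    close(fd);

    /* Better: one large write of the whole array. */
    fd = open("field_staged.bin", O_CREAT | O_WRONLY | O_TRUNC, 0644);
    write(fd, field, N * sizeof(double));
    close(fd);

    free(field);
    return 0;
}
```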
Here is the detailed breakdown of the analysis methodology, the pathological I/O patterns to avoid, and the optimization strategies.
1. The Methodology: Characterizing the I/O
Before optimizing, you must profile. We need to know how the application talks to the disk.
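As a first pass before a dedicated profiler is set up, you can characterize the write phase by hand. The sketch below is a minimal, illustrative MPI program (file names and sizes are invented) that records how long the slowest rank spends in its write and how many bytes it moves, which are exactly the numbers a tool like Darshan reports in far more detail.

```c
/* Minimal hand-rolled I/O characterization: time the write phase with
 * MPI_Wtime and count bytes per rank. This only illustrates the questions
 * to ask (how much data, how many calls, how long ranks wait on storage);
 * a real profiler gives the full picture. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* Hypothetical payload: 1 MiB of doubles per rank. */
    const size_t n = (1 << 20) / sizeof(double);
    double *buf = malloc(n * sizeof(double));
    for (size_t i = 0; i < n; i++) buf[i] = (double)rank;

    char fname[64];
    snprintf(fname, sizeof fname, "out.%04d.bin", rank);

    MPI_Barrier(MPI_COMM_WORLD);      /* line ranks up so timings are comparable */
    double t0 = MPI_Wtime();

    FILE *fp = fopen(fname, "wb");    /* the write phase being measured */
    size_t written = fwrite(buf, sizeof(double), n, fp);
    fclose(fp);

    double t1 = MPI_Wtime();
    double local = t1 - t0, slowest;
    MPI_Reduce(&local, &slowest, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("write phase: %zu bytes/rank, %d ranks, slowest rank %.3f s\n",
               written * sizeof(double), nprocs, slowest);

    free(buf);
    MPI_Finalize();
    return 0;
}
```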
2. The Patterns: The Good, The Bad, and The Ugly
HPC I/O patterns generally fall into three categories. Optimization usually means moving from the "Ugly" to the "Good." A sketch contrasting the shared-file patterns follows the list below.
A. The Ugly: N-to-N (File Per Process)
B. The Bad: Naive Shared File (N-to-1)
C. The Good: Collective I/O & Aggregation
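Here is a minimal MPI-IO sketch of the two shared-file patterns (the file name and sizes are placeholders). The "Bad" independent write and the "Good" collective write differ only in the final call, but the collective form lets the library aggregate many small per-rank pieces into a few large, aligned requests.

```c
/* Each rank owns a contiguous slice of one shared binary file. The only
 * difference between "Bad" and "Good" is the write call: the collective
 * variant lets MPI aggregate the small per-rank pieces into large,
 * well-aligned writes on the parallel file system. */
#include <mpi.h>
#include <stdlib.h>

#define N_LOCAL 1024   /* doubles owned by each rank (illustrative size) */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double *buf = malloc(N_LOCAL * sizeof(double));
    for (int i = 0; i < N_LOCAL; i++) buf[i] = (double)rank;

    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "shared.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    MPI_Offset offset = (MPI_Offset)rank * N_LOCAL * sizeof(double);

    /* B. "Bad": independent write. Every rank issues its own request and the
     * file system sees a storm of uncoordinated, possibly unaligned writes. */
    /* MPI_File_write_at(fh, offset, buf, N_LOCAL, MPI_DOUBLE, MPI_STATUS_IGNORE); */

    /* C. "Good": collective write. Ranks cooperate; aggregator processes
     * collect the data and issue a few large writes on everyone's behalf. */
    MPI_File_write_at_all(fh, offset, buf, N_LOCAL, MPI_DOUBLE, MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    free(buf);
    MPI_Finalize();
    return 0;
}
```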
3. The Solution: Middleware (Don't Reinvent the Wheel)
Scientific codes should rarely use raw write() or fprintf() statements. They should use High-Level I/O Libraries.
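As an illustration of what middleware buys you, the sketch below writes one self-describing HDF5 file from all ranks. It assumes an HDF5 build with parallel (MPI-IO) support; the file name, dataset name, and sizes are invented for the example.

```c
/* "Structured I/O" through middleware: each rank writes its slice of a 1-D
 * dataset into a single self-describing HDF5 file, letting the library and
 * MPI-IO handle layout and aggregation. */
#include <hdf5.h>
#include <mpi.h>

#define N_LOCAL 1024   /* doubles per rank (illustrative) */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    double data[N_LOCAL];
    for (int i = 0; i < N_LOCAL; i++) data[i] = (double)rank;

    /* Open one shared file through the MPI-IO driver. */
    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, MPI_INFO_NULL);
    hid_t file = H5Fcreate("fields.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);

    /* Global dataset of nprocs * N_LOCAL doubles; each rank owns one block. */
    hsize_t gdims[1] = { (hsize_t)nprocs * N_LOCAL };
    hsize_t ldims[1] = { N_LOCAL };
    hsize_t start[1] = { (hsize_t)rank * N_LOCAL };

    hid_t filespace = H5Screate_simple(1, gdims, NULL);
    hid_t dset = H5Dcreate2(file, "temperature", H5T_NATIVE_DOUBLE, filespace,
                            H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

    hid_t memspace = H5Screate_simple(1, ldims, NULL);
    H5Sselect_hyperslab(filespace, H5S_SELECT_SET, start, NULL, ldims, NULL);

    /* Ask for collective transfers, i.e. the "Good" pattern from above. */
    hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);
    H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_COLLECTIVE);

    H5Dwrite(dset, H5T_NATIVE_DOUBLE, memspace, filespace, dxpl, data);

    H5Pclose(dxpl); H5Sclose(memspace); H5Sclose(filespace);
    H5Dclose(dset); H5Fclose(file); H5Pclose(fapl);
    MPI_Finalize();
    return 0;
}
```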
4. Key Applications & Tools
| Category | Tool | Usage |
| --- | --- | --- |
| Profiling | Darshan | Lightweight I/O characterization tool. Shows exactly why I/O is slow (e.g., "90% of your writes were < 1KB"). |
| Profiling | Recorder | Captures HDF5 and MPI-IO level calls to visualize the I/O hierarchy. |
| Libraries | HDF5 / NetCDF | "Self-describing" data formats. They optimize the write pattern and make the data portable across architectures. |
| Benchmark | IOR | The synthetic benchmark used to determine the theoretical maximum speed of the storage, to see how far off your application is. |
| Optimization | ROMIO | The implementation of MPI-IO inside MPICH/OpenMPI. Allows tuning of aggregator nodes via "hints" in the code. |
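To show what ROMIO "hints" look like in practice, here is a minimal sketch. The hint keys used (romio_cb_write, cb_nodes, cb_buffer_size) are commonly supported by ROMIO, but which hints are honored, and the best values, depend on the MPI library, the file system, and the job size, so the numbers below are placeholders.

```c
/* Tuning collective buffering through MPI-IO "hints", which ROMIO reads
 * when the file is opened. Treat the values as placeholders to be tuned
 * for a specific machine and job size. */
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    MPI_Info info;
    MPI_Info_create(&info);
    MPI_Info_set(info, "romio_cb_write", "enable");    /* force collective buffering on writes */
    MPI_Info_set(info, "cb_nodes", "8");                /* number of aggregator processes */
    MPI_Info_set(info, "cb_buffer_size", "16777216");   /* 16 MiB aggregation buffer */

    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "tuned.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, info, &fh);

    /* ... collective writes (e.g. MPI_File_write_at_all) would go here ... */

    MPI_File_close(&fh);
    MPI_Info_free(&info);
    MPI_Finalize();
    return 0;
}
```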