Performance Bottleneck Identification is the diagnostic medicine of Supercomputing.

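In a cluster of 1,000 servers, if one component (e.g., the storage array) slows down by 5%, the entire 1,000-node simulation might slow down by 50% due to the "wait chain" effect: in a tightly synchronized job, every node waits at each synchronization point for the slowest participant, so small delays can compound along the chain of dependencies. Identifying bottlenecks is not about guessing; it is a systematic process of isolation using the "Four Horsemen" model: CPU, Memory, I/O, and Network.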
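The simplest link in a wait chain can be sketched in a few lines of Python. This is a minimal model, assuming a bulk-synchronous job in which every rank must reach a barrier before the next step starts; the function names and the timing figures are made up for illustration.

```python
def step_time(per_rank_seconds):
    """A barrier-synchronized step ends when the slowest rank arrives."""
    return max(per_rank_seconds)


def idle_node_seconds(per_rank_seconds):
    """Total time the faster ranks spend waiting at the barrier."""
    slowest = max(per_rank_seconds)
    return sum(slowest - t for t in per_rank_seconds)


RANKS = 1000
healthy = [1.0] * RANKS                     # hypothetical: 1.0 s per step on every rank
degraded = [1.0] * (RANKS - 1) + [1.05]     # one rank is 5% slower

print("healthy step time: ", step_time(healthy))
print("degraded step time:", step_time(degraded))
print("node-seconds wasted per step:", round(idle_node_seconds(degraded), 2))
```

In this model a single slow rank gates all 1,000 ranks, and the other 999 sit idle while they wait. It does not model chained dependencies or contention on a shared resource, which is where larger amplification would have to come from.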

Here is the detailed breakdown of the identification strategy, the "Roofline" analysis, and the resolution techniques.

1. The "Four Horsemen" of Bottlenecks

Every performance issue falls into one of these four categories. You must identify which one is the "Limiting Factor."

  1. CPU Bound (Compute Limited)
  2. Memory Bound (Bandwidth Limited)
  3. I/O Bound (Storage Limited)
  4. Network Bound (Latency Limited)
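Picking the limiting factor can be sketched as choosing the category that consumes the largest share of wall-clock time. The `Sample` type and its field names below are hypothetical, and the sketch assumes a profiler has already split the time into these four shares.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """Fractions of wall-clock time in [0, 1]; all field names are hypothetical."""
    cpu_busy: float       # executing instructions
    memory_stall: float   # stalled waiting on memory
    io_wait: float        # blocked on storage
    network_wait: float   # blocked in communication


def limiting_factor(sample: Sample) -> str:
    """Return the category that accounts for the largest share of time."""
    shares = {
        "CPU": sample.cpu_busy,
        "Memory": sample.memory_stall,
        "I/O": sample.io_wait,
        "Network": sample.network_wait,
    }
    return max(shares, key=shares.get)


# Made-up measurement: most of the time is spent blocked on storage.
print(limiting_factor(Sample(cpu_busy=0.2, memory_stall=0.1, io_wait=0.6, network_wait=0.1)))
```

Real jobs rarely split this cleanly, so the result is a starting point for deeper profiling rather than a verdict.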

2. The Diagnostic Strategy: "The Drill Down"

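We start from the satellite view (the whole cluster and the whole job) and zoom in to the microscope view (a single node, then a single process, then a single function).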
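The first two zoom levels can be sketched as follows: flag nodes that are much slower than the cluster median, then look at where a flagged node spends its time. The node names, phase names, timings, and the 1.25 tolerance are all illustrative assumptions.

```python
from statistics import median


def find_stragglers(step_seconds_by_node, tolerance=1.25):
    """Satellite view: nodes whose step time exceeds tolerance x the cluster median."""
    typical = median(step_seconds_by_node.values())
    return sorted(
        node for node, seconds in step_seconds_by_node.items()
        if seconds > tolerance * typical
    )


def slowest_phase(phase_seconds):
    """Closer view: the phase that dominates one node's step time."""
    return max(phase_seconds, key=phase_seconds.get)


# Hypothetical per-node step times in seconds.
cluster = {"node001": 10.1, "node002": 9.9, "node003": 10.0, "node004": 15.2}
suspects = find_stragglers(cluster)

# Hypothetical time breakdown for the suspect node.
node004_phases = {"compute": 6.0, "mpi_wait": 1.2, "file_io": 8.0}

print(suspects, slowest_phase(node004_phases))
```

The median is used instead of the mean so that one very slow node does not shift the baseline it is compared against.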

3. The "Roofline" Model

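This is a standard engineering chart used to identify bottlenecks. It plots attainable performance (FLOP/s) against arithmetic intensity (FLOPs per byte of memory traffic). The sloped part of the "roof" is set by memory bandwidth and the flat part by peak compute, so a kernel's position under the roof shows whether it is memory bound or compute bound.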
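The roofline bound itself is a one-line formula: attainable performance is the smaller of peak compute and memory bandwidth multiplied by arithmetic intensity. The sketch below applies it to a triad-style kernel; the machine figures (2,000 GFLOP/s peak, 200 GB/s bandwidth) are hypothetical, and the byte count ignores cache effects such as write-allocate traffic.

```python
def attainable_gflops(intensity, peak_gflops, peak_bandwidth_gbs):
    """Roofline bound: min(peak compute, bandwidth x arithmetic intensity)."""
    return min(peak_gflops, peak_bandwidth_gbs * intensity)


def classify(intensity, peak_gflops, peak_bandwidth_gbs):
    """Kernels left of the ridge point are memory-bound, the rest compute-bound."""
    ridge = peak_gflops / peak_bandwidth_gbs   # FLOPs per byte where the roof flattens
    return "memory-bound" if intensity < ridge else "compute-bound"


PEAK_GFLOPS = 2000.0   # hypothetical machine peak
PEAK_BW_GBS = 200.0    # hypothetical memory bandwidth

# Triad-style kernel a[i] = b[i] + s * c[i]:
# 2 FLOPs per element, three 8-byte doubles moved per element.
triad_intensity = 2 / 24

print(classify(triad_intensity, PEAK_GFLOPS, PEAK_BW_GBS))
print(round(attainable_gflops(triad_intensity, PEAK_GFLOPS, PEAK_BW_GBS), 2), "GFLOP/s")
```

On this hypothetical machine the ridge point is at 10 FLOPs per byte, so the triad kernel sits far to the left of it and is limited by memory bandwidth rather than by the CPU.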

4. Key Applications & Tools

| Category | Tool | Usage |
| --- | --- | --- |
| CPU/Memory Analysis | Intel VTune | The gold standard. Tells you exactly how many "Cache Misses" or "Branch Mispredictions" occurred. |
| Network Analysis | Vampir / TAU | Visualizes MPI traffic. Shows you a timeline of "Who is talking to whom" to find chatty nodes. |
| I/O Analysis | Darshan | A lightweight profiler that runs silently in the background. At the end of a job, it tells you: "You opened this file 1 million times." |
| System Check | BCC / eBPF | Linux kernel tracing tools to find deep OS latency (e.g., slow driver interrupts). |
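The kind of finding an I/O profiler reports ("you opened this file N times") can be illustrated with a toy counter. This is not Darshan's interface or implementation; it is a minimal sketch that temporarily wraps Python's built-in `open`, and the `OpenCounter` class and `demo` function are hypothetical names.

```python
import builtins
import os
import tempfile
from collections import Counter


class OpenCounter:
    """Context manager that counts calls to the built-in open() per path."""

    def __init__(self):
        self.counts = Counter()

    def __enter__(self):
        self._real_open = builtins.open

        def counting_open(file, *args, **kwargs):
            self.counts[str(file)] += 1
            return self._real_open(file, *args, **kwargs)

        builtins.open = counting_open
        return self

    def __exit__(self, *exc_info):
        builtins.open = self._real_open   # always restore the original
        return False


def demo(records=1000):
    """Reopen the same file for every record, then report the open count."""
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "result.dat")
        with OpenCounter() as profile:
            for i in range(records):
                with open(path, "a") as handle:   # anti-pattern: one open per record
                    handle.write(f"{i}\n")
        return profile.counts[path]


print("opens of result.dat:", demo())
```

A count that scales with the number of records usually means the file should be opened once and kept open, or the writes should be batched.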