ML and AI workshops on HPC are unique because they must bridge two different cultures: the "Data Science" culture (interactive, Jupyter-based, pip-install everything) and the "HPC" culture (batch-based, Slurm-scheduled, strict modules).

A successful workshop doesn't just teach Machine Learning; it teaches how to do ML without breaking the supercomputer.

Here is a detailed breakdown of the infrastructure, the three-day curriculum, and the hands-on labs.

1. The Workshop Infrastructure

You cannot run a workshop on the login node. You need a dedicated environment: typically a Slurm reservation of GPU nodes held for the duration of the event, fronted by a web portal (Open OnDemand) so participants can join from a browser instead of wrestling with SSH keys on day one.
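A useful first artifact to hand out is a sanity check that participants run before the labs start, to prove they are on the dedicated environment and not the shared login node. This is a minimal sketch, assuming PyTorch is already available in the workshop environment; the specific checks are illustrative, not prescriptive.

```python
# Hypothetical pre-lab sanity check (names and checks are illustrative).
# Confirms the participant is inside a Slurm job on a GPU node, not on the login node.
import os
import socket

import torch

host = socket.gethostname()
job_id = os.environ.get("SLURM_JOB_ID")

if job_id is None:
    raise SystemExit(f"No SLURM_JOB_ID on {host}: you are probably on the login node. "
                     "Start an interactive job or an Open OnDemand session first.")

if not torch.cuda.is_available():
    raise SystemExit("CUDA not available: check the loaded modules or container image.")

print(f"OK: job {job_id} on {host} with {torch.cuda.device_count()} visible GPU(s)")
```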

2. The Curriculum: From Interactive to Batch

Day 1: The Environment (Stop using pip install)

Day 2: Scaling Up (Distributed Data Parallel - DDP)
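For the Day 2 lab, PyTorch Lightning (see the tools table below) keeps the scaling story teachable: the same script runs on one GPU or many, and only the Trainer arguments change. Here is a minimal sketch, assuming the lightning and torch packages are installed in the workshop container; the model and data are throwaway placeholders, not a real lab solution.

```python
# Minimal Day 2 sketch: Distributed Data Parallel (DDP) via PyTorch Lightning.
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

import lightning as L


class TinyClassifier(L.LightningModule):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 2))

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = nn.functional.cross_entropy(self.net(x), y)
        self.log("train_loss", loss)
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)


if __name__ == "__main__":
    # Synthetic data so the script runs anywhere; swap in a real DataLoader for the lab.
    data = TensorDataset(torch.randn(1024, 32), torch.randint(0, 2, (1024,)))
    loader = DataLoader(data, batch_size=64, num_workers=4)

    # DDP launches one process per GPU; devices and num_nodes should match the Slurm request.
    trainer = L.Trainer(accelerator="gpu", devices=2, num_nodes=1,
                        strategy="ddp", max_epochs=3)
    trainer.fit(TinyClassifier(), loader)
```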

Day 3: The Frontier (LLMs & Model Parallelism)
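For Day 3, one illustrative route to models that exceed a single GPU's memory is Lightning's FSDP strategy, which shards parameters, gradients, and optimizer state across GPUs; tensor and pipeline parallelism are deeper topics the day can point to. A minimal sketch of the Trainer change, assuming lightning 2.x and reusing the model from the Day 2 sketch:

```python
# Hypothetical Day 3 variant: shard the model with FSDP instead of replicating it (DDP).
# Reuse TinyClassifier (or a real LLM fine-tuning module) from the Day 2 sketch.
import lightning as L
from lightning.pytorch.strategies import FSDPStrategy

trainer = L.Trainer(
    accelerator="gpu",
    devices=4,
    num_nodes=2,                  # 8 GPUs total; must agree with the Slurm allocation
    strategy=FSDPStrategy(),      # shards parameters, gradients, and optimizer state
    precision="bf16-mixed",       # a common choice for LLM work on recent GPUs
    max_epochs=1,
)
```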

3. Critical "Soft Skills" for AI Users

4. Key Applications & Tools

| Category  | Tool              | Usage |
| --------- | ----------------- | ----- |
| Portal    | Open OnDemand     | The standard interface for workshops; allows "Zero-Install" participation from a browser (Chrome/Firefox). |
| Container | Apptainer         | The container runtime for running Docker/OCI images safely, without root privileges, on shared HPC systems. |
| Framework | PyTorch Lightning | The best teaching tool for scaling: it abstracts away the distributed launch and process-group boilerplate so students focus on the ML (see the Day 2 sketch above). |
| Dataset   | WebDataset        | A library for high-performance, shard-based streaming I/O; essential for teaching users how to feed GPUs fast enough (see the sketch below). |
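The WebDataset row deserves a concrete illustration, because shard-based streaming is usually the least familiar piece for newcomers. This is a minimal sketch, assuming the webdataset package is installed; the scratch path and shard naming are hypothetical.

```python
# Hypothetical input pipeline: stream training samples from tar shards on the parallel
# filesystem instead of reading millions of small files. The shard path is made up.
import webdataset as wds

shards = "/scratch/workshop/imagenet-train-{000000..000146}.tar"

dataset = (
    wds.WebDataset(shards)
    .shuffle(1000)                 # shuffle within an in-memory buffer of samples
    .decode("torchrgb")            # decode images straight to CHW float tensors
    .to_tuple("jpg", "cls")        # (image, label) pairs from the .jpg/.cls entries
    .batched(64)                   # batch inside the pipeline, not in the loader
)

# WebLoader is a thin DataLoader wrapper; batch_size=None because batching is done above.
loader = wds.WebLoader(dataset, batch_size=None, num_workers=8)

for images, labels in loader:
    pass  # hand the batches to the Lightning Trainer from the Day 2 sketch
```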