That Container Thing

High-Performance Computing w/ Docker & Kubernetes


Garrett Mills


Follow along:

Wait, who's the kid?

Hi, there.

  • Software Developer
  • Linux Proselytizer
  • KU CS Undergrad
  • CRMDA Lab Tech

So what's the deal?

Traditional HPC

  • Bare-Metal Nodes
  • Scheduler
  • Complex Environments
  • Kinda Expensive


Pro: it's the standard

Con: it's expensive


  • Containers
  • Scheduler
  • Portable Environments
  • Less Expensive*


Pro: cheaper & portable

Con: learning curve

Traditional HPC

Jobs run on bare-metal.

Environment is static.

No guarantee that each node is the same.

Also, it's kinda pricey...

24 Nodes


I'm going to mispronounce this.

So, what is it?

Production-grade container orchestration.

a standard unit of software that packages up code and all its dependencies so the application runs quickly and reliably from one computing environment to another

container (n.) -

Source: Docker Resources

Source: Docker Resources


Remember that environment file from earlier?

module purge
module load legacy
module load emacs/26.1
module use /panfs/pfs.local/work/crmda/tools/modules
module load Rstats/3.5.1
module load mplus
module load SAS
module load anaconda

This strategy requires these modules to have been added to all of the nodes by the cluster administrators.

FROM everpeace/kube-openmpi:2.1.2-16.04-0.7.0
RUN apt-get update

RUN apt-get install -y fish python3-pip apt-transport-https software-properties-common python-software-properties

RUN apt-key adv --keyserver --recv-keys E298A3A825C0D65DFD57CBB651716619E084DAB9

RUN add-apt-repository -y 'deb [arch=amd64,i386] xenial/'

RUN apt-get update
RUN apt-get install -y r-base
COPY ./Rinst.R /tmp
RUN R --vanilla -f /tmp/Rinst.R
RUN pip3 install numpy mpi4py
WORKDIR /clusterfs


Containers are "built" into images that can be used instantly.

This is awesome.

  • HPC Environment is abstracted away
  • Containers are easy to customize
  • Containers are predictable
  • Containers are reproducible

So back to Kubernetes.

"container orchestration" : Kubernetes


"scheduler" : Traditional HPC

Remember this? It still applies (mostly).

Node > Pod > Container

Pro: Containers are awesome.

Pro: Kubernetes can be cloud-hosted.

(Pricing can be assessed per-job.)

For example:


5 nodes x 4 cores/node

20 cores total

10 hour runtime


Con: It's not standard (yet).

Con: Activation energy.

Demos Available Online