Apptainer/Singularity

Apptainer (https://apptainer.org/), formerly Sylabs Singularity, is a container system designed for HPC systems. In many respects it is similar to Docker (https://www.docker.com/), but Docker is too insecure for use with parallel file systems. Containers let you fix a specific Linux distribution and set of application software in the container image while running on the HPC system. Containers are very useful for applications that were written on personal workstations (often Ubuntu Linux) and not designed for the HPC model of many coexisting software versions on top of a stable base system. They are also useful for GPU programs, which often depend on Python modules with very particular compatibility requirements.

Docker images can be converted to Apptainer images either explicitly or implicitly. If you invoke apptainer/singularity directly on a Docker image reference, it will be implicitly converted to SIF. On your local workstation you can run docker or podman as root and modify Docker images, which can then be transferred to the HPC system.
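
For example, invoking apptainer or singularity directly on a Docker Hub reference performs the conversion on the fly; the image name here is only an illustration:

# pull a public image from Docker Hub; it is converted to SIF automatically
apptainer pull docker://ubuntu:22.04     # produces ubuntu_22.04.sif
# or open a shell in it in one step
apptainer shell docker://ubuntu:22.04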

Apptainer/singularity can use two types of container images: “sandbox”, a directory usually holding tens of thousands of small files, and “SIF”, a single relatively large file. The major difference is that sandbox images can be modified, while SIF images are read-only disk images and cannot be. SIF images are also much easier to handle on a parallel file system that is optimized for large files. If you do not intend to modify the images, the simplest method is to pull Docker images directly with apptainer, in which case they are converted to a SIF image.
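
If you end up with one form and later want the other, singularity build can convert between them; a minimal sketch with a hypothetical image name:

# build a writable sandbox directory from an existing SIF image
singularity build --sandbox mycontainer/ mycontainer.sif
# build a read-only SIF image from a sandbox directory
singularity build mycontainer.sif mycontainer/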

Example 1

As an example, on your workstation as root, using (nearly identical) docker or podman commands:

podman pull docker.io/unfixable/deeplab2:latest
podman save --format=docker-archive -o deeplab2.tar docker.io/unfixable/deeplab2:latest

Then transfer the tar file to the HPC system and, in the directory containing deeplab2.tar, run:

singularity build --sandbox deeplab2 docker-archive://deeplab2.tar
#or
singularity build deeplab2.sif docker-archive://deeplab2.tar

In this case the Docker archive deeplab2.tar and the sandbox directory deeplab2 each use about 14 GB, while the compressed file deeplab2.sif is 6 GB. In the same directory, we run apptainer/singularity. Since this is a GPU container on a GPU node, we include --nv and a bind mount of our storage directory with --bind /scrfs/storage/$USER:/mnt (here $USER is build). You can also bind mount a scratch directory over some unused root-level directory such as /opt; you may need to start the container once to check that the target directory exists in the image. Here nvidia-smi shows that our GPU is found, df shows that our storage directory is mounted at /mnt, and mkdir shows that the SIF is a read-only file system.

$ singularity shell --nv --bind /scrfs/storage/build:/mnt deeplab2.sif
INFO:    underlay of /usr/bin/nvidia-smi required more than 50 (403) bind mounts
INFO:    underlay of /usr/share/zoneinfo/Etc/UTC required more than 50 (85) bind mounts
Apptainer> nvidia-smi
Tue Nov  8 13:43:08 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.65.01    Driver Version: 515.65.01    CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla V100-PCIE...  Off  | 00000000:3B:00.0 Off |                    0 |
| N/A   35C    P0    38W / 250W |      0MiB / 32768MiB |      2%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
Apptainer> df | grep scrfs
172.17.27.1@o2ib,172.17.27.21@o2ib:172.17.27.2@o2ib,172.17.27.22@o2ib:/scrfs 2455426280448 1292785568712 1137793418616  54% /mnt
Apptainer> mkdir /newfile
mkdir: cannot create directory ‘/newfile’: Read-only file system
Apptainer> exit
exit
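
As mentioned above, you can also bind your storage over an unused root-level directory such as /opt instead of /mnt; the path below is illustrative:

# same GPU shell, with storage bound at /opt instead of /mnt
$ singularity shell --nv --bind /scrfs/storage/$USER:/opt deeplab2.sif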

If you retry with the sandbox image, you will find that sandboxes aren't very useful with NVIDIA containers: the --nv device files and utilities are not bound into a writable container, so nvidia-smi and the GPU are unavailable. It's usually better to use the more convenient SIF format.

$ singularity shell --nv --bind /scrfs/storage/build:/mnt -w deeplab2/
WARNING: nv files may not be bound with --writable
WARNING: Skipping mount /etc/localtime [binds]: /etc/localtime doesn't exist in container
WARNING: Skipping mount /bin/nvidia-smi [files]: /usr/bin/nvidia-smi doesn't exist in container
WARNING: Skipping mount /bin/nvidia-debugdump [files]: /usr/bin/nvidia-debugdump doesn't exist in container
WARNING: Skipping mount /bin/nvidia-persistenced [files]: /usr/bin/nvidia-persistenced doesn't exist in container
WARNING: Skipping mount /bin/nvidia-cuda-mps-control [files]: /usr/bin/nvidia-cuda-mps-control doesn't exist in container
WARNING: Skipping mount /bin/nvidia-cuda-mps-server [files]: /usr/bin/nvidia-cuda-mps-server doesn't exist in container
Apptainer> nvidia-smi
bash: nvidia-smi: command not found
Apptainer> exit
exit

Sandboxes can be usefully modified for non-NVIDIA containers. In our opinion, for NVIDIA containers you are better off making modifications in Docker and then converting to Apptainer (see the sketch after the sandbox example below).

$ singularity shell --bind /scrfs/storage/build:/mnt -w deeplab2/
WARNING: Skipping mount /etc/localtime [binds]: /etc/localtime doesn't exist in container
Apptainer> touch /newdir
Apptainer> exit
exit
$ ls deeplab2/newdir
deeplab2/newdir
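
If you do need to customize an NVIDIA container, one workflow is to layer your changes on top of the image with Docker on your workstation and then convert as in Example 1. A hedged sketch; the base image tag and the added pip package are only placeholders:

# on your workstation as root
cat > Dockerfile <<'EOF'
FROM nvcr.io/nvidia/pytorch:23.05-py3
RUN pip install --no-cache-dir scikit-image
EOF
docker build -t mypytorch:custom .
docker save -o mypytorch.tar mypytorch:custom
# transfer mypytorch.tar to the cluster, then:
singularity build mypytorch.sif docker-archive://mypytorch.tar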

If you are happy with a read-only copy of the original Docker container, you can do the whole conversion in one step, provided singularity can find your image in a registry:

$ singularity pull docker://unfixable/deeplab2
INFO:    Converting OCI blobs to SIF format
INFO:    Starting build...
Getting image source signatures
Copying blob 864effccda8b skipped: already exists  
...a lot of output...
2022/11/08 16:16:47  info unpack layer: sha256:00d88365c70266e590b49ef5a03c6721030f90e1ba22e0cb332c7094840ed7ec
INFO:    Creating SIF file...
$ ls -alrt | tail -2
-rwxr-xr-x 1 build ahpcc 6680887296 Nov  8 16:18 deeplab2_latest.sif
drwxr-xr-x 4 build ahpcc       4096 Nov  8 16:18 .
$ singularity shell --nv deeplab2_latest.sif
INFO:    underlay of /usr/bin/nvidia-smi required more than 50 (403) bind mounts
INFO:    underlay of /usr/share/zoneinfo/Etc/UTC required more than 50 (85) bind mounts
Apptainer> nvidia-smi
Tue Nov  8 16:26:34 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.65.01    Driver Version: 515.65.01    CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
..etc..
Modules

We have both Sylabs Singularity (most recently singularity/3.9.3, as a module) and Apptainer (the default, installed as an RPM with no module) available. We don't perceive any practical difference between them. We suggest not loading the module and using the Apptainer RPM version (whose commands are either apptainer ... or singularity ...). If you do load a singularity module, it will come first in the path and its commands will be used instead.
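
To confirm which build you are actually running, something like this can help (module name taken from above; installed paths will vary):

# with no module loaded, the Apptainer RPM provides both command names
which apptainer singularity
apptainer --version
# after loading the module, its binary comes first in PATH instead
module load singularity/3.9.3
which singularity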

Example 2: NVidia nvcr.io

A login is required for NVIDIA containers such as https://catalog.ngc.nvidia.com/orgs/nvidia/containers/hpc-benchmarks. Get an NVIDIA account (also useful for free GTC conferences and some training and software), log in, and go to https://ngc.nvidia.com/setup/api-key. Generate the API key, then log in as shown below and paste the key as the password:

On your workstation as root using docker:

$ docker login nvcr.io
Username: $oauthtoken
Password: 

Login Succeeded

$ docker pull nvcr.io/nvidia/hpc-benchmarks:23.5
$ docker save -o /mystoragelocation/hpc-benchmarks.tar nvcr.io/nvidia/hpc-benchmarks:23.5
$ scp /mystoragelocation/hpc-benchmarks.tar rfeynman@hpc-portal2.hpc.uark.edu:/storage/rfeynman/

As “rfeynman” on the cluster, use srun to get a cloud node, then build the SIF image (this takes a few minutes):

$ cd /storage/rfeynman
$ singularity build hpc-benchmarks.sif docker-archive://hpc-benchmarks.tar
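
Alternatively, recent apptainer/singularity versions can authenticate to nvcr.io directly, skipping the docker save and scp steps. A hedged sketch using the SINGULARITY_DOCKER_USERNAME/PASSWORD environment variables (the NGC username is literally $oauthtoken; the password is your API key):

# on the cluster; quote $oauthtoken so the shell does not expand it
export SINGULARITY_DOCKER_USERNAME='$oauthtoken'
export SINGULARITY_DOCKER_PASSWORD='<your NGC API key>'
singularity pull hpc-benchmarks.sif docker://nvcr.io/nvidia/hpc-benchmarks:23.5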

Then, for NVIDIA work, get a GPU node with srun and use shell --nv; or continue on the cloud node (without --nv) for serial CPU computing.
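
A minimal srun request for an interactive GPU session might look like the line below; the partition name and time limit are assumptions, so substitute the values used on this cluster:

$ srun -N 1 -n 1 -p gpu --time=2:00:00 --pty /bin/bash

Once on the node, start the container shell: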

$ singularity shell --nv --bind /scrfs/storage/rfeynman:/mnt hpc-benchmarks.sif

For comparison, we'll run HPL on the CPUs of the same node on bare metal. At this problem size it runs at about 2.4 TFLOPS on this dual AMD EPYC 7543 node.

$ module load  gcc/7.3.1 mkl/19.0.5 impi/19.0.5
$ mpirun -np 16 -genv MKL_NUM_THREADS 4 ./xhpl
================================================================================
HPLinpack 2.3  --  High-Performance Linpack benchmark  --   December 2, 2018
Written by A. Petitet and R. Clint Whaley,  Innovative Computing Laboratory, UTK
Modified by Piotr Luszczek, Innovative Computing Laboratory, UTK
Modified by Julien Langou, University of Colorado Denver
================================================================================

An explanation of the input/output parameters follows:
T/V    : Wall time / encoded variant.
N      : The order of the coefficient matrix A.
NB     : The partitioning blocking factor.
P      : The number of process rows.
Q      : The number of process columns.
Time   : Time in seconds to solve the linear system.
Gflops : Rate of execution for solving the linear system.

The following parameter values will be used:

N      :   62976 
NB     :     216 
PMAP   : Column-major process mapping
P      :       4 
Q      :       4 
PFACT  :   Crout 
NBMIN  :       4 
NDIV   :       2 
RFACT  :    Left 
BCAST  :   2ring 
DEPTH  :       3 
SWAP   : Spread-roll (long)
L1     : no-transposed form
U      : transposed form
EQUIL  : yes
ALIGN  : 8 double precision words

================================================================================
T/V                N    NB     P     Q               Time                 Gflops
--------------------------------------------------------------------------------
WC32L2C4       62976   216     4     4              69.49             2.3963e+03


================================================================================

Finished      1 tests with the following results:
              1 tests completed without checking,
              0 tests skipped because of illegal input values.
--------------------------------------------------------------------------------

End of Tests.
================================================================================

We'll copy this HPL.dat into the container and change the MPI grid from 4×4 to 1×1. By this measure, a single NVIDIA A100 GPU is a little over 5 times as fast as two AMD 7543 CPUs. The 40 GB memory of this GPU is close to full at this problem size, though the CPU memory could hold more.
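
In the standard HPL.dat format the grid is set by the Ps and Qs lines; for a single GPU the relevant lines become (the rest of the file is unchanged):

1            # of process grids (P x Q)
1            Ps
1            Qs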

Apptainer> cd /tmp
Apptainer> vi HPL.dat
Apptainer> mpirun --bind-to none -np 1 /workspace/hpl.sh --dat ./HPL.dat --no-multinode

================================================================================
HPL-NVIDIA 23.5.0  -- NVIDIA accelerated HPL benchmark -- NVIDIA
================================================================================
HPLinpack 2.1  --  High-Performance Linpack benchmark  --   October 26, 2012
Written by A. Petitet and R. Clint Whaley,  Innovative Computing Laboratory, UTK
Modified by Piotr Luszczek, Innovative Computing Laboratory, UTK
Modified by Julien Langou, University of Colorado Denver
================================================================================

An explanation of the input/output parameters follows:
T/V    : Wall time / encoded variant.
N      : The order of the coefficient matrix A.
NB     : The partitioning blocking factor.
P      : The number of process rows.
Q      : The number of process columns.
Time   : Time in seconds to solve the linear system.
Gflops : Rate of execution for solving the linear system.

The following parameter values will be used:

N      :   62976 
NB     :     216 
PMAP   : Column-major process mapping
P      :       1 
Q      :       1 
PFACT  :   Crout 
NBMIN  :       4 
NDIV   :       2 
RFACT  :    Left 
BCAST  :   2ring 
DEPTH  :       3 
SWAP   : Spread-roll (long)
L1     : no-transposed form
U      : transposed form
EQUIL  : yes
ALIGN  : 8 double precision words

================================================================================
T/V                N    NB     P     Q         Time          Gflops (   per GPU)
--------------------------------------------------------------------------------
WC02R2R4       62976   192     1     1        13.36       1.246e+04 ( 1.246e+04)
================================================================================

Finished      1 tests with the following results:
              1 tests completed without checking,
              0 tests skipped because of illegal input values.
--------------------------------------------------------------------------------

End of Tests.
================================================================================
Apptainer>
References and Finding Prebuilt Containers