=== Apptainer/Singularity ===

Apptainer [[https://apptainer.org/]], formerly Sylabs Singularity, is a container system for HPC systems. In many respects it is similar to Docker [[https://www.docker.com/]], but Docker requires a root-level daemon that is too insecure for shared HPC systems and parallel file systems. Containers fix a specific Linux distribution, version, and application software stack inside the container image while running on the HPC system's kernel. They are very useful for applications that were written on personal workstations (often running Ubuntu Linux) and were not designed for an HPC environment, with its stable base software and many coexisting software versions. They are also useful for GPU programs, which often depend on Python modules with very particular compatibility requirements.

Docker images can be converted to Apptainer images either explicitly or implicitly: if you invoke apptainer/singularity with a docker image, it will be implicitly converted to sif. On your local workstation, you can run docker/podman as root and modify docker images, which can then be transferred to the HPC system.

Apptainer/singularity can use two types of container images: "sandbox", a directory usually holding tens of thousands of small files, and "sif", a single relatively large file. The major difference is that sandbox images can be modified, while sif images are read-only disk images and cannot be modified. Sif images are much easier to deal with on a parallel file system that is optimized for large files. If you do not intend to modify the images, the simplest method is to pull docker images directly into apptainer, in which case they will be converted to a sif image.

==== Example 1 ====

On your workstation as root, using (nearly identical) docker or podman commands:

  podman pull docker.io/unfixable/deeplab2:latest
  podman save --format=docker-archive -o deeplab2.tar docker.io/unfixable/deeplab2:latest

Then transfer the tar file to the HPC system and, in the directory holding deeplab2.tar, build either image type:

  singularity build --sandbox deeplab2 docker-archive://deeplab2.tar
  # or
  singularity build deeplab2.sif docker-archive://deeplab2.tar

In this case the docker archive deeplab2.tar and the sandbox directory deeplab2 each use 14 GB, while the compressed deeplab2.sif is 6 GB.
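Either image can be used immediately. As a quick sanity check (a sketch: ``inspect`` prints the image metadata, and most images provide /etc/os-release showing which distribution is inside the container; the exact output depends on the image):

  singularity inspect deeplab2.sif
  singularity exec deeplab2.sif cat /etc/os-release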
In the same directory, we run apptainer/singularity interactively. Since this is a GPU container on a GPU node, we include ``--nv`` and a bind mount of our storage directory with ``--bind /scrfs/$USER/build:/mnt``. You can also bind-mount a scratch directory over some unused root-level directory such as ``/opt``; you may need to start the container once first to check that the directory exists in the image. Here ``nvidia-smi`` shows that our GPU is found, ``df`` shows that our storage directory is mounted at ``/mnt``, and ``mkdir`` shows that the sif is a read-only file system.

  $ singularity shell --nv --bind /scrfs/storage/build:/mnt deeplab2.sif
  INFO:    underlay of /usr/bin/nvidia-smi required more than 50 (403) bind mounts
  INFO:    underlay of /usr/share/zoneinfo/Etc/UTC required more than 50 (85) bind mounts
  Apptainer> nvidia-smi
  Tue Nov  8 13:43:08 2022
  +-----------------------------------------------------------------------------+
  | NVIDIA-SMI 515.65.01    Driver Version: 515.65.01    CUDA Version: 11.7     |
  |-------------------------------+----------------------+----------------------+
  | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
  | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
  |                               |                      |               MIG M. |
  |===============================+======================+======================|
  |   0  Tesla V100-PCIE...  Off  | 00000000:3B:00.0 Off |                    0 |
  | N/A   35C    P0    38W / 250W |      0MiB / 32768MiB |      2%      Default |
  |                               |                      |                  N/A |
  +-------------------------------+----------------------+----------------------+
  +-----------------------------------------------------------------------------+
  | Processes:                                                                  |
  |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
  |        ID   ID                                                   Usage      |
  |=============================================================================|
  |  No running processes found                                                 |
  +-----------------------------------------------------------------------------+
  Apptainer> df | grep scrfs
  172.17.27.1@o2ib,172.17.27.21@o2ib:172.17.27.2@o2ib,172.17.27.22@o2ib:/scrfs 2455426280448 1292785568712 1137793418616  54% /mnt
  Apptainer> mkdir /newfile
  mkdir: cannot create directory ‘/newfile’: Read-only file system
  Apptainer> exit
  exit

If you retry with the writable sandbox, you will find that sandboxes aren't very useful with NVidia containers: the NVidia files cannot be bound into a writable container, so the GPU is not visible. It is usually better to use the more convenient sif format.

  $ singularity shell --nv --bind /scrfs/storage/build:/mnt -w deeplab2/
  WARNING: nv files may not be bound with --writable
  WARNING: Skipping mount /etc/localtime [binds]: /etc/localtime doesn't exist in container
  WARNING: Skipping mount /bin/nvidia-smi [files]: /usr/bin/nvidia-smi doesn't exist in container
  WARNING: Skipping mount /bin/nvidia-debugdump [files]: /usr/bin/nvidia-debugdump doesn't exist in container
  WARNING: Skipping mount /bin/nvidia-persistenced [files]: /usr/bin/nvidia-persistenced doesn't exist in container
  WARNING: Skipping mount /bin/nvidia-cuda-mps-control [files]: /usr/bin/nvidia-cuda-mps-control doesn't exist in container
  WARNING: Skipping mount /bin/nvidia-cuda-mps-server [files]: /usr/bin/nvidia-cuda-mps-server doesn't exist in container
  Apptainer> nvidia-smi
  bash: nvidia-smi: command not found
  Apptainer> exit
  exit

Sandboxes of non-NVidia containers can be usefully modified. In our opinion, you are better off modifying NVidia containers in docker and then converting to Apptainer (see the Dockerfile sketch at the end of this example).

  $ singularity shell --bind /scrfs/storage/build:/mnt -w deeplab2/
  WARNING: Skipping mount /etc/localtime [binds]: /etc/localtime doesn't exist in container
  Apptainer> touch /newdir
  Apptainer> exit
  exit
  $ ls deeplab2/newdir
  deeplab2/newdir

If you are happy with a read-only copy of the original Docker container, you can do it all in one step, provided singularity can find your image:

  $ singularity pull docker://unfixable/deeplab2
  INFO:    Converting OCI blobs to SIF format
  INFO:    Starting build...
  Getting image source signatures
  Copying blob 864effccda8b skipped: already exists
  ...a lot of output...
  2022/11/08 16:16:47  info unpack layer: sha256:00d88365c70266e590b49ef5a03c6721030f90e1ba22e0cb332c7094840ed7ec
  INFO:    Creating SIF file...
  $ ls -alrt | tail -2
  -rwxr-xr-x  1 build ahpcc 6680887296 Nov  8 16:18 deeplab2_latest.sif
  drwxr-xr-x  4 build ahpcc       4096 Nov  8 16:18 .
  $ singularity shell --nv deeplab2_latest.sif
  INFO:    underlay of /usr/bin/nvidia-smi required more than 50 (403) bind mounts
  INFO:    underlay of /usr/share/zoneinfo/Etc/UTC required more than 50 (85) bind mounts
  Apptainer> nvidia-smi
  Tue Nov  8 16:26:34 2022
  +-----------------------------------------------------------------------------+
  | NVIDIA-SMI 515.65.01    Driver Version: 515.65.01    CUDA Version: 11.7     |
  |-------------------------------+----------------------+----------------------+
  ..etc..
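If you do need to customize an NVidia (or any other) container, the pattern suggested above is to make the changes with docker/podman on your workstation and then convert the result. A minimal sketch; the added Python package is only a hypothetical example, and it assumes pip is present in the image:

  # Dockerfile (or Containerfile) on the workstation, based on the image pulled earlier
  FROM docker.io/unfixable/deeplab2:latest
  # hypothetical customization -- replace with whatever your workflow needs
  RUN pip install --no-cache-dir scikit-image

Then rebuild the archive and convert it on the HPC system as before:

  podman build -t deeplab2:custom .
  podman save --format=docker-archive -o deeplab2-custom.tar deeplab2:custom
  # transfer the tar file, then on the HPC system:
  # singularity build deeplab2-custom.sif docker-archive://deeplab2-custom.tar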
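The interactive sessions above also translate directly into batch jobs with ``singularity exec``, which runs one command in the container and exits. A minimal Slurm sketch; the partition name, GPU request syntax, and script path are hypothetical and should be adapted to your site and allocation, and it assumes python3 is on the container's path:

  #!/bin/bash
  #SBATCH --nodes=1
  #SBATCH --partition=gpu        # hypothetical partition name
  #SBATCH --gres=gpu:1           # hypothetical GPU request
  #SBATCH --time=01:00:00
  # --nv exposes the GPU, --bind makes the storage directory visible as /mnt
  singularity exec --nv --bind /scrfs/storage/$USER:/mnt \
      deeplab2.sif python3 /mnt/my_training_script.py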
==== Modules ====

We have versions of both Sylabs Singularity (most recently singularity/3.9.3) and Apptainer (the default, installed as an RPM with no module). We don't perceive any difference between them. We suggest not loading the module and instead using the Apptainer RPM version (whose commands are either ``apptainer ...`` or ``singularity ...``). If you load a singularity module, it will come first in the path and the module's commands will be used.

==== Example 2: NVidia nvcr.io ====

A login is required for NVidia containers such as [[https://catalog.ngc.nvidia.com/orgs/nvidia/containers/hpc-benchmarks]]. Get an NVidia account (also useful for free GTC conferences, some training, and software), log in, and go to [[https://ngc.nvidia.com/setup/api-key]]. Generate an API key, then log in to the registry and paste the key. On your workstation as root, using docker:

  $ docker login nvcr.io
  Username: $oauthtoken
  Password:
  Login Succeeded
  $ docker pull nvcr.io/nvidia/hpc-benchmarks:23.5
  $ docker save -o /mystoragelocation/hpc-benchmarks.tar nvcr.io/nvidia/hpc-benchmarks:23.5
  $ scp /mystoragelocation/hpc-benchmarks.tar rfeynman@hpc-portal2.hpc.uark.edu:/storage/rfeynman/

As "rfeynman" on the cluster, use srun to get a cloud node and build the sif; this takes a few minutes:

  $ cd /storage/rfeynman
  $ singularity build hpc-benchmarks.sif docker-archive://hpc-benchmarks.tar

Then, for the GPU version, get a gpu node with srun and run the shell with ``--nv``; or continue on the cloud node (without ``--nv``) for serial cpu computing:

  $ singularity shell --nv --bind /scrfs/storage/rfeynman:/mnt hpc-benchmarks.sif

For comparison, we'll first run HPL on the CPUs of the same node on bare metal. At this problem size it runs at about 2.4 TFLOPS on this dual AMD 7543 node.

  $ module load gcc/7.3.1 mkl/19.0.5 impi/19.0.5
  $ mpirun -np 16 -genv MKL_NUM_THREADS 4 ./xhpl
  ================================================================================
  HPLinpack 2.3  --  High-Performance Linpack benchmark  --   December 2, 2018
  Written by A. Petitet and R. Clint Whaley,  Innovative Computing Laboratory, UTK
  Modified by Piotr Luszczek, Innovative Computing Laboratory, UTK
  Modified by Julien Langou, University of Colorado Denver
  ================================================================================

  An explanation of the input/output parameters follows:
  T/V    : Wall time / encoded variant.
  N      : The order of the coefficient matrix A.
  NB     : The partitioning blocking factor.
  P      : The number of process rows.
  Q      : The number of process columns.
  Time   : Time in seconds to solve the linear system.
  Gflops : Rate of execution for solving the linear system.

  The following parameter values will be used:

  N      :   62976
  NB     :     216
  PMAP   : Column-major process mapping
  P      :       4
  Q      :       4
  PFACT  :   Crout
  NBMIN  :       4
  NDIV   :       2
  RFACT  :    Left
  BCAST  :   2ring
  DEPTH  :       3
  SWAP   : Spread-roll (long)
  L1     : no-transposed form
  U      : transposed form
  EQUIL  : yes
  ALIGN  : 8 double precision words

  ================================================================================
  T/V                N    NB     P     Q               Time                 Gflops
  --------------------------------------------------------------------------------
  WC32L2C4       62976   216     4     4              69.49             2.3963e+03
  ================================================================================

  Finished      1 tests with the following results:
                1 tests completed without checking,
                0 tests skipped because of illegal input values.
  --------------------------------------------------------------------------------
  End of Tests.
  ================================================================================

We'll copy this HPL.dat into the container and change the MPI process grid from 4x4 to 1x1; the sketch below shows the relevant lines. By this measure, a single NVidia A100 GPU is a little over 5 times as fast as two AMD 7543 CPUs. The 40 GB memory of the GPU is nearly full at this problem size, while the CPU memory could hold a larger one.
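The edit is only to the process-grid lines of HPL.dat; in the standard HPL.dat layout the relevant lines change roughly as follows (a sketch, with the rest of the file left as generated; HPL reads only the leading value on each line):

  1            # of process grids (P x Q)
  1            Ps        (was 4)
  1            Qs        (was 4)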
  Apptainer> cd /tmp
  Apptainer> vi HPL.dat
  Apptainer> mpirun --bind-to none -np 1 /workspace/hpl.sh --dat ./HPL.dat --no-multinode
  ================================================================================
  HPL-NVIDIA 23.5.0  -- NVIDIA accelerated HPL benchmark -- NVIDIA
  ================================================================================
  HPLinpack 2.1  --  High-Performance Linpack benchmark  --   October 26, 2012
  Written by A. Petitet and R. Clint Whaley,  Innovative Computing Laboratory, UTK
  Modified by Piotr Luszczek, Innovative Computing Laboratory, UTK
  Modified by Julien Langou, University of Colorado Denver
  ================================================================================

  An explanation of the input/output parameters follows:
  T/V    : Wall time / encoded variant.
  N      : The order of the coefficient matrix A.
  NB     : The partitioning blocking factor.
  P      : The number of process rows.
  Q      : The number of process columns.
  Time   : Time in seconds to solve the linear system.
  Gflops : Rate of execution for solving the linear system.

  The following parameter values will be used:

  N      :   62976
  NB     :     216
  PMAP   : Column-major process mapping
  P      :       1
  Q      :       1
  PFACT  :   Crout
  NBMIN  :       4
  NDIV   :       2
  RFACT  :    Left
  BCAST  :   2ring
  DEPTH  :       3
  SWAP   : Spread-roll (long)
  L1     : no-transposed form
  U      : transposed form
  EQUIL  : yes
  ALIGN  : 8 double precision words

  ================================================================================
  T/V                N    NB     P     Q         Time          Gflops (   per GPU)
  --------------------------------------------------------------------------------
  WC02R2R4       62976   192     1     1        13.36       1.246e+04 ( 1.246e+04)
  ================================================================================

  Finished      1 tests with the following results:
                1 tests completed without checking,
                0 tests skipped because of illegal input values.
  --------------------------------------------------------------------------------
  End of Tests.
  ================================================================================
  Apptainer>

==== References and Finding Prebuilt Containers ====

General Apptainer/Singularity documentation from other HPC centers:

* [[https://hpc.nih.gov/apps/apptainer.html]]
* [[https://pawseysc.github.io/singularity-containers/12-singularity-intro/index.html]]
* [[https://www.nas.nasa.gov/hecc/support/kb/singularity-184/]]
* [[https://carpentries-incubator.github.io/singularity-introduction/aio/index.html]]
* [[https://www.osc.edu/book/export/html/4678]]

There are relatively small collections of Apptainer/Singularity containers and quite large collections of Docker containers. Some of the larger collections, each with a typical search (here for "tensorflow"):

* A mirror of the original Singularity hub, without a search engine: [[https://github.com/con/shub/]]
* The current Sylabs hub: [[https://cloud.sylabs.io/library/search?q=tensorflow]]
* The largest, Docker Hub: [[https://hub.docker.com/search?q=tensorflow]]
* NVidia: [[https://catalog.ngc.nvidia.com/containers?filters=&orderBy=dateModifiedDESC&query=tensorflow]]
* RedHat: [[https://quay.io/search?q=tensorflow]]
* AWS: [[https://gallery.ecr.aws/?operatingSystems=Linux&searchTerm=tensorflow]]
* Github (not exclusively containers): [[https://github.com/search?q=tensorflow+container]]
* Gitlab (not exclusively containers): [[https://gitlab.com/search?search=tensorflow%2Bcontainer]]
* Biocontainers: [[https://github.com/BioContainers/containers]]