=== Apptainer/Singularity ===
  
Apptainer, from [[https://apptainer.org/]] and formerly Sylabs Singularity, is a container system for HPC.  In many respects it is similar to Docker [[https://www.docker.com/]], but Docker is too insecure for use with parallel file systems.  Containers allow a specific Linux distribution, version, and application software stack to be fixed in the container image while running on the HPC system.  Containers are very useful for applications that were written on personal workstations (often Ubuntu Linux) and were not designed for an HPC environment with a stable base OS and multiple software versions.  They are also useful for GPU programs that often depend on Python modules with very particular compatibility requirements.
  
Docker images can be converted either explicitly or implicitly to Apptainer images.  If you invoke apptainer/singularity with a Docker image, it will be implicitly converted to a sif image.  On your local workstation, you can run docker/podman as root and modify Docker images, which can then be transferred to the HPC system.
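For instance (the image name is only an illustration), referencing a ``docker://`` URI directly triggers the implicit conversion and caches the resulting sif:

<code>
# pulling and converting happen automatically when a docker:// URI is given
$ singularity shell docker://ubuntu:22.04
</code>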
Apptainer/singularity can use two types of container images: "sandbox", a directory usually holding tens of thousands of small files, and "sif", a single relatively large file.  The major difference is that sandbox images can be modified, while sif images are read-only disk images and cannot be modified.  Sif images are much easier to deal with on a parallel file system that is optimized for large files.  If you do not intend to modify the images, the simplest method is to pull Docker images directly into Apptainer, in which case they will be converted to a sif image.
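For example (the tensorflow image is only an illustration), you can pull straight to a read-only sif, or build a writable sandbox from the same source:

<code>
# pull directly to a read-only sif image
$ singularity pull tensorflow.sif docker://tensorflow/tensorflow:latest-gpu

# or build a modifiable sandbox directory from the same image
$ singularity build --sandbox tensorflow_sandbox docker://tensorflow/tensorflow:latest-gpu
</code>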
  
==== Example 1 ====

For example, on your workstation as root, use the (nearly identical) docker or podman commands:
  
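The commands follow the same pattern as Example 2 below; a representative sequence (the deeplab2 image source and tag are placeholders, not the exact image used here) saves the docker image to an archive, copies it to the cluster, and builds either a sandbox directory or a sif from it:

<code>
# on the workstation, as root, with docker or podman
$ docker pull <registry>/deeplab2:latest
$ docker save -o deeplab2.tar <registry>/deeplab2:latest
$ scp deeplab2.tar rfeynman@hpc-portal2.hpc.uark.edu:/storage/rfeynman/

# on the cluster: build a writable sandbox directory, or a read-only sif
$ singularity build --sandbox deeplab2 docker-archive://deeplab2.tar
$ singularity build deeplab2.sif docker-archive://deeplab2.tar
</code>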
  
In this case the docker archive deeplab2.tar and the sandbox directory deeplab2 each use 14 GB, while the compressed file deeplab2.sif is 6 GB.  In the same directory, we run apptainer/singularity.  Since this is a GPU container running on a GPU node, we include ``--nv`` and a bind mount of our storage directory with ``--bind /scrfs/storage/$USER:/mnt``.  You can also bind mount a scratch directory at some unused root-level directory such as ``/opt``; you may need to start the container once to check that the directory exists and is unused in the image.  Here ``nvidia-smi`` shows that our GPU is found, ``df`` shows that our storage directory is mounted at ``/mnt/``, and ``mkdir`` shows that the sif is a read-only file system.
  
<code>
$ singularity shell --nv --bind /scrfs/storage/build:/mnt deeplab2.sif
INFO:    underlay of /usr/bin/nvidia-smi required more than 50 (403) bind mounts
INFO:    underlay of /usr/share/zoneinfo/Etc/UTC required more than 50 (85) bind mounts
...
</code>
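To illustrate the ``/opt`` scratch-directory bind mount mentioned above (the scratch path is only a placeholder):

<code>
# check that /opt is empty/unused in the image before binding over it
$ singularity exec deeplab2.sif ls /opt

# then bind a scratch directory (placeholder path) there in addition to storage
$ singularity shell --nv --bind /scrfs/storage/build:/mnt,/scratch/$USER:/opt deeplab2.sif
</code>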
  
== Modules ==

We have versions of both Sylabs Singularity (most recently singularity/3.9.3) and Apptainer (the default, installed as an RPM with no module).  We don't perceive any difference between them.  We suggest not loading the module and instead using the Apptainer RPM version (whose commands are either ``apptainer ...`` or ``singularity ...``).  If you load a singularity module, it will come first in the path and the module's commands will be used.

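To see which version you are getting (the module name is from above; the RPM path is typical, not guaranteed):

<code>
$ which singularity              # RPM version, typically /usr/bin/singularity
$ module load singularity/3.9.3
$ which singularity              # the module's singularity now comes first in $PATH
$ module unload singularity/3.9.3
</code>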
==== Example 2: NVidia nvcr.io ====

A login is required for NVidia containers, such as [[https://catalog.ngc.nvidia.com/orgs/nvidia/containers/hpc-benchmarks]].
Get an NVidia account (also useful for free GTC conferences and some training and software).  Log in and go to [[https://ngc.nvidia.com/setup/api-key]].  Generate an API key, then log in as shown below and paste the key as the password:

On your workstation as root, using docker:

<code>
$ docker login nvcr.io
Username: $oauthtoken
Password: 

Login Succeeded

$ docker pull nvcr.io/nvidia/hpc-benchmarks:23.5
$ docker save -o /mystoragelocation/hpc-benchmarks.tar nvcr.io/nvidia/hpc-benchmarks:23.5
$ scp /mystoragelocation/hpc-benchmarks.tar rfeynman@hpc-portal2.hpc.uark.edu:/storage/rfeynman/
</code>
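Alternatively, the image can be pulled directly on the cluster without going through docker; a sketch using the Singularity/Apptainer registry-credential environment variables (the key value is a placeholder):

<code>
$ export SINGULARITY_DOCKER_USERNAME='$oauthtoken'
$ export SINGULARITY_DOCKER_PASSWORD=<your NGC API key>
$ singularity pull hpc-benchmarks.sif docker://nvcr.io/nvidia/hpc-benchmarks:23.5
</code>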

As "rfeynman" on the cluster, use srun to get a cloud node (a typical request is sketched after the build commands below); this takes a few minutes:
<code>
$ cd /storage/rfeynman
$ singularity build hpc-benchmarks.sif docker-archive://hpc-benchmarks.tar
</code>
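If you do not already have an interactive node, a typical srun request looks something like this (the partition name and time limit are placeholders; use your site's values):

<code>
$ srun --nodes=1 --ntasks=1 --partition=<cloud-partition> --time=02:00:00 --pty /bin/bash
</code>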

Then, for the NVidia GPU run, get a GPU node with srun (and use ``shell --nv``), or continue on the cloud node (without ``--nv``) for serial CPU computing:

<code>
$ singularity shell --nv --bind /scrfs/storage/rfeynman:/mnt hpc-benchmarks.sif
</code>

For comparison, we run HPL on the CPUs of the same node on bare metal.  At this memory size it runs at about 2.4 TF on this dual AMD 7543 node.

<code>
$ module load gcc/7.3.1 mkl/19.0.5 impi/19.0.5
$ mpirun -np 16 -genv MKL_NUM_THREADS 4 ./xhpl
================================================================================
HPLinpack 2.3  --  High-Performance Linpack benchmark  --   December 2, 2018
Written by A. Petitet and R. Clint Whaley,  Innovative Computing Laboratory, UTK
Modified by Piotr Luszczek, Innovative Computing Laboratory, UTK
Modified by Julien Langou, University of Colorado Denver
================================================================================

An explanation of the input/output parameters follows:
T/V    : Wall time / encoded variant.
N      : The order of the coefficient matrix A.
NB     : The partitioning blocking factor.
P      : The number of process rows.
Q      : The number of process columns.
Time   : Time in seconds to solve the linear system.
Gflops : Rate of execution for solving the linear system.

The following parameter values will be used:

N      :   62976
NB     :     216
PMAP   : Column-major process mapping
P      :       4
Q      :       4
PFACT  :   Crout
NBMIN  :       4
NDIV   :       2
RFACT  :    Left
BCAST  :   2ring
DEPTH  :       3
SWAP   : Spread-roll (long)
L1     : no-transposed form
U      : transposed form
EQUIL  : yes
ALIGN  : 8 double precision words

================================================================================
T/V                N    NB                       Time                 Gflops
--------------------------------------------------------------------------------
WC32L2C4       62976   216                      69.49             2.3963e+03


================================================================================

Finished      1 tests with the following results:
              1 tests completed without checking,
              0 tests skipped because of illegal input values.
--------------------------------------------------------------------------------

End of Tests.
================================================================================

</code>

We'll copy this HPL.dat into the container and change the MPI grid from 4x4 to 1x1.  By this measure, a single NVidia A100 GPU is a little over 5 times as fast as two AMD 7543 CPUs.  The 40 GB memory of this GPU is close to full, though the CPU memory could hold more.

<code>
Apptainer> cd /tmp
Apptainer> vi HPL.dat
Apptainer> mpirun --bind-to none -np 1 /workspace/hpl.sh --dat ./HPL.dat --no-multinode

================================================================================
HPL-NVIDIA 23.5.0  -- NVIDIA accelerated HPL benchmark -- NVIDIA
================================================================================
HPLinpack 2.1  --  High-Performance Linpack benchmark  --   October 26, 2012
Written by A. Petitet and R. Clint Whaley,  Innovative Computing Laboratory, UTK
Modified by Piotr Luszczek, Innovative Computing Laboratory, UTK
Modified by Julien Langou, University of Colorado Denver
================================================================================

An explanation of the input/output parameters follows:
T/V    : Wall time / encoded variant.
N      : The order of the coefficient matrix A.
NB     : The partitioning blocking factor.
P      : The number of process rows.
Q      : The number of process columns.
Time   : Time in seconds to solve the linear system.
Gflops : Rate of execution for solving the linear system.

The following parameter values will be used:

N      :   62976
NB     :     216
PMAP   : Column-major process mapping
P      :       1
Q      :       1
PFACT  :   Crout
NBMIN  :       4
NDIV   :       2
RFACT  :    Left
BCAST  :   2ring
DEPTH  :       3
SWAP   : Spread-roll (long)
L1     : no-transposed form
U      : transposed form
EQUIL  : yes
ALIGN  : 8 double precision words

================================================================================
T/V                N    NB                 Time          Gflops (   per GPU)
--------------------------------------------------------------------------------
WC02R2R4       62976   192                13.36       1.246e+04 ( 1.246e+04)
================================================================================

Finished      1 tests with the following results:
              1 tests completed without checking,
              0 tests skipped because of illegal input values.
--------------------------------------------------------------------------------

End of Tests.
================================================================================
Apptainer>
</code>

== References and Finding Prebuilt Containers ==

[[https://hpc.nih.gov/apps/apptainer.html]]

[[https://pawseysc.github.io/singularity-containers/12-singularity-intro/index.html]]

[[https://www.nas.nasa.gov/hecc/support/kb/singularity-184/]]

[[https://carpentries-incubator.github.io/singularity-introduction/aio/index.html]]

[[https://www.osc.edu/book/export/html/4678]]
  
There are relatively small collections of Apptainer/Singularity containers and quite large collections of Docker containers.
  
The largest, Docker hub [[https://hub.docker.com/search?q=tensorflow]]
  
NVidia [[https://catalog.ngc.nvidia.com/containers?filters=&orderBy=dateModifiedDESC&query=tensorflow]]
  
RedHat [[https://quay.io/search?q=tensorflow]]

AWS [[https://gallery.ecr.aws/?operatingSystems=Linux&searchTerm=tensorflow]]
  
Github (not exclusively containers) [[https://github.com/search?q=tensorflow+container]]
Gitlab (not exclusively containers) [[https://gitlab.com/search?search=tensorflow%2Bcontainer]]
  
Biocontainers [[https://github.com/BioContainers/containers]]