singularity-apptainer

Differences

This shows you the differences between two versions of the page.

singularity-apptainer [2022/11/09 19:28]
root
singularity-apptainer [2023/06/14 20:54] (current)
root
Line 7: Line 7:
 Apptainer/singularity can use two types of container images: "sandbox", a directory usually holding tens of thousands of small files, and "sif", a single relatively large file. The major difference is that sandbox images can be modified, while sif images are read-only disk images. Sif images are much easier to handle on a parallel file system that is optimized for large files.  If you do not intend to modify the images, the simplest method is to pull docker images directly into apptainer, in which case they are converted to sif images.
  
 +==== Example 1 ====
 +
 + 
 For example, on your workstation as root, using (nearly identical) docker or podman commands:
  
Line 114: Line 117:
  
 We have versions of both Sylabs Singularity (most recently singularity/3.9.3) and Apptainer (default as RPM, no module) installed.  We don't perceive any difference between them.  We suggest not loading the module and instead using the Apptainer RPM version (whose commands are either ``apptainer ...`` or ``singularity ...``).  If you load a singularity module, it will come first in the path and the module's commands will be used.
 +
 +==== Example 2: NVIDIA nvcr.io ====
 +
 +A login is required for NVIDIA containers, such as [[https://catalog.ngc.nvidia.com/orgs/nvidia/containers/hpc-benchmarks]].
 +Get an NVIDIA login (also useful for free GTC conferences, some training, and software).  Log in and go to [[https://ngc.nvidia.com/setup/api-key]].  Generate the API key, then log in as shown and paste the key at the password prompt:
 +
 +On your workstation as root using docker:
 +
 +<code>
 +$ docker login nvcr.io
 +Username: $oauthtoken
 +Password: 
 +
 +Login Succeeded
 +
 +$ docker pull nvcr.io/nvidia/hpc-benchmarks:23.5
 +$ docker save -o /mystoragelocation/hpc-benchmarks.tar nvcr.io/nvidia/hpc-benchmarks:23.5
 +$ scp /mystoragelocation/hpc-benchmarks.tar rfeynman@hpc-portal2.hpc.uark.edu:/storage/rfeynman/
 +</code>
 +
 +As "rfeynman" on the cluster, use srun to get a cloud node and build the sif image (this takes a few minutes):
 +<code>
 +$ cd /storage/rfeynman
 +$ singularity build hpc-benchmarks.sif docker-archive://hpc-benchmarks.tar
 +</code>
 +
 +Then, to use an NVIDIA GPU, get a gpu node with srun and use ``shell --nv``; or, for serial cpu computing, continue on the cloud node without ``--nv``:
 +
 +<code>
 +$ singularity shell --nv --bind /scrfs/storage/rfeynman:/mnt hpc-benchmarks.sif
 +</code> 
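
The choice between the two invocations can be scripted. The following is a minimal sketch, not part of the site's tooling: the helper name ``nv_flag`` is hypothetical, and it assumes that a GPU node can be recognized by ``nvidia-smi`` being on the path.

```shell
#!/bin/sh
# Hypothetical helper: print "--nv" when a GPU probe command is found,
# and print nothing otherwise. The probe defaults to nvidia-smi; this is
# an assumption about how to recognize a GPU node, not a site rule.
nv_flag() {
    probe="${1:-nvidia-smi}"
    if command -v "$probe" >/dev/null 2>&1; then
        printf '%s\n' "--nv"
    fi
}

# Show the command that would be run on this node (paths as in the example above).
echo "singularity shell $(nv_flag) --bind /scrfs/storage/rfeynman:/mnt hpc-benchmarks.sif"
```

On a node without ``nvidia-smi`` the flag is simply left out, which matches the cloud-node case.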
 +
 +For comparison, we'll first run HPL on the cpus of the same node on bare metal. At this problem size (N=62976) it runs at about 2.4 TFLOPS on this dual AMD 7543 node.
 +
 +<code>
 +$ module load  gcc/7.3.1 mkl/19.0.5 impi/19.0.5
 +$ mpirun -np 16 -genv MKL_NUM_THREADS 4 ./xhpl
 +================================================================================
 +HPLinpack 2.3  --  High-Performance Linpack benchmark  --   December 2, 2018
 +Written by A. Petitet and R. Clint Whaley,  Innovative Computing Laboratory, UTK
 +Modified by Piotr Luszczek, Innovative Computing Laboratory, UTK
 +Modified by Julien Langou, University of Colorado Denver
 +================================================================================
 +
 +An explanation of the input/output parameters follows:
 +T/V    : Wall time / encoded variant.
 +N      : The order of the coefficient matrix A.
 +NB     : The partitioning blocking factor.
 +P      : The number of process rows.
 +Q      : The number of process columns.
 +Time   : Time in seconds to solve the linear system.
 +Gflops : Rate of execution for solving the linear system.
 +
 +The following parameter values will be used:
 +
 +N      :   62976 
 +NB     :     216 
 +PMAP   : Column-major process mapping
 +P      :       
 +Q      :       
 +PFACT  :   Crout 
 +NBMIN  :       
 +NDIV   :       
 +RFACT  :    Left 
 +BCAST  :   2ring 
 +DEPTH  :       
 +SWAP   : Spread-roll (long)
 +L1     : no-transposed form
 +U      : transposed form
 +EQUIL  : yes
 +ALIGN  : 8 double precision words
 +
 +================================================================================
 +T/V                N    NB                       Time                 Gflops
 +--------------------------------------------------------------------------------
 +WC32L2C4       62976   216                      69.49             2.3963e+03
 +
 +
 +================================================================================
 +
 +Finished      1 tests with the following results:
 +              1 tests completed without checking,
 +              0 tests skipped because of illegal input values.
 +--------------------------------------------------------------------------------
 +
 +End of Tests.
 +================================================================================
 +
 +</code>
 +
 +We'll copy this HPL.dat into the apptainer container and change the MPI process grid (P x Q) from 4x4 to 1x1.  By this measure, a single NVIDIA A100 GPU is a little over 5 times as fast as two AMD 7543 CPUs.  The 40 GB memory of this GPU is close to full, though the CPU memory could hold more.
 +
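
The grid change described above can also be made without an editor. This is a sketch under assumptions: the function name ``set_grid_1x1`` is hypothetical, the file is assumed to follow the stock HPL.dat layout in which the grid lines end in ``Ps`` and ``Qs``, and only a single process grid is assumed to be listed.

```shell
#!/bin/sh
# Hypothetical helper: rewrite the Ps and Qs lines of an HPL.dat file to 1.
# Assumes the stock layout (value first, label last) and a single grid.
set_grid_1x1() {
    # $1 = path to HPL.dat; write to a temporary copy, then replace the original.
    awk '$NF == "Ps" || $NF == "Qs" { printf "%-12s %s\n", "1", $NF; next }
         { print }' "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}

# Demonstration on a three-line fragment rather than a full HPL.dat.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
1            # of process grids (P x Q)
4            Ps
4            Qs
EOF
set_grid_1x1 "$tmp"
cat "$tmp"
rm -f "$tmp"
```

All other lines pass through unchanged, so the problem size and block size stay the same between the two runs.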
 +<code>
 +Apptainer> cd /tmp
 +Apptainer> vi HPL.dat
 +Apptainer> mpirun --bind-to none -np 1 /workspace/hpl.sh --dat ./HPL.dat --no-multinode
 +
 +================================================================================
 +HPL-NVIDIA 23.5.0  -- NVIDIA accelerated HPL benchmark -- NVIDIA
 +================================================================================
 +HPLinpack 2.1  --  High-Performance Linpack benchmark  --   October 26, 2012
 +Written by A. Petitet and R. Clint Whaley,  Innovative Computing Laboratory, UTK
 +Modified by Piotr Luszczek, Innovative Computing Laboratory, UTK
 +Modified by Julien Langou, University of Colorado Denver
 +================================================================================
 +
 +An explanation of the input/output parameters follows:
 +T/V    : Wall time / encoded variant.
 +N      : The order of the coefficient matrix A.
 +NB     : The partitioning blocking factor.
 +P      : The number of process rows.
 +Q      : The number of process columns.
 +Time   : Time in seconds to solve the linear system.
 +Gflops : Rate of execution for solving the linear system.
 +
 +The following parameter values will be used:
 +
 +N      :   62976 
 +NB     :     216 
 +PMAP   : Column-major process mapping
 +P      :       
 +Q      :       
 +PFACT  :   Crout 
 +NBMIN  :       
 +NDIV   :       
 +RFACT  :    Left 
 +BCAST  :   2ring 
 +DEPTH  :       
 +SWAP   : Spread-roll (long)
 +L1     : no-transposed form
 +U      : transposed form
 +EQUIL  : yes
 +ALIGN  : 8 double precision words
 +
 +================================================================================
 +T/V                N    NB                 Time          Gflops (   per GPU)
 +--------------------------------------------------------------------------------
 +WC02R2R4       62976   192                13.36       1.246e+04 ( 1.246e+04)
 +================================================================================
 +
 +Finished      1 tests with the following results:
 +              1 tests completed without checking,
 +              0 tests skipped because of illegal input values.
 +--------------------------------------------------------------------------------
 +
 +End of Tests.
 +================================================================================
 +Apptainer>
 +</code>
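
The two figures quoted for this example (the speedup and the memory footprint) can be checked from the numbers in the HPL output. A small sketch, using only values printed above; the function names are illustrative, and the memory estimate counts only the N x N double-precision matrix, ignoring HPL's workspace.

```shell
#!/bin/sh
# Figures copied from the two HPL runs above.
N=62976                 # order of the coefficient matrix
CPU_GFLOPS=2.3963e+03   # dual AMD 7543, bare metal
GPU_GFLOPS=1.246e+04    # single A100, in the container

# Size of the N x N double-precision matrix (8 bytes per element), in GB (10^9 bytes).
matrix_gb() {
    awk -v n="$1" 'BEGIN { printf "%.1f\n", n * n * 8 / 1e9 }'
}

# Ratio of two Gflops figures, second over first.
speedup() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", b / a }'
}

echo "matrix: $(matrix_gb "$N") GB"
echo "speedup: $(speedup "$CPU_GFLOPS" "$GPU_GFLOPS")x"
```

The matrix alone comes to about 31.7 GB, which is why the 40 GB GPU is close to full, and the Gflops ratio comes to about 5.2.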
  
 ==References and Finding Prebuilt Containers==
singularity-apptainer.1668022131.txt.gz · Last modified: 2022/11/09 19:28 by root