Singularity/Containers

Singularity (http://singularity.lbl.gov/) is a software container system. It allows users to build and run entire scientific workflows, software, and libraries using a specific distribution and version of Linux, all packaged into a single image file. It is based on the Linux "chroot" mechanism, which switches the environment from the operating system installed on the host node to the one inside the Singularity image file.
Many pre-built container images are available for download from the Singularity Hub (https://singularity-hub.org/) and Docker Hub (https://hub.docker.com/) repositories.
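
For example, once the singularity module is loaded (see below), an image can be pulled from either hub and saved as a local image file. This is a minimal sketch; the image names are taken from the examples later on this page:

# pull a pre-built image from Singularity Hub
singularity pull shub://vsoch/hello-world

# pull an image from Docker Hub and convert it to a Singularity image
singularity pull docker://ubuntu:16.04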

Local Image Files

Start an interactive job and load the singularity module

razor-l2:pwolinsk:$ qsub -I -q tiny12core -l walltime=1:00:00 -l nodes=1:ppn=12
qsub: waiting for job 3608596.sched to start
qsub: job 3608596.sched ready
compute1144:pwolinsk:$ module load singularity

Open a shell inside a prebuilt container stored locally on Razor in /share/apps/singularity/images/hello-world.simg

compute1144:pwolinsk:$ cat /etc/issue
CentOS release 6.8 (Final)
Kernel \r on an \m

compute1144:pwolinsk:$ singularity shell /share/apps/singularity/images/hello-world.simg 
Singularity: Invoking an interactive shell within container...

Singularity hello-world.simg:~> cat /etc/issue
Ubuntu 14.04.5 LTS \n \l

Singularity hello-world.simg:~> exit
exit
compute1144:pwolinsk:$ 
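
Instead of an interactive shell, a single command can also be executed inside the same local image. A minimal sketch (the exec command is demonstrated in more detail in the Tensorflow example below):

# run one command inside the container without opening a shell
singularity exec /share/apps/singularity/images/hello-world.simg cat /etc/issue

# execute the container's default runscript, if the image defines one
singularity run /share/apps/singularity/images/hello-world.simg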

Remote Repository Image Files

Open a shell inside a container stored on Singularity Hub, shub://vsoch/hello-world

compute1144:pwolinsk:$ singularity shell shub://vsoch/hello-world
Progress |===================================| 100.0% 
Singularity: Invoking an interactive shell within container...

Singularity vsoch-hello-world-master.simg:~> cat /etc/issue
Ubuntu 14.04.5 LTS \n \l

Singularity vsoch-hello-world-master.simg:~> 
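
To avoid downloading the image on every run, the remote image can be pulled once and kept as a local file. A minimal sketch; the --name value is illustrative:

# save the remote image as a local file in the current directory
singularity pull --name hello-world.simg shub://vsoch/hello-world

# later runs can use the local copy directly
singularity shell hello-world.simg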

Open a shell inside a container pulled from Docker Hub, and bind the /scratch directory on Razor to the /mnt directory inside the container

compute1144:pwolinsk:$ ls /scratch |wc -l
5535
compute1144:pwolinsk:$ singularity shell --bind /scratch:/mnt docker://ubuntu
Docker image path: index.docker.io/library/ubuntu:latest
Cache folder set to /gpfs_home/pwolinsk/.singularity/docker
[5/5] |===================================| 100.0% 
Creating container runtime...
Singularity: Invoking an interactive shell within container...

Singularity ubuntu:~> cat /etc/issue
Ubuntu 16.04.3 LTS \n \l

Singularity ubuntu:~> ls /mnt |wc -l
5535
Singularity ubuntu:~> 

By default the container binds the user's $HOME directory, /tmp, and the current working directory from the host to the equivalent directories inside the container. You can specify additional directories to bind using the --bind <localdir>:<containerdir> syntax.
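
Multiple bind pairs can be listed, separated by commas, or supplied through the SINGULARITY_BINDPATH environment variable. A minimal sketch; the /storage/pwolinsk directory is illustrative:

# bind two host directories in one invocation
singularity shell --bind /scratch:/mnt,/storage/pwolinsk:/data docker://ubuntu

# equivalent form using the environment variable
export SINGULARITY_BINDPATH="/scratch:/mnt,/storage/pwolinsk:/data"
singularity shell docker://ubuntu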

Tensorflow Example

Download the example models from the TensorFlow git repository:

compute1144:pwolinsk:$ git clone https://github.com/tensorflow/models
Initialized empty Git repository in /gpfs_home/pwolinsk/models/.git/
remote: Counting objects: 9158, done.
remote: Compressing objects: 100% (2/2), done.
remote: Total 9158 (delta 0), reused 0 (delta 0), pack-reused 9156
Receiving objects: 100% (9158/9158), 293.18 MiB | 32.15 MiB/s, done.
Resolving deltas: 100% (5162/5162), done.
compute1144:pwolinsk:$ cd models/tutorials/image/mnist/

Start a shell in the prebuilt Singularity container from within the directory containing the Python training script:

compute1144:pwolinsk:/models/tutorials/image/mnist$ singularity shell /share/apps/singularity/images/ubuntu-tensorflow-1.4.simg 
Singularity: Invoking an interactive shell within container...

Singularity ubuntu-tensorflow-1.4.simg:~/models/tutorials/image/mnist> python convolutional.py 
Successfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.
Successfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.
Successfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.
Successfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.
Extracting data/train-images-idx3-ubyte.gz
Extracting data/train-labels-idx1-ubyte.gz
Extracting data/t10k-images-idx3-ubyte.gz
Extracting data/t10k-labels-idx1-ubyte.gz
2017-12-01 15:05:15.992688: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2
Initialized!
Step 0 (epoch 0.00), 2.4 ms
Minibatch loss: 8.334, learning rate: 0.010000
Minibatch error: 85.9%
Validation error: 84.6%
...

Or, instead of starting a shell inside the container, use the exec command to run the command inside the container:

compute1144:pwolinsk:/models/tutorials/image/mnist$ singularity exec /share/apps/singularity/images/ubuntu-tensorflow-1.4.simg python convolutional.py 
Extracting data/train-images-idx3-ubyte.gz
Extracting data/train-labels-idx1-ubyte.gz
Extracting data/t10k-images-idx3-ubyte.gz
Extracting data/t10k-labels-idx1-ubyte.gz
2017-12-01 15:07:56.905035: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2
Initialized!
Step 0 (epoch 0.00), 2.2 ms
Minibatch loss: 8.334, learning rate: 0.010000
Minibatch error: 85.9%
Validation error: 84.6%
...
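
The same exec command works in a non-interactive batch job, which is the usual way to run longer training runs. A minimal sketch of a PBS submission script, assuming the job is submitted from the directory containing convolutional.py; the queue and walltime follow the interactive example above:

#!/bin/bash
#PBS -q tiny12core
#PBS -l nodes=1:ppn=12
#PBS -l walltime=1:00:00

# run from the directory the job was submitted from
cd $PBS_O_WORKDIR

module load singularity
singularity exec /share/apps/singularity/images/ubuntu-tensorflow-1.4.simg python convolutional.py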

Tensorflow Example - GPU NVIDIA container

Start an interactive job on a gpu node:

razor-l1:pwolinsk:$ qsub -I -q gpu16core
qsub: waiting for job 3927490.sched to start
qsub: job 3927490.sched ready

Currently Loaded Modulefiles:
  1) os/el6
compute0805:pwolinsk:$ 

Clone the tensorflow example models:

compute0805:pwolinsk:$ git clone https://github.com/tensorflow/models
Initialized empty Git repository in /home/pwolinsk/models/.git/
remote: Counting objects: 12884, done.
remote: Compressing objects: 100% (6/6), done.
remote: Total 12884 (delta 2), reused 3 (delta 2), pack-reused 12876
Receiving objects: 100% (12884/12884), 412.34 MiB | 27.24 MiB/s, done.
Resolving deltas: 100% (7276/7276), done.

Load the singularity module and start a shell within the NVIDIA TensorFlow container. The --nv option exposes the host's NVIDIA driver and GPU devices inside the container:

compute0805:pwolinsk:$ module load singularity
compute0805:pwolinsk:$ singularity shell  --nv /share/apps/singularity/images/nvidia-tensorflow\:18.01-py2-ahpcc.simg
Singularity: Invoking an interactive shell within container...

Singularity nvidia-tensorflow:18.01-py2-ahpcc.simg:~> python ~/models/tutorials/image/mnist/convolutional.py 
Extracting data/train-images-idx3-ubyte.gz
Extracting data/train-labels-idx1-ubyte.gz
Extracting data/t10k-images-idx3-ubyte.gz
Extracting data/t10k-labels-idx1-ubyte.gz
2018-03-06 20:49:48.176645: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 0 with properties: 
name: Tesla K40c major: 3 minor: 5 memoryClockRate(GHz): 0.745
pciBusID: 0000:81:00.0
totalMemory: 11.17GiB freeMemory: 11.09GiB
2018-03-06 20:49:48.421609: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 1 with properties: 
name: Tesla K40c major: 3 minor: 5 memoryClockRate(GHz): 0.745
pciBusID: 0000:82:00.0
totalMemory: 11.17GiB freeMemory: 11.09GiB
2018-03-06 20:49:48.421905: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Device peer to peer matrix
2018-03-06 20:49:48.421966: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1051] DMA: 0 1 
2018-03-06 20:49:48.421981: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1061] 0:   Y Y 
2018-03-06 20:49:48.421989: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1061] 1:   Y Y 
2018-03-06 20:49:48.422010: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1093] Ignoring visible gpu device (device: 0, name: Tesla K40c, pci bus id: 0000:81:00.0, compute capability: 3.5) with Cuda compute capability 3.5. The minimum required Cuda capability is 5.2.
2018-03-06 20:49:48.422033: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1093] Ignoring visible gpu device (device: 1, name: Tesla K40c, pci bus id: 0000:82:00.0, compute capability: 3.5) with Cuda compute capability 3.5. The minimum required Cuda capability is 5.2.
Initialized!
Step 0 (epoch 0.00), 40.5 ms
Minibatch loss: 8.334, learning rate: 0.010000
Minibatch error: 85.9%
Validation error: 84.6%
Step 100 (epoch 0.12), 48.3 ms
....

While the Tensorflow job is running inside the Singularity container, ssh into the node and verify that the GPUs are in use:

compute0805:pwolinsk:$ nvidia-smi 
Tue Mar  6 14:49:50 2018       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.30                 Driver Version: 390.30                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K40c          Off  | 00000000:81:00.0 Off |                    0 |
| 24%   48C    P0    66W / 235W |     79MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K40c          Off  | 00000000:82:00.0 Off |                    0 |
| 25%   50C    P0    62W / 235W |     79MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     12628      C   python                                        68MiB |
|    1     12628      C   python                                        68MiB |
+-----------------------------------------------------------------------------+
compute0805:pwolinsk:$ 
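
As with the CPU example, the GPU container can also be used from a batch script by passing --nv to singularity exec. A minimal sketch; ppn=16 is an assumption based on the gpu16core queue name:

#!/bin/bash
#PBS -q gpu16core
#PBS -l nodes=1:ppn=16
#PBS -l walltime=1:00:00

cd $PBS_O_WORKDIR

module load singularity
singularity exec --nv /share/apps/singularity/images/nvidia-tensorflow:18.01-py2-ahpcc.simg python ~/models/tutorials/image/mnist/convolutional.py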