==== Singularity ====
Singularity [[http://singularity.lbl.gov/]] is a software container system. It lets users build and run entire scientific workflows, software, and libraries under a specific Linux distribution and version, all packaged into a single image file. It follows the same idea as the Linux "chroot" command: a process started in the container sees the operating system inside the Singularity image file instead of the one installed on the host node.
Many pre-built container images are available for download from the Singularity Hub [[https://singularity-hub.org/]] and Docker Hub [[https://hub.docker.com/]] repositories.
=== Local Image Files ===
Start an interactive job and load the singularity module:
razor-l2:pwolinsk:$ qsub -I -q tiny12core -l walltime=1:00:00 -l nodes=1:ppn=12
qsub: waiting for job 3608596.sched to start
qsub: job 3608596.sched ready
compute1144:pwolinsk:$ module load singularity
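Open a shell inside a prebuilt container stored locally on Razor at /share/apps/singularity/images/hello-world.simg. Running cat /etc/issue before and inside the container shows the operating system change from the host's CentOS to the container's Ubuntu: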
compute1144:pwolinsk:$ cat /etc/issue
CentOS release 6.8 (Final)
Kernel \r on an \m
compute1144:pwolinsk:$ singularity shell /share/apps/singularity/images/hello-world.simg
Singularity: Invoking an interactive shell within container...
Singularity hello-world.simg:~> cat /etc/issue
Ubuntu 14.04.5 LTS \n \l
Singularity hello-world.simg:~> exit
exit
compute1144:pwolinsk:$
=== Remote Repository Image Files ===
Open a shell inside a container stored on Singularity Hub at shub://vsoch/hello-world:
compute1144:pwolinsk:$ singularity shell shub://vsoch/hello-world
Progress |===================================| 100.0%
Singularity: Invoking an interactive shell within container...
Singularity vsoch-hello-world-master.simg:~> cat /etc/issue
Ubuntu 14.04.5 LTS \n \l
Singularity vsoch-hello-world-master.simg:~>
Open a shell inside a container pulled from the Docker repository, and bind the /scratch directory on Razor to the /mnt directory inside the container:
compute1144:pwolinsk:$ ls /scratch |wc -l
5535
compute1144:pwolinsk:$ singularity shell --bind /scratch:/mnt docker://ubuntu
Docker image path: index.docker.io/library/ubuntu:latest
Cache folder set to /gpfs_home/pwolinsk/.singularity/docker
[5/5] |===================================| 100.0%
Creating container runtime...
Singularity: Invoking an interactive shell within container...
Singularity ubuntu:~> cat /etc/issue
Ubuntu 16.04.3 LTS \n \l
Singularity ubuntu:~> ls /mnt |wc -l
5535
Singularity ubuntu:~>
By default the container binds the user's $HOME directory, /tmp, and the current working directory from the host to the equivalent directories inside the container. You can bind additional directories with the --bind <host_dir>:<container_dir> syntax.
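When more than one extra directory is needed, the --bind option also accepts a comma-separated list of host:container pairs. The following is a minimal sketch of assembling such a list. The helper name join_binds and the /storage:/data pair are hypothetical and used only for illustration; only /scratch:/mnt comes from the example above. The script prints the resulting command instead of running it.

```shell
#!/bin/sh
# Sketch: build a comma-separated bind specification for singularity.
# join_binds is a hypothetical helper; /storage:/data is an assumed path pair.

join_binds() {
    # "$*" joins all arguments using the first character of IFS,
    # so setting IFS to a comma yields "a:b,c:d".
    local IFS=,
    echo "$*"
}

BIND_SPEC=$(join_binds /scratch:/mnt /storage:/data)

# Dry run: show the command that would be executed on a compute node.
echo "singularity shell --bind $BIND_SPEC docker://ubuntu"
```

Remove the echo and the surrounding quotes on the last line to actually start the shell once the printed command looks right.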
=== Tensorflow Example ===
Download the example models from the git repository:
compute1144:pwolinsk:$ git clone https://github.com/tensorflow/models
Initialized empty Git repository in /gpfs_home/pwolinsk/models/.git/
remote: Counting objects: 9158, done.
remote: Compressing objects: 100% (2/2), done.
remote: Total 9158 (delta 0), reused 0 (delta 0), pack-reused 9156
Receiving objects: 100% (9158/9158), 293.18 MiB | 32.15 MiB/s, done.
Resolving deltas: 100% (5162/5162), done.
compute1144:pwolinsk:$ cd models/tutorials/image/mnist/
Start a shell in the prebuilt singularity container from within the directory containing the python training script
compute1144:pwolinsk:/models/tutorials/image/mnist$ singularity shell /share/apps/singularity/images/ubuntu-tensorflow-1.4.simg
Singularity: Invoking an interactive shell within container...
Singularity ubuntu-tensorflow-1.4.simg:~/models/tutorials/image/mnist> python convolutional.py
Successfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.
Successfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.
Successfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.
Successfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.
Extracting data/train-images-idx3-ubyte.gz
Extracting data/train-labels-idx1-ubyte.gz
Extracting data/t10k-images-idx3-ubyte.gz
Extracting data/t10k-labels-idx1-ubyte.gz
2017-12-01 15:05:15.992688: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2
Initialized!
Step 0 (epoch 0.00), 2.4 ms
Minibatch loss: 8.334, learning rate: 0.010000
Minibatch error: 85.9%
Validation error: 84.6%
...
Alternatively, instead of starting a shell inside the container, use the **exec** command to run a single command inside the container:
compute1144:pwolinsk:/models/tutorials/image/mnist$ singularity exec /share/apps/singularity/images/ubuntu-tensorflow-1.4.simg python convolutional.py
Extracting data/train-images-idx3-ubyte.gz
Extracting data/train-labels-idx1-ubyte.gz
Extracting data/t10k-images-idx3-ubyte.gz
Extracting data/t10k-labels-idx1-ubyte.gz
2017-12-01 15:07:56.905035: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2
Initialized!
Step 0 (epoch 0.00), 2.2 ms
Minibatch loss: 8.334, learning rate: 0.010000
Minibatch error: 85.9%
Validation error: 84.6%
...
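Because **exec** needs no interactive shell, the same command can also go into a batch job. The following is a minimal sketch that writes such a job script to a file. The queue and resource requests are copied from the interactive qsub example earlier on this page; the file name mnist.pbs and the job name are hypothetical, and it is an assumption that the module command is available inside batch jobs on your cluster.

```shell
#!/bin/sh
# Sketch: generate a PBS job script that runs the training script
# non-interactively. mnist.pbs is a hypothetical file name.

cat > mnist.pbs <<'EOF'
#!/bin/bash
#PBS -N mnist-singularity
#PBS -q tiny12core
#PBS -l nodes=1:ppn=12
#PBS -l walltime=1:00:00

# Run from the directory the job was submitted from.
cd "$PBS_O_WORKDIR"

module load singularity
singularity exec /share/apps/singularity/images/ubuntu-tensorflow-1.4.simg python convolutional.py
EOF

echo "wrote mnist.pbs"
```

Submit it from the models/tutorials/image/mnist directory with qsub mnist.pbs, so that convolutional.py is found in the job's working directory.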
=== Tensorflow Example - GPU NVIDIA container ===
Start an interactive job on a gpu node:
razor-l1:pwolinsk:$ qsub -I -q gpu16core
qsub: waiting for job 3927490.sched to start
qsub: job 3927490.sched ready
Currently Loaded Modulefiles:
1) os/el6
compute0805:pwolinsk:$
Clone the tensorflow example models:
compute0805:pwolinsk:$ git clone https://github.com/tensorflow/models
Initialized empty Git repository in /home/pwolinsk/models/.git/
remote: Counting objects: 12884, done.
remote: Compressing objects: 100% (6/6), done.
remote: Total 12884 (delta 2), reused 3 (delta 2), pack-reused 12876
Receiving objects: 100% (12884/12884), 412.34 MiB | 27.24 MiB/s, done.
Resolving deltas: 100% (7276/7276), done.
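Load the singularity module and start a shell within the NVIDIA TensorFlow container. The --nv option makes the host's NVIDIA driver libraries and GPU devices available inside the container: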
compute0805:pwolinsk:$ module load singularity
compute0805:pwolinsk:$ singularity shell --nv /share/apps/singularity/images/nvidia-tensorflow\:18.01-py2-ahpcc.simg
Singularity: Invoking an interactive shell within container...
Singularity nvidia-tensorflow:18.01-py2-ahpcc.simg:~> python ~/models/tutorials/image/mnist/convolutional.py
Extracting data/train-images-idx3-ubyte.gz
Extracting data/train-labels-idx1-ubyte.gz
Extracting data/t10k-images-idx3-ubyte.gz
Extracting data/t10k-labels-idx1-ubyte.gz
2018-03-06 20:49:48.176645: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 0 with properties:
name: Tesla K40c major: 3 minor: 5 memoryClockRate(GHz): 0.745
pciBusID: 0000:81:00.0
totalMemory: 11.17GiB freeMemory: 11.09GiB
2018-03-06 20:49:48.421609: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 1 with properties:
name: Tesla K40c major: 3 minor: 5 memoryClockRate(GHz): 0.745
pciBusID: 0000:82:00.0
totalMemory: 11.17GiB freeMemory: 11.09GiB
2018-03-06 20:49:48.421905: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Device peer to peer matrix
2018-03-06 20:49:48.421966: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1051] DMA: 0 1
2018-03-06 20:49:48.421981: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1061] 0: Y Y
2018-03-06 20:49:48.421989: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1061] 1: Y Y
2018-03-06 20:49:48.422010: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1093] Ignoring visible gpu device (device: 0, name: Tesla K40c, pci bus id: 0000:81:00.0, compute capability: 3.5) with Cuda compute capability 3.5. The minimum required Cuda capability is 5.2.
2018-03-06 20:49:48.422033: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1093] Ignoring visible gpu device (device: 1, name: Tesla K40c, pci bus id: 0000:82:00.0, compute capability: 3.5) with Cuda compute capability 3.5. The minimum required Cuda capability is 5.2.
Initialized!
Step 0 (epoch 0.00), 40.5 ms
Minibatch loss: 8.334, learning rate: 0.010000
Minibatch error: 85.9%
Validation error: 84.6%
Step 100 (epoch 0.12), 48.3 ms
....
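While the TensorFlow job is running inside the Singularity container, ssh into the node and run nvidia-smi to check that the python process appears on the GPUs. Note that in this sample run TensorFlow reports both Tesla K40c devices as ignored, because their CUDA compute capability (3.5) is below the 5.2 minimum required by this build, so GPU-Util stays at 0%: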
compute0805:pwolinsk:$ nvidia-smi
Tue Mar 6 14:49:50 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.30 Driver Version: 390.30 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla K40c Off | 00000000:81:00.0 Off | 0 |
| 24% 48C P0 66W / 235W | 79MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla K40c Off | 00000000:82:00.0 Off | 0 |
| 25% 50C P0 62W / 235W | 79MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 12628 C python 68MiB |
| 1 12628 C python 68MiB |
+-----------------------------------------------------------------------------+
compute0805:pwolinsk:$
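The "Ignoring visible gpu device" messages in the TensorFlow output above are easy to miss among the other log lines. The following is a minimal sketch for checking a saved log for them. It assumes the training output was captured to a file, for example with python convolutional.py 2>&1 | tee train.log; the file name train.log and the helper name count_ignored_gpus are hypothetical.

```shell
#!/bin/sh
# Sketch: count GPUs that TensorFlow detected but refused to use.
# train.log is a hypothetical file holding the captured training output.

count_ignored_gpus() {
    # grep -c prints the number of matching lines. It exits non-zero
    # when nothing matches, so "|| true" keeps the printed 0 usable.
    grep -c 'Ignoring visible gpu device' "$1" || true
}

LOG=${1:-train.log}
if [ -f "$LOG" ]; then
    ignored=$(count_ignored_gpus "$LOG")
    if [ "$ignored" -gt 0 ]; then
        echo "$ignored GPU(s) detected but ignored by TensorFlow"
    else
        echo "no ignored GPUs reported in $LOG"
    fi
fi
```

If the count is greater than zero, check the compute capability of the GPUs on the node against the minimum stated in the log message before relying on GPU acceleration.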