singularity

Differences

This shows you the differences between two versions of the page.

singularity [2017/12/01 20:39]
pwolinsk created
singularity [2018/03/06 21:29] (current)
pwolinsk
Line 4: Line 4:
 Many pre-built container images are available for download from the Singularity Hub [[https://singularity-hub.org/]] and Docker Hub [[https://hub.docker.com/]] repositories.
  
-=== Hello World Example ===
+=== Local Image Files ===
 Start an interactive job and load the singularity module
 <code>
Line 29: Line 29:
 </code>
  
 +=== Remote Repository Image Files ===
 Open a shell inside a container stored on Singularity Hub, shub://vsoch/hello-world
  
Line 43: Line 44:
 </code>
  
 +Open a shell inside a container pulled from the Docker repository, and bind the /scratch directory on Razor to the /mnt directory inside the container:
 +
 +<​code>​
 +compute1144:pwolinsk:$ ls /scratch | wc -l
 +5535
 +compute1144:pwolinsk:$ singularity shell --bind /scratch:/mnt docker://ubuntu
 +Docker image path: index.docker.io/​library/​ubuntu:​latest
 +Cache folder set to /​gpfs_home/​pwolinsk/​.singularity/​docker
 +[5/5] |===================================| 100.0% ​
 +Creating container runtime...
 +Singularity:​ Invoking an interactive shell within container...
 +
 +Singularity ubuntu:~> cat /etc/issue
 +Ubuntu 16.04.3 LTS \n \l
 +
 +Singularity ubuntu:~> ls /mnt | wc -l
 +5535
 +Singularity ubuntu:~>
 +
 +</​code>​
 +
 +By default, the container binds the user's $HOME directory, /tmp, and the current working directory from the host to the equivalent directories inside the container. You can bind additional directories with the --bind <local_dir>:<container_dir> syntax.
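
When more than one extra directory is needed, Singularity accepts a comma-separated list of bind specifications in a single --bind option. The following is a minimal sketch, not a site-provided tool: the helper name bind_option and the /storage:/data pair are hypothetical, and the command is printed rather than executed so it can be checked first.

```shell
#!/bin/sh
# Join several <local_dir>:<container_dir> pairs into one --bind option.
# bind_option is a hypothetical helper name used only for this sketch.
bind_option() {
    # With no arguments there is nothing to bind, so print nothing.
    [ "$#" -eq 0 ] && return 0
    specs=""
    for pair in "$@"; do
        if [ -z "$specs" ]; then
            specs="$pair"
        else
            specs="$specs,$pair"
        fi
    done
    printf '%s\n' "--bind $specs"
}

# Print the command instead of running it.
# /scratch:/mnt comes from the example above; /storage:/data is hypothetical.
echo "singularity shell $(bind_option /scratch:/mnt /storage:/data) docker://ubuntu"
```

Once the printed command looks right, it can be pasted into the interactive job as-is.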
 +
 +=== Tensorflow Example ===
 +Download the example models from the git repository:
 +<​code>​
 +compute1144:pwolinsk:$ git clone https://github.com/tensorflow/models
 +Initialized empty Git repository in /​gpfs_home/​pwolinsk/​models/​.git/​
 +remote: Counting objects: 9158, done.
 +remote: Compressing objects: 100% (2/2), done.
 +remote: Total 9158 (delta 0), reused 0 (delta 0), pack-reused 9156
 +Receiving objects: 100% (9158/​9158),​ 293.18 MiB | 32.15 MiB/s, done.
 +Resolving deltas: 100% (5162/​5162),​ done.
 +compute1144:pwolinsk:$ cd models/tutorials/image/mnist/
 +</​code>​
 +
 +Start a shell in the prebuilt singularity container from within the directory containing the python training script
 +<​code>​
 +compute1144:pwolinsk:/models/tutorials/image/mnist$ singularity shell /share/apps/singularity/images/ubuntu-tensorflow-1.4.simg
 +Singularity:​ Invoking an interactive shell within container...
 +
 +Singularity ubuntu-tensorflow-1.4.simg:~/models/tutorials/image/mnist> python convolutional.py
 +Successfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.
 +Successfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.
 +Successfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.
 +Successfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.
 +Extracting data/​train-images-idx3-ubyte.gz
 +Extracting data/​train-labels-idx1-ubyte.gz
 +Extracting data/​t10k-images-idx3-ubyte.gz
 +Extracting data/​t10k-labels-idx1-ubyte.gz
 +2017-12-01 15:​05:​15.992688:​ I tensorflow/​core/​platform/​cpu_feature_guard.cc:​137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2
 +Initialized!
 +Step 0 (epoch 0.00), 2.4 ms
 +Minibatch loss: 8.334, learning rate: 0.010000
 +Minibatch error: 85.9%
 +Validation error: 84.6%
 +...
 +</​code>​
 +
 +Or, instead of starting a shell inside the container, use the **exec** command to run the command inside the container:
 +<​code>​
 +compute1144:pwolinsk:/models/tutorials/image/mnist$ singularity exec /share/apps/singularity/images/ubuntu-tensorflow-1.4.simg python convolutional.py
 +Extracting data/​train-images-idx3-ubyte.gz
 +Extracting data/​train-labels-idx1-ubyte.gz
 +Extracting data/​t10k-images-idx3-ubyte.gz
 +Extracting data/​t10k-labels-idx1-ubyte.gz
 +2017-12-01 15:​07:​56.905035:​ I tensorflow/​core/​platform/​cpu_feature_guard.cc:​137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2
 +Initialized!
 +Step 0 (epoch 0.00), 2.2 ms
 +Minibatch loss: 8.334, learning rate: 0.010000
 +Minibatch error: 85.9%
 +Validation error: 84.6%
 +...
 +</​code>​
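
Because exec needs no interactive shell, the same call can be wrapped for use in a script. The sketch below is one possible wrapper, under stated assumptions: the function name run_in_container and the DRY_RUN switch are hypothetical, and the image path and script name are the ones from the example above.

```shell
#!/bin/sh
# Run a command inside an image with "singularity exec".
# When DRY_RUN=1 (a hypothetical switch for this sketch), only print the
# command line so it can be inspected without starting a container.
run_in_container() {
    image="$1"
    shift
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "singularity exec $image $*"
        return 0
    fi
    # Fail early with a clear message if the image file cannot be read.
    if [ ! -r "$image" ]; then
        echo "image not readable: $image" >&2
        return 1
    fi
    singularity exec "$image" "$@"
}

# Dry run with the image and training script used in the example above.
DRY_RUN=1 run_in_container /share/apps/singularity/images/ubuntu-tensorflow-1.4.simg python convolutional.py
```

Setting DRY_RUN=0 (or leaving it unset) runs the command for real, provided the singularity module is loaded.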
 +
 +=== Tensorflow Example - GPU NVIDIA container ===
 +
 +Start an interactive job on a GPU node:
 +<​code>​
 +razor-l1:pwolinsk:$ qsub -I -q gpu16core
 +qsub: waiting for job 3927490.sched to start
 +qsub: job 3927490.sched ready
 +
 +Currently Loaded Modulefiles:​
 +  1) os/el6
 +compute0805:​pwolinsk:​$ ​
 +</​code>​
 +
 +Clone the tensorflow example models:
 +<​code>​
 +compute0805:pwolinsk:$ git clone https://github.com/tensorflow/models
 +Initialized empty Git repository in /​home/​pwolinsk/​models/​.git/​
 +remote: Counting objects: 12884, done.
 +remote: Compressing objects: 100% (6/6), done.
 +remote: Total 12884 (delta 2), reused 3 (delta 2), pack-reused 12876
 +Receiving objects: 100% (12884/​12884),​ 412.34 MiB | 27.24 MiB/s, done.
 +Resolving deltas: 100% (7276/​7276),​ done.
 +</​code>​
 +
 +Load the singularity module and start a shell within the container. The --nv option enables NVIDIA GPU support inside the container.
 +<​code>​
 +compute0805:pwolinsk:$ module load singularity
 +compute0805:pwolinsk:$ singularity shell --nv /share/apps/singularity/images/nvidia-tensorflow\:18.01-py2-ahpcc.simg
 +Singularity:​ Invoking an interactive shell within container...
 +
 +Singularity nvidia-tensorflow:18.01-py2-ahpcc.simg:~> python ~/models/tutorials/image/mnist/convolutional.py
 +Extracting data/​train-images-idx3-ubyte.gz
 +Extracting data/​train-labels-idx1-ubyte.gz
 +Extracting data/​t10k-images-idx3-ubyte.gz
 +Extracting data/​t10k-labels-idx1-ubyte.gz
 +2018-03-06 20:​49:​48.176645:​ I tensorflow/​core/​common_runtime/​gpu/​gpu_device.cc:​1030] Found device 0 with properties: ​
 +name: Tesla K40c major: 3 minor: 5 memoryClockRate(GHz):​ 0.745
 +pciBusID: 0000:​81:​00.0
 +totalMemory:​ 11.17GiB freeMemory: 11.09GiB
 +2018-03-06 20:​49:​48.421609:​ I tensorflow/​core/​common_runtime/​gpu/​gpu_device.cc:​1030] Found device 1 with properties: ​
 +name: Tesla K40c major: 3 minor: 5 memoryClockRate(GHz):​ 0.745
 +pciBusID: 0000:​82:​00.0
 +totalMemory:​ 11.17GiB freeMemory: 11.09GiB
 +2018-03-06 20:​49:​48.421905:​ I tensorflow/​core/​common_runtime/​gpu/​gpu_device.cc:​1045] Device peer to peer matrix
 +2018-03-06 20:​49:​48.421966:​ I tensorflow/​core/​common_runtime/​gpu/​gpu_device.cc:​1051] DMA: 0 1 
 +2018-03-06 20:​49:​48.421981:​ I tensorflow/​core/​common_runtime/​gpu/​gpu_device.cc:​1061] 0:   Y Y 
 +2018-03-06 20:​49:​48.421989:​ I tensorflow/​core/​common_runtime/​gpu/​gpu_device.cc:​1061] 1:   Y Y 
 +2018-03-06 20:​49:​48.422010:​ I tensorflow/​core/​common_runtime/​gpu/​gpu_device.cc:​1093] Ignoring visible gpu device (device: 0, name: Tesla K40c, pci bus id: 0000:​81:​00.0,​ compute capability: 3.5) with Cuda compute capability 3.5. The minimum required Cuda capability is 5.2.
 +2018-03-06 20:​49:​48.422033:​ I tensorflow/​core/​common_runtime/​gpu/​gpu_device.cc:​1093] Ignoring visible gpu device (device: 1, name: Tesla K40c, pci bus id: 0000:​82:​00.0,​ compute capability: 3.5) with Cuda compute capability 3.5. The minimum required Cuda capability is 5.2.
 +Initialized!
 +Step 0 (epoch 0.00), 40.5 ms
 +Minibatch loss: 8.334, learning rate: 0.010000
 +Minibatch error: 85.9%
 +Validation error: 84.6%
 +Step 100 (epoch 0.12), 48.3 ms
 +....
 +</​code>​
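
The log above includes two "Ignoring visible gpu device" messages, which suggests that in this run TensorFlow may have skipped both K40c cards because of their compute capability. A small sketch for checking a saved log for that message follows; the function name count_ignored_gpus is hypothetical, and the sample lines fed to it are copied from the output above.

```shell
#!/bin/sh
# Count TensorFlow log lines (read from stdin) that report a skipped GPU.
# count_ignored_gpus is a hypothetical helper name used only in this sketch.
count_ignored_gpus() {
    # grep -c prints 0 and exits non-zero when nothing matches;
    # "|| true" keeps the function's exit status at 0 in that case.
    grep -c 'Ignoring visible gpu device' || true
}

# Sample input taken from the log above; prints 2.
count_ignored_gpus <<'EOF'
2018-03-06 20:49:48.422010: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1093] Ignoring visible gpu device (device: 0, name: Tesla K40c, pci bus id: 0000:81:00.0, compute capability: 3.5) with Cuda compute capability 3.5. The minimum required Cuda capability is 5.2.
2018-03-06 20:49:48.422033: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1093] Ignoring visible gpu device (device: 1, name: Tesla K40c, pci bus id: 0000:82:00.0, compute capability: 3.5) with Cuda compute capability 3.5. The minimum required Cuda capability is 5.2.
Initialized!
EOF
```

A count greater than zero is a hint to look more closely at whether the job is actually running on the GPUs.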
 +
 +While the TensorFlow job is running inside the Singularity container, ssh into the node and verify that the GPUs are in use:
 +
 +<​code>​
 +compute0805:pwolinsk:$ nvidia-smi
 +Tue Mar  6 14:49:50 2018       
 ++-----------------------------------------------------------------------------+
 +| NVIDIA-SMI 390.30 ​                ​Driver Version: 390.30 ​                   |
 +|-------------------------------+----------------------+----------------------+
 +| GPU  Name        Persistence-M| Bus-Id ​       Disp.A | Volatile Uncorr. ECC |
 +| Fan  Temp  Perf  Pwr:​Usage/​Cap| ​        ​Memory-Usage | GPU-Util ​ Compute M. |
 +|===============================+======================+======================|
 +|   ​0 ​ Tesla K40c          Off  | 00000000:​81:​00.0 Off |                    0 |
 +| 24%   ​48C ​   P0    66W / 235W |     79MiB / 11441MiB |      0%      Default |
 ++-------------------------------+----------------------+----------------------+
 +|   ​1 ​ Tesla K40c          Off  | 00000000:​82:​00.0 Off |                    0 |
 +| 25%   ​50C ​   P0    62W / 235W |     79MiB / 11441MiB |      0%      Default |
 ++-------------------------------+----------------------+----------------------+
 +                                                                               
 ++-----------------------------------------------------------------------------+
 +| Processes: ​                                                      GPU Memory |
 +|  GPU       ​PID ​  ​Type ​  ​Process name                             ​Usage ​     |
 +|=============================================================================|
 +|    0     ​12628 ​     C   ​python ​                                       68MiB |
 +|    1     ​12628 ​     C   ​python ​                                       68MiB |
 ++-----------------------------------------------------------------------------+
 +compute0805:​pwolinsk:​$ ​
 +
 +</​code>​
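
For a scripted check, nvidia-smi can also emit CSV, for example with nvidia-smi --query-gpu=index,memory.used --format=csv,noheader,nounits. The sketch below filters such rows; the function name gpus_in_use and the optional threshold argument are hypothetical, and the sample rows reuse the 79MiB figures from the table above.

```shell
#!/bin/sh
# Read "index, memory.used" CSV rows (MiB, no header, no units) on stdin and
# print the index of every GPU using more than the given number of MiB.
# gpus_in_use and its threshold argument (default 0) are hypothetical.
gpus_in_use() {
    awk -F', *' -v min="${1:-0}" '$2 + 0 > min + 0 { print $1 }'
}

# Sample rows matching the memory usage shown above; prints 0 and 1.
printf '0, 79\n1, 79\n' | gpus_in_use
```

On a GPU node the sample input would be replaced by the nvidia-smi query itself, piped into gpus_in_use. Memory in use does not by itself show that computation is happening, so the GPU-Util column is worth checking as well.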
singularity.1512160777.txt.gz · Last modified: 2017/12/01 20:39 by pwolinsk