singularity

singularity [2017/12/01 20:51] by pwolinsk
singularity [2018/03/06 21:29] (current) by pwolinsk
</code>
  
Open a shell inside a container pulled from the Docker repository, and bind the /scratch directory on Razor to the /mnt directory inside the container:
  
<code>
compute1144:pwolinsk:$ ls /scratch |wc -l
5535
compute1144:pwolinsk:$ singularity shell --bind /scratch:/mnt docker://ubuntu
Docker image path: index.docker.io/library/ubuntu:latest
Cache folder set to /gpfs_home/pwolinsk/.singularity/docker
...
Ubuntu 16.04.3 LTS \n \l

Singularity ubuntu:~> ls /mnt |wc -l
5535
Singularity ubuntu:~>

</code>
  
By default, the container binds the user's $HOME directory, /tmp, and the current working directory from the host to the equivalent directories inside the container. You can specify additional directories to bind using the --bind <local_dir>:<container_dir> syntax.
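The --bind option also accepts several host:container pairs separated by commas. As a minimal sketch, the shell function below joins such pairs into one --bind value after checking that each host directory exists. bind_spec is a hypothetical helper, not part of Singularity, and the paths in the example comment are illustrative.

<code bash>
# bind_spec: hypothetical helper (not part of Singularity) that checks each
# host directory and joins host:container pairs into the comma-separated
# value accepted by "singularity --bind".
bind_spec() {
    local pair host spec=""
    for pair in "$@"; do
        host=${pair%%:*}                # text before the first colon
        if [ ! -d "$host" ]; then
            echo "bind_spec: host directory '$host' does not exist" >&2
            return 1
        fi
        spec=${spec:+$spec,}$pair       # comma only between entries
    done
    printf '%s\n' "$spec"
}

# Example (illustrative paths):
# singularity shell --bind "$(bind_spec /scratch:/mnt /share/apps:/apps)" docker://ubuntu
</code>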

=== TensorFlow Example ===
Download the example models from the TensorFlow models Git repository:
<code>
compute1144:pwolinsk:$ git clone https://github.com/tensorflow/models
Initialized empty Git repository in /gpfs_home/pwolinsk/models/.git/
remote: Counting objects: 9158, done.
remote: Compressing objects: 100% (2/2), done.
remote: Total 9158 (delta 0), reused 0 (delta 0), pack-reused 9156
Receiving objects: 100% (9158/9158), 293.18 MiB | 32.15 MiB/s, done.
Resolving deltas: 100% (5162/5162), done.
compute1144:pwolinsk:$ cd models/tutorials/image/mnist/
</code>

From within the directory containing the Python training script, start a shell in the prebuilt Singularity container:
<code>
compute1144:pwolinsk:/models/tutorials/image/mnist$ singularity shell /share/apps/singularity/images/ubuntu-tensorflow-1.4.simg
Singularity: Invoking an interactive shell within container...

Singularity ubuntu-tensorflow-1.4.simg:~/models/tutorials/image/mnist> python convolutional.py
Successfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.
Successfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.
Successfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.
Successfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.
Extracting data/train-images-idx3-ubyte.gz
Extracting data/train-labels-idx1-ubyte.gz
Extracting data/t10k-images-idx3-ubyte.gz
Extracting data/t10k-labels-idx1-ubyte.gz
2017-12-01 15:05:15.992688: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2
Initialized!
Step 0 (epoch 0.00), 2.4 ms
Minibatch loss: 8.334, learning rate: 0.010000
Minibatch error: 85.9%
Validation error: 84.6%
...
</code>

Alternatively, instead of starting a shell inside the container, use the **exec** command to run a single command inside the container:
<code>
compute1144:pwolinsk:/models/tutorials/image/mnist$ singularity exec /share/apps/singularity/images/ubuntu-tensorflow-1.4.simg python convolutional.py
Extracting data/train-images-idx3-ubyte.gz
Extracting data/train-labels-idx1-ubyte.gz
Extracting data/t10k-images-idx3-ubyte.gz
Extracting data/t10k-labels-idx1-ubyte.gz
2017-12-01 15:07:56.905035: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2
Initialized!
Step 0 (epoch 0.00), 2.2 ms
Minibatch loss: 8.334, learning rate: 0.010000
Minibatch error: 85.9%
Validation error: 84.6%
...
</code>
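Because exec runs a single command and then exits, it is also the form to use in scripts. The function below is a minimal sketch: run_in_container and the SINGULARITY_BIN override are hypothetical names, not Singularity features. It stops with a clear message when the image file is missing and otherwise passes the remaining arguments to singularity exec unchanged.

<code bash>
# run_in_container: hypothetical wrapper around "singularity exec".
# Usage: run_in_container IMAGE COMMAND [ARGS...]
run_in_container() {
    local image=$1
    shift
    if [ ! -f "$image" ]; then
        echo "run_in_container: image '$image' not found" >&2
        return 1
    fi
    # SINGULARITY_BIN (hypothetical) lets the binary be overridden, e.g. for
    # testing; it defaults to the singularity command found on PATH.
    "${SINGULARITY_BIN:-singularity}" exec "$image" "$@"
}

# Example, equivalent to the exec command above:
# run_in_container /share/apps/singularity/images/ubuntu-tensorflow-1.4.simg python convolutional.py
</code>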

=== TensorFlow Example - NVIDIA GPU Container ===

Start an interactive job on a GPU node:
<code>
razor-l1:pwolinsk:$ qsub -I -q gpu16core
qsub: waiting for job 3927490.sched to start
qsub: job 3927490.sched ready

Currently Loaded Modulefiles:
  1) os/el6
compute0805:pwolinsk:$
</code>
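The interactive session above is convenient for testing; longer training runs can be submitted as a batch job instead. The function below is a minimal sketch that prints a PBS-style job script, assuming the scheduler accepts standard #PBS directives as the qsub usage above suggests. gpu_job_script is a hypothetical helper, the gpu16core queue comes from the example above, and the resource request (nodes=1:ppn=16, one hour of walltime) is an assumption to adjust to the queue's actual limits. The output can be piped to qsub, which reads the job script from standard input when no script file is given.

<code bash>
# gpu_job_script: hypothetical helper that prints a PBS job script running a
# Python script inside a Singularity image with NVIDIA support (--nv).
# The resource lines are assumptions; adjust them to the queue's limits.
gpu_job_script() {
    local image=$1 script=$2
    # Unquoted EOF: $image and $script expand now, \$PBS_O_WORKDIR at run time.
    cat <<EOF
#!/bin/bash
#PBS -q gpu16core
#PBS -l nodes=1:ppn=16
#PBS -l walltime=01:00:00
cd "\$PBS_O_WORKDIR"
module load singularity
singularity exec --nv "$image" python "$script"
EOF
}

# Example: generate the job script and submit it on the login node.
# gpu_job_script /share/apps/singularity/images/nvidia-tensorflow:18.01-py2-ahpcc.simg \
#     "$HOME/models/tutorials/image/mnist/convolutional.py" | qsub
</code>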

Clone the TensorFlow example models:
<code>
compute0805:pwolinsk:$ git clone https://github.com/tensorflow/models
Initialized empty Git repository in /home/pwolinsk/models/.git/
remote: Counting objects: 12884, done.
remote: Compressing objects: 100% (6/6), done.
remote: Total 12884 (delta 2), reused 3 (delta 2), pack-reused 12876
Receiving objects: 100% (12884/12884), 412.34 MiB | 27.24 MiB/s, done.
Resolving deltas: 100% (7276/7276), done.
</code>
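
Load the singularity module and start a shell within the NVIDIA TensorFlow container. The --nv flag makes the host's NVIDIA driver and GPU devices available inside the container: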
<code>
compute0805:pwolinsk:$ module load singularity
compute0805:pwolinsk:$ singularity shell --nv /share/apps/singularity/images/nvidia-tensorflow\:18.01-py2-ahpcc.simg
Singularity: Invoking an interactive shell within container...

Singularity nvidia-tensorflow:18.01-py2-ahpcc.simg:~> python ~/models/tutorials/image/mnist/convolutional.py
Extracting data/train-images-idx3-ubyte.gz
Extracting data/train-labels-idx1-ubyte.gz
Extracting data/t10k-images-idx3-ubyte.gz
Extracting data/t10k-labels-idx1-ubyte.gz
2018-03-06 20:49:48.176645: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 0 with properties:
name: Tesla K40c major: 3 minor: 5 memoryClockRate(GHz): 0.745
pciBusID: 0000:81:00.0
totalMemory: 11.17GiB freeMemory: 11.09GiB
2018-03-06 20:49:48.421609: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 1 with properties:
name: Tesla K40c major: 3 minor: 5 memoryClockRate(GHz): 0.745
pciBusID: 0000:82:00.0
totalMemory: 11.17GiB freeMemory: 11.09GiB
2018-03-06 20:49:48.421905: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Device peer to peer matrix
2018-03-06 20:49:48.421966: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1051] DMA: 0 1
2018-03-06 20:49:48.421981: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1061] 0:   Y Y
2018-03-06 20:49:48.421989: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1061] 1:   Y Y
2018-03-06 20:49:48.422010: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1093] Ignoring visible gpu device (device: 0, name: Tesla K40c, pci bus id: 0000:81:00.0, compute capability: 3.5) with Cuda compute capability 3.5. The minimum required Cuda capability is 5.2.
2018-03-06 20:49:48.422033: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1093] Ignoring visible gpu device (device: 1, name: Tesla K40c, pci bus id: 0000:82:00.0, compute capability: 3.5) with Cuda compute capability 3.5. The minimum required Cuda capability is 5.2.
Initialized!
Step 0 (epoch 0.00), 40.5 ms
Minibatch loss: 8.334, learning rate: 0.010000
Minibatch error: 85.9%
Validation error: 84.6%
Step 100 (epoch 0.12), 48.3 ms
....
</code>
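To check a saved training log for GPUs that TensorFlow skipped, such as the two "Ignoring visible gpu device" messages above, a small filter can be used. This is a minimal sketch: ignored_gpus is a hypothetical helper, and the sed pattern assumes the message format shown in the log above, which may differ between TensorFlow versions. TensorFlow writes these messages to standard error, so redirect it with 2>&1 when saving the log.

<code bash>
# ignored_gpus: hypothetical filter that reads a TensorFlow log on stdin and
# prints "device <n>: compute capability <x.y>" for each GPU TensorFlow
# reported ignoring. Assumes the message format shown in the log above.
ignored_gpus() {
    sed -n 's/.*Ignoring visible gpu device (device: \([0-9]*\),.*compute capability: \([0-9.]*\)).*/device \1: compute capability \2/p'
}

# Example (train.log is an illustrative file name):
# python convolutional.py 2>&1 | tee train.log
# ignored_gpus < train.log
</code>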
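
While the TensorFlow job is running inside the Singularity container, ssh into the node from another terminal and run nvidia-smi to check whether the GPUs are in use. Note that in the log above TensorFlow ignored both Tesla K40c devices, because their CUDA compute capability (3.5) is below the minimum (5.2) required by this TensorFlow build, so training ran on the CPU. Accordingly, nvidia-smi shows the python process holding only 68MiB on each GPU at 0% utilization:
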
<code>
compute0805:pwolinsk:$ nvidia-smi
Tue Mar  6 14:49:50 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.30                 Driver Version: 390.30                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K40c          Off  | 00000000:81:00.0 Off |                    0 |
| 24%   48C    P0    66W / 235W |     79MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K40c          Off  | 00000000:82:00.0 Off |                    0 |
| 25%   50C    P0    62W / 235W |     79MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     12628      C   python                                        68MiB |
|    1     12628      C   python                                        68MiB |
+-----------------------------------------------------------------------------+
compute0805:pwolinsk:$

</code>
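For scripted checks, nvidia-smi can print selected fields as CSV instead of the table above. The filter below is a minimal sketch that assumes the --query-gpu=index,utilization.gpu and --format=csv,noheader,nounits options of nvidia-smi; busy_gpus is a hypothetical helper name. It looks at utilization rather than memory, since a process can hold GPU memory while the GPU itself stays idle.

<code bash>
# busy_gpus: hypothetical filter that reads the CSV printed by
#   nvidia-smi --query-gpu=index,utilization.gpu --format=csv,noheader,nounits
# on stdin and prints the index of every GPU whose utilization (percent)
# is above a threshold (first argument, default 0).
busy_gpus() {
    local threshold=${1:-0}
    awk -F', *' -v t="$threshold" '$2 + 0 > t { print $1 }'
}

# Example, run on the compute node while the job is active:
# nvidia-smi --query-gpu=index,utilization.gpu --format=csv,noheader,nounits | busy_gpus
</code>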
singularity.1512161497.txt.gz · Last modified: 2017/12/01 20:51 by pwolinsk