singularity

singularity [2017/12/01 21:16] pwolinsk
singularity [2018/03/06 21:29] pwolinsk
</code>
  
=== TensorFlow Example - GPU NVIDIA Container ===

Start an interactive job on a GPU node in the gpu16core queue:
<code>
razor-l1:pwolinsk:$ qsub -I -q gpu16core
qsub: waiting for job 3927490.sched to start
qsub: job 3927490.sched ready

Currently Loaded Modulefiles:
  1) os/el6
compute0805:pwolinsk:$
</code>
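If an interactive session is not required, the same workflow could be submitted as a batch job instead. The sketch below is a hypothetical Python helper (pbs_job_script is not a site-provided tool) that prints a PBS job script. It assumes that the gpu16core queue also accepts batch jobs, that the module command works in batch shells as it does interactively, and that singularity exec with NVIDIA support (the non-interactive counterpart of the singularity shell command used below) is available after module load singularity. Site-specific resource requests such as node counts or walltime are deliberately left out.

<code python>
# Hypothetical helper: prints a PBS batch script equivalent to the
# interactive session on this page. Queue, image and script paths are
# copied from the page; everything else is an assumption.

IMAGE = "/share/apps/singularity/images/nvidia-tensorflow:18.01-py2-ahpcc.simg"
SCRIPT = "~/models/tutorials/image/mnist/convolutional.py"


def pbs_job_script(queue: str, image: str, script: str) -> str:
    """Return the text of a minimal PBS job script.

    Assumption: only the queue is requested; add any node, core or
    walltime directives your site requires.
    """
    lines = [
        "#!/bin/bash",
        f"#PBS -q {queue}",
        "module load singularity",
        # 'singularity exec' runs a single command inside the image;
        # --nv enables NVIDIA GPU support, as with 'singularity shell --nv'.
        f"singularity exec --nv {image} python {script}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Redirect this output to a file and submit it with qsub.
    print(pbs_job_script("gpu16core", IMAGE, SCRIPT), end="")
</code>

For example, save the helper as make_job.py (a hypothetical file name), run python make_job.py > tf_mnist.pbs and submit the result with qsub tf_mnist.pbs.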

Clone the TensorFlow example models repository into your home directory (the commands below expect it at ~/models):
<code>
compute0805:pwolinsk:$ git clone https://github.com/tensorflow/models
Initialized empty Git repository in /home/pwolinsk/models/.git/
remote: Counting objects: 12884, done.
remote: Compressing objects: 100% (6/6), done.
remote: Total 12884 (delta 2), reused 3 (delta 2), pack-reused 12876
Receiving objects: 100% (12884/12884), 412.34 MiB | 27.24 MiB/s, done.
Resolving deltas: 100% (7276/7276), done.
</code>
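
Load the singularity module, then start a shell within the NVIDIA TensorFlow Singularity image with NVIDIA GPU support enabled. Inside the container, run the MNIST convolutional network example from the cloned repository: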
<code>
compute0805:pwolinsk:$ module load singularity
compute0805:pwolinsk:$ singularity shell --nv /share/apps/singularity/images/nvidia-tensorflow\:18.01-py2-ahpcc.simg
Singularity: Invoking an interactive shell within container...

Singularity nvidia-tensorflow:18.01-py2-ahpcc.simg:~> python ~/models/tutorials/image/mnist/convolutional.py
Extracting data/train-images-idx3-ubyte.gz
Extracting data/train-labels-idx1-ubyte.gz
Extracting data/t10k-images-idx3-ubyte.gz
Extracting data/t10k-labels-idx1-ubyte.gz
2018-03-06 20:49:48.176645: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 0 with properties:
name: Tesla K40c major: 3 minor: 5 memoryClockRate(GHz): 0.745
pciBusID: 0000:81:00.0
totalMemory: 11.17GiB freeMemory: 11.09GiB
2018-03-06 20:49:48.421609: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 1 with properties:
name: Tesla K40c major: 3 minor: 5 memoryClockRate(GHz): 0.745
pciBusID: 0000:82:00.0
totalMemory: 11.17GiB freeMemory: 11.09GiB
2018-03-06 20:49:48.421905: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Device peer to peer matrix
2018-03-06 20:49:48.421966: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1051] DMA: 0 1
2018-03-06 20:49:48.421981: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1061] 0:   Y Y
2018-03-06 20:49:48.421989: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1061] 1:   Y Y
2018-03-06 20:49:48.422010: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1093] Ignoring visible gpu device (device: 0, name: Tesla K40c, pci bus id: 0000:81:00.0, compute capability: 3.5) with Cuda compute capability 3.5. The minimum required Cuda capability is 5.2.
2018-03-06 20:49:48.422033: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1093] Ignoring visible gpu device (device: 1, name: Tesla K40c, pci bus id: 0000:82:00.0, compute capability: 3.5) with Cuda compute capability 3.5. The minimum required Cuda capability is 5.2.
Initialized!
Step 0 (epoch 0.00), 40.5 ms
Minibatch loss: 8.334, learning rate: 0.010000
Minibatch error: 85.9%
Validation error: 84.6%
Step 100 (epoch 0.12), 48.3 ms
....
</code>
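The log above shows why the GPUs were skipped: each Tesla K40c reports compute capability 3.5 (major 3, minor 5), while the TensorFlow build in this container requires at least 5.2. The sketch below illustrates that comparison in Python. The Gpu type and usable_gpus function are hypothetical names for illustration, not part of TensorFlow, and the device list is copied by hand from the log.

<code python>
# Sketch of the compute-capability check reported in the TensorFlow log.
# Gpu and usable_gpus are hypothetical names, not TensorFlow APIs.
from typing import List, NamedTuple, Tuple


class Gpu(NamedTuple):
    index: int
    name: str
    capability: Tuple[int, int]  # (major, minor) as printed in the log


# Minimum reported by this container's TensorFlow build ("... is 5.2").
MIN_CAPABILITY = (5, 2)


def usable_gpus(gpus: List[Gpu],
                minimum: Tuple[int, int] = MIN_CAPABILITY) -> List[Gpu]:
    """Return the GPUs whose compute capability is at least `minimum`.

    Tuples compare element by element, so (3, 5) < (5, 2) <= (6, 0).
    """
    return [gpu for gpu in gpus if gpu.capability >= minimum]


if __name__ == "__main__":
    # The two devices found on compute0805 in the run above.
    node = [Gpu(0, "Tesla K40c", (3, 5)), Gpu(1, "Tesla K40c", (3, 5))]
    usable = usable_gpus(node)
    for gpu in node:
        status = "usable" if gpu in usable else "ignored"
        major, minor = gpu.capability
        print(f"GPU {gpu.index} ({gpu.name}, {major}.{minor}): {status}")
</code>

The capability is kept as a (major, minor) tuple rather than a string so that, for example, 10.0 still compares above 5.2.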
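
While the TensorFlow job is running inside the Singularity container, ssh into the same node (compute0805 here) from another terminal and run nvidia-smi to check whether the GPUs are in use. In this run, the log above shows TensorFlow ignoring both Tesla K40c cards, because their CUDA compute capability (3.5) is below the minimum of 5.2 required by this TensorFlow build. The nvidia-smi output below is consistent with that: the python process holds only 68MiB on each GPU, and GPU utilization stays at 0%.
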
<code>
compute0805:pwolinsk:$ nvidia-smi
Tue Mar  6 14:49:50 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.30                 Driver Version: 390.30                     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K40c          Off  | 00000000:81:00.0 Off |                    0 |
| 24%   48C    P0    66W / 235W |     79MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K40c          Off  | 00000000:82:00.0 Off |                    0 |
| 25%   50C    P0    62W / 235W |     79MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     12628      C   python                                        68MiB |
|    1     12628      C   python                                        68MiB |
+-----------------------------------------------------------------------------+
compute0805:pwolinsk:$
</code>
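For scripted checks, nvidia-smi also has a CSV query mode that is easier to parse than the table above. The sketch below assumes a driver whose nvidia-smi supports the query options shown (index, name, utilization.gpu and memory.used are standard query fields). parse_gpu_csv and idle_gpus are hypothetical helpers, and if nvidia-smi is not available the script falls back to sample values copied from the table above.

<code python>
# Sketch: per-GPU utilization and memory via nvidia-smi's CSV query mode.
# parse_gpu_csv and idle_gpus are hypothetical helpers.
import subprocess
from typing import Dict, List

QUERY = [
    "nvidia-smi",
    "--query-gpu=index,name,utilization.gpu,memory.used",
    "--format=csv,noheader,nounits",  # plain numbers: percent and MiB
]


def parse_gpu_csv(text: str) -> List[Dict[str, object]]:
    """Parse lines like '0, Tesla K40c, 0, 79' into dictionaries.

    Assumes every field except the name is numeric; some GPUs report
    '[Not Supported]' for utilization, which would raise ValueError here.
    """
    rows = []
    for line in text.strip().splitlines():
        index, name, util, mem = (field.strip() for field in line.split(","))
        rows.append({"index": int(index), "name": name,
                     "util_pct": int(util), "mem_mib": int(mem)})
    return rows


def idle_gpus(rows: List[Dict[str, object]], threshold_pct: int = 0) -> List[int]:
    """Indices of GPUs whose utilization is at or below threshold_pct."""
    return [row["index"] for row in rows if row["util_pct"] <= threshold_pct]


if __name__ == "__main__":
    try:
        out = subprocess.run(QUERY, capture_output=True, text=True,
                             check=True).stdout
    except (FileNotFoundError, subprocess.CalledProcessError):
        # No usable nvidia-smi here: use the values from the table above.
        out = "0, Tesla K40c, 0, 79\n1, Tesla K40c, 0, 79\n"
    rows = parse_gpu_csv(out)
    for row in rows:
        print(row)
    print("idle GPUs:", idle_gpus(rows))
</code>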
singularity.txt · Last modified: 2018/03/06 21:29 by pwolinsk