This shows you the differences between two versions of the page.
singularity [2017/12/01 21:16] pwolinsk |
singularity [2018/03/06 21:27] pwolinsk |
||
---|---|---|---|
Line 119: | Line 119: | ||
</code> | </code> | ||
+ | === TensorFlow Example - NVIDIA GPU container === | ||
+ | Start an interactive job on a GPU node: | ||
+ | <code> | ||
+ | razor-l1:pwolinsk:$ qsub -I -q gpu16core | ||
+ | qsub: waiting for job 3927490.sched to start | ||
+ | qsub: job 3927490.sched ready | ||
+ | |||
+ | Currently Loaded Modulefiles: | ||
+ | 1) os/el6 | ||
+ | compute0805:pwolinsk:$ | ||
+ | </code> | ||
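The same queue can also be requested non-interactively. The sketch below only writes a hypothetical batch script to disk and does not submit it. The file name, job name, and walltime are assumptions rather than site policy; the `gpu16core` queue and the `nvidia-smi` command come from the session shown on this page.

```shell
#!/bin/bash
# Write a hypothetical PBS batch script; nothing is submitted here.
job_script="tf_gpu_check.pbs"   # assumed file name

cat > "$job_script" <<'EOF'
#!/bin/bash
#PBS -N tf_gpu_check
#PBS -q gpu16core
#PBS -l walltime=00:10:00
#PBS -j oe

# Run from the directory the job was submitted from.
cd "$PBS_O_WORKDIR"

# List the GPUs visible on the allocated node.
nvidia-smi
EOF

echo "wrote $job_script"
# Submit manually with: qsub tf_gpu_check.pbs
```

Writing the script first and submitting it by hand keeps the resource requests easy to review before the job enters the queue.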
+ | |||
+ | Clone the TensorFlow example models repository: | ||
+ | <code> | ||
+ | compute0805:pwolinsk:$ git clone https://github.com/tensorflow/models | ||
+ | Initialized empty Git repository in /home/pwolinsk/models/.git/ | ||
+ | remote: Counting objects: 12884, done. | ||
+ | remote: Compressing objects: 100% (6/6), done. | ||
+ | remote: Total 12884 (delta 2), reused 3 (delta 2), pack-reused 12876 | ||
+ | Receiving objects: 100% (12884/12884), 412.34 MiB | 27.24 MiB/s, done. | ||
+ | Resolving deltas: 100% (7276/7276), done. | ||
+ | </code> | ||
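Repository layouts can change between commits, so it may be worth confirming that the MNIST tutorial script is present in your checkout before launching it. The helper below is a minimal sketch; the function name `find_mnist_script` is hypothetical, and the default location assumes the clone landed in `$HOME/models` as in the output above.

```shell
#!/bin/bash
# Hypothetical helper: print the MNIST tutorial script path if it exists
# in the given checkout, otherwise report it on stderr and return 1.
find_mnist_script() {
    local repo_dir="${1:-$HOME/models}"
    local script="$repo_dir/tutorials/image/mnist/convolutional.py"
    if [ -f "$script" ]; then
        printf '%s\n' "$script"
    else
        printf 'not found: %s\n' "$script" >&2
        return 1
    fi
}

# Example: check the default clone location.
find_mnist_script "$HOME/models" \
    || echo "tutorial script missing; check the branch or commit of your clone"
```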
+ | |||
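+ | Load the singularity module and start a shell inside the NVIDIA TensorFlow container image. The --nv option makes the host's NVIDIA driver and GPUs available inside the container. Note that in the output below TensorFlow detects both Tesla K40c GPUs but then ignores them, because their compute capability (3.5) is below the 5.2 minimum that this build requires. | ||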
+ | <code> | ||
+ | compute0805:pwolinsk:$ module load singularity | ||
+ | compute0805:pwolinsk:$ singularity shell --nv /share/apps/singularity/images/nvidia-tensorflow\:18.01-py2-ahpcc.simg | ||
+ | Singularity: Invoking an interactive shell within container... | ||
+ | |||
+ | Singularity nvidia-tensorflow:18.01-py2-ahpcc.simg:~> python ~/models/tutorials/image/mnist/convolutional.py | ||
+ | Extracting data/train-images-idx3-ubyte.gz | ||
+ | Extracting data/train-labels-idx1-ubyte.gz | ||
+ | Extracting data/t10k-images-idx3-ubyte.gz | ||
+ | Extracting data/t10k-labels-idx1-ubyte.gz | ||
+ | 2018-03-06 20:49:48.176645: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 0 with properties: | ||
+ | name: Tesla K40c major: 3 minor: 5 memoryClockRate(GHz): 0.745 | ||
+ | pciBusID: 0000:81:00.0 | ||
+ | totalMemory: 11.17GiB freeMemory: 11.09GiB | ||
+ | 2018-03-06 20:49:48.421609: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 1 with properties: | ||
+ | name: Tesla K40c major: 3 minor: 5 memoryClockRate(GHz): 0.745 | ||
+ | pciBusID: 0000:82:00.0 | ||
+ | totalMemory: 11.17GiB freeMemory: 11.09GiB | ||
+ | 2018-03-06 20:49:48.421905: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Device peer to peer matrix | ||
+ | 2018-03-06 20:49:48.421966: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1051] DMA: 0 1 | ||
+ | 2018-03-06 20:49:48.421981: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1061] 0: Y Y | ||
+ | 2018-03-06 20:49:48.421989: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1061] 1: Y Y | ||
+ | 2018-03-06 20:49:48.422010: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1093] Ignoring visible gpu device (device: 0, name: Tesla K40c, pci bus id: 0000:81:00.0, compute capability: 3.5) with Cuda compute capability 3.5. The minimum required Cuda capability is 5.2. | ||
+ | 2018-03-06 20:49:48.422033: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1093] Ignoring visible gpu device (device: 1, name: Tesla K40c, pci bus id: 0000:82:00.0, compute capability: 3.5) with Cuda compute capability 3.5. The minimum required Cuda capability is 5.2. | ||
+ | Initialized! | ||
+ | Step 0 (epoch 0.00), 40.5 ms | ||
+ | Minibatch loss: 8.334, learning rate: 0.010000 | ||
+ | Minibatch error: 85.9% | ||
+ | Validation error: 84.6% | ||
+ | Step 100 (epoch 0.12), 48.3 ms | ||
+ | .... | ||
+ | |||
+ | compute0805:pwolinsk:$ nvidia-smi | ||
+ | Tue Mar 6 14:49:50 2018 | ||
+ | +-----------------------------------------------------------------------------+ | ||
+ | | NVIDIA-SMI 390.30 Driver Version: 390.30 | | ||
+ | |-------------------------------+----------------------+----------------------+ | ||
+ | | GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC | | ||
+ | | Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. | | ||
+ | |===============================+======================+======================| | ||
+ | | 0 Tesla K40c Off | 00000000:81:00.0 Off | 0 | | ||
+ | | 24% 48C P0 66W / 235W | 79MiB / 11441MiB | 0% Default | | ||
+ | +-------------------------------+----------------------+----------------------+ | ||
+ | | 1 Tesla K40c Off | 00000000:82:00.0 Off | 0 | | ||
+ | | 25% 50C P0 62W / 235W | 79MiB / 11441MiB | 0% Default | | ||
+ | +-------------------------------+----------------------+----------------------+ | ||
+ | |||
+ | +-----------------------------------------------------------------------------+ | ||
+ | | Processes: GPU Memory | | ||
+ | | GPU PID Type Process name Usage | | ||
+ | |=============================================================================| | ||
+ | | 0 12628 C python 68MiB | | ||
+ | | 1 12628 C python 68MiB | | ||
+ | +-----------------------------------------------------------------------------+ | ||
+ | compute0805:pwolinsk:$ | ||
+ | |||
+ | </code> |
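The log above contains "Ignoring visible gpu device" messages, which are easy to miss in a long run. A small check like the one below can flag them in a saved log. This is a sketch under stated assumptions: the function name `count_ignored_gpus` and the log file name `tf_run.log` are hypothetical, and the matched text is copied from the output above.

```shell
#!/bin/bash
# Hypothetical helper: count "Ignoring visible gpu device" lines in a saved log.
# grep -c exits non-zero when the count is 0, so "|| true" keeps the status clean.
count_ignored_gpus() {
    grep -c 'Ignoring visible gpu device' "$1" || true
}

# Example usage with a log captured from a run (file name is an assumption), e.g.
#   python ~/models/tutorials/image/mnist/convolutional.py 2>&1 | tee tf_run.log
log="tf_run.log"
if [ -f "$log" ]; then
    ignored=$(count_ignored_gpus "$log")
    if [ "$ignored" -gt 0 ]; then
        echo "warning: TensorFlow ignored $ignored GPU(s); the job is likely running on the CPU"
    else
        echo "no ignored GPUs reported in $log"
    fi
fi
```

A non-zero count suggests the container's TensorFlow build does not support the GPUs on that node, in which case a different image or a node with newer GPUs may be needed.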