TensorFlow is an open-source deep learning library for numerical computation using data flow graphs. Detailed information about the software is available on the project website.
The library is available as a Python package. The CPU version is installed for python/2.7.5 on both clusters and requires three additional modules: gcc/4.9.1, mkl/16.0.1, and java/sunjdk_1.8.0. The GPU version is installed for python/2.7.11 on Razor and requires gcc/4.9.1, mkl/16.0.1, java/sunjdk_1.8.0, and cuda/8.0 in addition to python/2.7.11.
tres0118:pwolinsk:$ module load gcc/4.9.1 python/2.7.5 mkl/16.0.1 java/sunjdk_1.8.0
tres0118:pwolinsk:$ python
Python 2.7.5 (default, Jul 10 2014, 16:10:08)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow
>>>
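The interactive import check above can also be scripted, for example at the top of a batch job, so that a missing module load fails early. The following is a minimal sketch, not part of the cluster installation; the helper name `check_module` is hypothetical, and it only assumes that a missing package or dependency shows up as an `ImportError`.

```python
from __future__ import print_function  # keeps print() working on Python 2.7

import importlib


def check_module(name):
    """Return (True, version) if `name` imports cleanly, else (False, None).

    `version` is None when the module does not define __version__.
    """
    try:
        module = importlib.import_module(name)
    except ImportError:
        # Typical cause on a cluster: the required modules were not loaded.
        return False, None
    return True, getattr(module, "__version__", None)


found, version = check_module("tensorflow")
if found:
    print("tensorflow is importable, version:", version)
else:
    print("tensorflow is not importable; check the loaded modules")
```

If the check reports that the package is not importable, rerun the `module load` line shown above and try again.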
The tensorflow package is installed on Razor in /share/apps/opt/rh/python27/root/usr/lib/python2.7/site-packages/tensorflow. The installation contains a few example models: image/alexnet, image/cifar10, image/imagenet, image/mnist, and embedding.
We will use the image/mnist training model to run a training session.
tres0118:pwolinsk:$ python -m tensorflow.models.image.mnist.convolutional
Successfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.
Successfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.
Successfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.
Successfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.
Extracting data/train-images-idx3-ubyte.gz
Extracting data/train-labels-idx1-ubyte.gz
Extracting data/t10k-images-idx3-ubyte.gz
Extracting data/t10k-labels-idx1-ubyte.gz
Initialized!
Step 0 (epoch 0.00), 5.1 ms
Minibatch loss: 12.054, learning rate: 0.010000
Minibatch error: 90.6%
Validation error: 84.6%
Step 100 (epoch 0.12), 203.7 ms
Minibatch loss: 3.282, learning rate: 0.010000
Minibatch error: 6.2%
Validation error: 7.1%
...
The -m option instructs Python to search its module search path (sys.path) for the named module and run it as a script. You could also specify the full path to the convolutional.py script:
python /share/apps/opt/rh/python27/root/usr/lib/python2.7/site-packages/tensorflow/models/image/mnist/convolutional.py
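To see which file a given `-m` module name corresponds to, the lookup step can be reproduced from Python. This is a minimal sketch: `module_source_path` is a hypothetical helper, and the standard-library module `json.tool` stands in for the TensorFlow example so the snippet runs without TensorFlow. Note that it imports the module to locate it, so the module's top-level code is executed.

```python
from __future__ import print_function  # keeps print() working on Python 2.7

import importlib
import os


def module_source_path(name):
    """Return the absolute path of the source file behind a dotted module name.

    The module is found on sys.path, the same search path that
    `python -m name` uses.
    """
    module = importlib.import_module(name)
    path = module.__file__
    # Python 2 may report the compiled file; map .pyc/.pyo back to .py.
    if path.endswith((".pyc", ".pyo")):
        path = path[:-1]
    return os.path.abspath(path)


# Stand-in example: `python -m json.tool` runs the file printed here.
print(module_source_path("json.tool"))
```

With the TensorFlow modules loaded, passing `tensorflow.models.image.mnist.convolutional` instead should print a path like the one used in the full-path command above, at the cost of importing TensorFlow.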
The cifar10 TensorFlow example has been tested with CPU and GPU. To get the example set:
git clone https://github.com/tensorflow/models
cd models/tutorials/image/cifar10
vi cifar10.py
module load gcc/5.2.1 mkl/16.0.1 python/2.7.11 java/sunjdk_1.8.0 cuda/8.0
In cifar10.py, change /tmp/ to an appropriate scratch directory such as /localscratch/rfeynman/. We set CUDA_VISIBLE_DEVICES to expose 0, 1, or 4 GPUs to the run. In the runs below, throughput scales only modestly: roughly 770 examples/sec on CPU, about 1280 on one GPU, and about 1330 with four GPUs visible (two dual-GPU K80 cards). There is essentially no difference between 1 GPU and 4. Note that the commands shown do not pass the script's --num_gpus flag, which defaults to 1 in the upstream source, so the four-device run may have trained on a single GPU.
<code>
models/tutorials/image/cifar10$ export CUDA_VISIBLE_DEVICES=""
models/tutorials/image/cifar10$ python cifar10_multi_gpu_train.py
Filling queue with 20000 CIFAR images before starting to train. This will take a few minutes.
2017-03-17 14:13:46.533942: step 0, loss = 4.68 (29.7 examples/sec; 4.305 sec/batch)
2017-03-17 14:13:49.055265: step 10, loss = 4.66 (781.8 examples/sec; 0.164 sec/batch)
2017-03-17 14:13:50.697406: step 20, loss = 4.63 (771.2 examples/sec; 0.166 sec/batch)
2017-03-17 14:13:52.340482: step 30, loss = 4.60 (771.3 examples/sec; 0.166 sec/batch)
models/tutorials/image/cifar10$ export CUDA_VISIBLE_DEVICES="0"
models/tutorials/image/cifar10$ python cifar10_multi_gpu_train.py
Filling queue with 20000 CIFAR images before starting to train. This will take a few minutes.
2017-03-21 15:41:42.767510: I tensorflow/core/common_runtime/gpu/gpu_device.cc:887] Found device 0 with properties:
name: Tesla K80
major: 3 minor: 7 memoryClockRate (GHz) 0.8235
pciBusID 0000:04:00.0
Total memory: 11.17GiB
Free memory: 11.11GiB
2017-03-21 15:41:42.767552: I tensorflow/core/common_runtime/gpu/gpu_device.cc:908] DMA: 0
2017-03-21 15:41:42.767559: I tensorflow/core/common_runtime/gpu/gpu_device.cc:918] 0: Y
2017-03-21 15:41:42.767571: I tensorflow/core/common_runtime/gpu/gpu_device.cc:977] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla K80, pci bus id: 0000:04:00.0)
2017-03-21 15:42:06.767827: step 0, loss = 4.68 (37.1 examples/sec; 3.448 sec/batch)
2017-03-21 15:42:07.701603: step 10, loss = 4.67 (1370.8 examples/sec; 0.093 sec/batch)
2017-03-21 15:42:08.750127: step 20, loss = 4.60 (1220.8 examples/sec; 0.105 sec/batch)
2017-03-21 15:42:09.762612: step 30, loss = 4.61 (1264.2 examples/sec; 0.101 sec/batch)
2017-03-21 15:42:10.769818: step 40, loss = 4.58 (1270.8 examples/sec; 0.101 sec/batch)
2017-03-21 15:42:11.768493: step 50, loss = 4.53 (1281.7 examples/sec; 0.100 sec/batch)
2017-03-21 15:42:12.769582: step 60, loss = 4.52 (1278.6 examples/sec; 0.100 sec/batch)
2017-03-21 15:42:13.769733: step 70, loss = 4.54 (1279.8 examples/sec; 0.100 sec/batch)
models/tutorials/image/cifar10$ export CUDA_VISIBLE_DEVICES="0,1,2,3"
models/tutorials/image/cifar10$ python cifar10_multi_gpu_train.py
2017-03-21 15:43:37.866128: I tensorflow/core/common_runtime/gpu/gpu_device.cc:887] Found device 0 with properties:
name: Tesla K80
major: 3 minor: 7 memoryClockRate (GHz) 0.8235
pciBusID 0000:04:00.0
Total memory: 11.17GiB
Free memory: 11.11GiB
2017-03-21 15:43:37.866246: W tensorflow/stream_executor/cuda/cuda_driver.cc:485] creating context when one is currently active; existing: 0x364c2b0
2017-03-21 15:43:38.104892: I tensorflow/core/common_runtime/gpu/gpu_device.cc:887] Found device 1 with properties:
name: Tesla K80
major: 3 minor: 7 memoryClockRate (GHz) 0.8235
pciBusID 0000:05:00.0
Total memory: 11.17GiB
Free memory: 11.11GiB
2017-03-21 15:43:38.104995: W tensorflow/stream_executor/cuda/cuda_driver.cc:485] creating context when one is currently active; existing: 0x36500f0
2017-03-21 15:43:38.349437: I tensorflow/core/common_runtime/gpu/gpu_device.cc:887] Found device 2 with properties:
name: Tesla K80
major: 3 minor: 7 memoryClockRate (GHz) 0.8235
pciBusID 0000:84:00.0
Total memory: 11.17GiB
Free memory: 11.11GiB
2017-03-21 15:43:38.349535: W tensorflow/stream_executor/cuda/cuda_driver.cc:485] creating context when one is currently active; existing: 0x3653f60
2017-03-21 15:43:38.600657: I tensorflow/core/common_runtime/gpu/gpu_device.cc:887] Found device 3 with properties:
name: Tesla K80
major: 3 minor: 7 memoryClockRate (GHz) 0.8235
pciBusID 0000:85:00.0
Total memory: 11.17GiB
Free memory: 11.11GiB
2017-03-21 15:43:38.602403: I tensorflow/core/common_runtime/gpu/gpu_device.cc:908] DMA: 0 1 2 3
2017-03-21 15:43:38.602412: I tensorflow/core/common_runtime/gpu/gpu_device.cc:918] 0: Y Y N N
2017-03-21 15:43:38.602418: I tensorflow/core/common_runtime/gpu/gpu_device.cc:918] 1: Y Y N N
2017-03-21 15:43:38.602423: I tensorflow/core/common_runtime/gpu/gpu_device.cc:918] 2: N N Y Y
2017-03-21 15:43:38.602428: I tensorflow/core/common_runtime/gpu/gpu_device.cc:918] 3: N N Y Y
2017-03-21 15:43:38.602445: I tensorflow/core/common_runtime/gpu/gpu_device.cc:977] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla K80, pci bus id: 0000:04:00.0)
2017-03-21 15:43:38.602453: I tensorflow/core/common_runtime/gpu/gpu_device.cc:977] Creating TensorFlow device (/gpu:1) -> (device: 1, name: Tesla K80, pci bus id: 0000:05:00.0)
2017-03-21 15:43:38.602459: I tensorflow/core/common_runtime/gpu/gpu_device.cc:977] Creating TensorFlow device (/gpu:2) -> (device: 2, name: Tesla K80, pci bus id: 0000:84:00.0)
2017-03-21 15:43:38.602464: I tensorflow/core/common_runtime/gpu/gpu_device.cc:977] Creating TensorFlow device (/gpu:3) -> (device: 3, name: Tesla K80, pci bus id: 0000:85:00.0)
2017-03-21 15:43:54.086766: step 0, loss = 4.67 (47.9 examples/sec; 2.674 sec/batch)
2017-03-21 15:43:55.013200: step 10, loss = 4.66 (1381.7 examples/sec; 0.093 sec/batch)
2017-03-21 15:43:56.011015: step 20, loss = 4.65 (1282.8 examples/sec; 0.100 sec/batch)
2017-03-21 15:43:56.967307: step 30, loss = 4.60 (1338.5 examples/sec; 0.096 sec/batch)
2017-03-21 15:43:57.940303: step 40, loss = 4.57 (1315.5 examples/sec; 0.097 sec/batch)
2017-03-21 15:43:58.902810: step 50, loss = 4.54 (1329.9 examples/sec; 0.096 sec/batch)
2017-03-21 15:43:59.859618: step 60, loss = 4.48 (1337.8 examples/sec; 0.096 sec/batch)
</code>
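The scaling comparison can be made concrete by averaging the examples/sec figures from the three runs above. The following is a minimal sketch: the helper `mean_throughput` and the regular expression are illustrative, not part of the TensorFlow example, and excluding step 0 (which includes startup time) is an assumption about what counts as steady state.

```python
from __future__ import print_function  # keeps print() working on Python 2.7

import re

# Matches e.g. "step 10, loss = 4.66 (781.8 examples/sec; 0.164 sec/batch)"
STEP_RE = re.compile(r"step (\d+), loss = [\d.]+ \(([\d.]+) examples/sec")


def mean_throughput(log_text, skip_steps=(0,)):
    """Average examples/sec over the logged steps, ignoring `skip_steps`."""
    rates = [float(rate) for step, rate in STEP_RE.findall(log_text)
             if int(step) not in skip_steps]
    if not rates:
        raise ValueError("no training steps found in log text")
    return sum(rates) / len(rates)


# Step lines copied from the three runs shown above.
CPU_LOG = """
2017-03-17 14:13:46.533942: step 0, loss = 4.68 (29.7 examples/sec; 4.305 sec/batch)
2017-03-17 14:13:49.055265: step 10, loss = 4.66 (781.8 examples/sec; 0.164 sec/batch)
2017-03-17 14:13:50.697406: step 20, loss = 4.63 (771.2 examples/sec; 0.166 sec/batch)
2017-03-17 14:13:52.340482: step 30, loss = 4.60 (771.3 examples/sec; 0.166 sec/batch)
"""

ONE_GPU_LOG = """
2017-03-21 15:42:06.767827: step 0, loss = 4.68 (37.1 examples/sec; 3.448 sec/batch)
2017-03-21 15:42:07.701603: step 10, loss = 4.67 (1370.8 examples/sec; 0.093 sec/batch)
2017-03-21 15:42:08.750127: step 20, loss = 4.60 (1220.8 examples/sec; 0.105 sec/batch)
2017-03-21 15:42:09.762612: step 30, loss = 4.61 (1264.2 examples/sec; 0.101 sec/batch)
2017-03-21 15:42:10.769818: step 40, loss = 4.58 (1270.8 examples/sec; 0.101 sec/batch)
2017-03-21 15:42:11.768493: step 50, loss = 4.53 (1281.7 examples/sec; 0.100 sec/batch)
2017-03-21 15:42:12.769582: step 60, loss = 4.52 (1278.6 examples/sec; 0.100 sec/batch)
2017-03-21 15:42:13.769733: step 70, loss = 4.54 (1279.8 examples/sec; 0.100 sec/batch)
"""

FOUR_GPU_LOG = """
2017-03-21 15:43:54.086766: step 0, loss = 4.67 (47.9 examples/sec; 2.674 sec/batch)
2017-03-21 15:43:55.013200: step 10, loss = 4.66 (1381.7 examples/sec; 0.093 sec/batch)
2017-03-21 15:43:56.011015: step 20, loss = 4.65 (1282.8 examples/sec; 0.100 sec/batch)
2017-03-21 15:43:56.967307: step 30, loss = 4.60 (1338.5 examples/sec; 0.096 sec/batch)
2017-03-21 15:43:57.940303: step 40, loss = 4.57 (1315.5 examples/sec; 0.097 sec/batch)
2017-03-21 15:43:58.902810: step 50, loss = 4.54 (1329.9 examples/sec; 0.096 sec/batch)
2017-03-21 15:43:59.859618: step 60, loss = 4.48 (1337.8 examples/sec; 0.096 sec/batch)
"""

cpu = mean_throughput(CPU_LOG)
for label, log in [("CPU", CPU_LOG), ("1 GPU", ONE_GPU_LOG),
                   ("4 GPUs", FOUR_GPU_LOG)]:
    rate = mean_throughput(log)
    print("%-7s %7.1f examples/sec  %.2fx vs CPU" % (label, rate, rate / cpu))
```

On these short logs the averages come out to about 1.65x (one GPU) and 1.72x (four GPUs visible) relative to the CPU run, which matches the observation that adding visible devices made almost no difference.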