TensorFlow

TensorFlow is an open-source software library for deep learning and numerical computation using data flow graphs. Detailed information about the software is available on the project website:

https://www.tensorflow.org/

The library is available as a Python package. The CPU version is installed for python/2.7.5 on both clusters and requires three additional modules: gcc/4.9.1, mkl/16.0.1, and java/sunjdk_1.8.0. The GPU version is installed for python/2.7.11 on Razor and requires gcc/4.9.1, mkl/16.0.1, java/sunjdk_1.8.0, and cuda/8.0 in addition to python/2.7.11.

tres0118:pwolinsk:$ module load gcc/4.9.1 python/2.7.5 mkl/16.0.1 java/sunjdk_1.8.0
tres0118:pwolinsk:$ python
Python 2.7.5 (default, Jul 10 2014, 16:10:08) 
[GCC 4.4.7 20120313 (Red Hat 4.4.7-4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow
>>> 
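The same check can be scripted instead of typed at the interactive prompt. The following is a minimal sketch, not part of the installation: the helper name describe_module is hypothetical, and it is demonstrated on the standard-library json module so that it runs anywhere. Pass "tensorflow" after loading the modules listed above.

```python
import importlib


def describe_module(name):
    """Return (importable, version, location) for a module name.

    Hypothetical helper for illustration. The version is None when the
    module does not define __version__.
    """
    try:
        module = importlib.import_module(name)
    except ImportError:
        return (False, None, None)
    version = getattr(module, "__version__", None)
    location = getattr(module, "__file__", None)
    return (True, version, location)


if __name__ == "__main__":
    # Replace "json" with "tensorflow" once the modules are loaded.
    print(describe_module("json"))
```

A result whose first element is False usually means one of the required modules was not loaded in the current shell.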

The tensorflow package is installed on Razor in /share/apps/opt/rh/python27/root/usr/lib/python2.7/site-packages/tensorflow. The installation contains a few example models: image/alexnet, image/cifar10, image/imagenet, image/mnist, and embedding.

We will use the image/mnist example to run a training session:

tres0118:pwolinsk:$ python -m tensorflow.models.image.mnist.convolutional
Successfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.
Successfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.
Successfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.
Successfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.
Extracting data/train-images-idx3-ubyte.gz
Extracting data/train-labels-idx1-ubyte.gz
Extracting data/t10k-images-idx3-ubyte.gz
Extracting data/t10k-labels-idx1-ubyte.gz
Initialized!
Step 0 (epoch 0.00), 5.1 ms
Minibatch loss: 12.054, learning rate: 0.010000
Minibatch error: 90.6%
Validation error: 84.6%
Step 100 (epoch 0.12), 203.7 ms
Minibatch loss: 3.282, learning rate: 0.010000
Minibatch error: 6.2%
Validation error: 7.1%
...
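The epoch figure in the output above is the fraction of the training set processed so far. As a rough check, the sketch below reproduces it under two assumptions that are not stated on this page: a batch size of 64 and 55,000 training images (the 60,000 MNIST training images less a 5,000-image validation split). The function name is hypothetical.

```python
BATCH_SIZE = 64      # assumed batch size of the example script
TRAIN_SIZE = 55000   # assumed: 60000 images minus a 5000-image validation split


def epoch_at_step(step, batch_size=BATCH_SIZE, train_size=TRAIN_SIZE):
    """Fraction of the training set seen after `step` minibatches."""
    return float(step) * batch_size / train_size


if __name__ == "__main__":
    for step in (0, 100):
        print("Step %d (epoch %.2f)" % (step, epoch_at_step(step)))
```

With these assumed values, step 100 corresponds to 6,400 of 55,000 images, which rounds to the 0.12 shown in the output.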

The -m option instructs Python to locate the named module on its module search path (sys.path, which includes the directories in PYTHONPATH) and run it as a script. You could also specify the full path to the convolutional.py script:

python /share/apps/opt/rh/python27/root/usr/lib/python2.7/site-packages/tensorflow/models/image/mnist/convolutional.py
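To see which file a module name resolves to, and so which path could be given to python directly, the module search can be done by hand. This is a minimal sketch with a hypothetical helper name; it uses importlib.util.find_spec, which needs Python 3.4 or later, so it assumes a newer interpreter than the python/2.7.x modules described here. It is demonstrated on the standard-library json.tool module.

```python
import importlib.util


def module_source_path(name):
    """Return the file backing module `name` on sys.path, or None.

    For a plain module (not a package) this is the script path that
    could be passed to python directly instead of using -m.
    """
    try:
        spec = importlib.util.find_spec(name)
    except ImportError:
        # Raised when a parent package in a dotted name is missing.
        return None
    if spec is None:
        return None
    return spec.origin


if __name__ == "__main__":
    # "json.tool" is a standard-library module that can be run with -m.
    print(module_source_path("json.tool"))
```

Resolving a dotted name imports its parent packages, so looking up a module inside tensorflow this way requires the same modules to be loaded as running it.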

The cifar10 TensorFlow example has been tested with CPU and GPU. To get the example set:

git clone https://github.com/tensorflow/models
cd models/tutorials/image/cifar10
vi cifar10.py
module load gcc/5.2.1 mkl/16.0.1 python/2.7.11 java/sunjdk_1.8.0 cuda/8.0

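In cifar10.py, change /tmp/ to an appropriate scratch directory such as /local_scratch/rfeynman/. We use CUDA_VISIBLE_DEVICES to expose 0, 1, or 4 GPU devices to TensorFlow. Scaling from CPU to 1 GPU to 4 GPUs (two dual-GPU K80 cards) is very modest: in the runs below, throughput averages roughly 775 examples/sec on CPU, 1280 examples/sec with 1 GPU, and 1330 examples/sec with 4 GPUs visible, so there is essentially no difference between 1 GPU and 4. Note that cifar10_multi_gpu_train.py has a --num_gpus flag that defaults to 1 and is not set in these runs, which likely explains the flat result.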
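The three CUDA_VISIBLE_DEVICES settings used in this section differ in a way that is easy to miss: an empty string hides every GPU, while leaving the variable unset exposes all of them. The sketch below illustrates that reading with a hypothetical helper; it is a simplification that handles only comma-separated device indices and does not query the CUDA driver.

```python
import os


def visible_device_ids(value):
    """Interpret a CUDA_VISIBLE_DEVICES value (simplified).

    None (variable unset) -> None, meaning all devices are visible.
    ""                    -> [], meaning no GPUs (CPU-only run).
    "0,1,2,3"             -> ["0", "1", "2", "3"].
    """
    if value is None:
        return None
    return [item.strip() for item in value.split(",") if item.strip()]


if __name__ == "__main__":
    for setting in ("", "0", "0,1,2,3"):
        print(repr(setting), "->", visible_device_ids(setting))
    # The value in the current shell, if any:
    print(visible_device_ids(os.environ.get("CUDA_VISIBLE_DEVICES")))
```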

models/tutorials/image/cifar10$ export CUDA_VISIBLE_DEVICES=""
models/tutorials/image/cifar10$ python cifar10_multi_gpu_train.py
Filling queue with 20000 CIFAR images before starting to train. This will take a few minutes.
2017-03-17 14:13:46.533942: step 0, loss = 4.68 (29.7 examples/sec; 4.305 sec/batch)
2017-03-17 14:13:49.055265: step 10, loss = 4.66 (781.8 examples/sec; 0.164 sec/batch)
2017-03-17 14:13:50.697406: step 20, loss = 4.63 (771.2 examples/sec; 0.166 sec/batch)
2017-03-17 14:13:52.340482: step 30, loss = 4.60 (771.3 examples/sec; 0.166 sec/batch)

models/tutorials/image/cifar10$ export CUDA_VISIBLE_DEVICES="0"
models/tutorials/image/cifar10$ python cifar10_multi_gpu_train.py
Filling queue with 20000 CIFAR images before starting to train. This will take a few minutes.

2017-03-21 15:41:42.767510: I tensorflow/core/common_runtime/gpu/gpu_device.cc:887] Found device 0 with properties: 
name: Tesla K80
major: 3 minor: 7 memoryClockRate (GHz) 0.8235
pciBusID 0000:04:00.0
Total memory: 11.17GiB
Free memory: 11.11GiB
2017-03-21 15:41:42.767552: I tensorflow/core/common_runtime/gpu/gpu_device.cc:908] DMA: 0 
2017-03-21 15:41:42.767559: I tensorflow/core/common_runtime/gpu/gpu_device.cc:918] 0:   Y 
2017-03-21 15:41:42.767571: I tensorflow/core/common_runtime/gpu/gpu_device.cc:977] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla K80, pci bus id: 0000:04:00.0)
2017-03-21 15:42:06.767827: step 0, loss = 4.68 (37.1 examples/sec; 3.448 sec/batch)
2017-03-21 15:42:07.701603: step 10, loss = 4.67 (1370.8 examples/sec; 0.093 sec/batch)
2017-03-21 15:42:08.750127: step 20, loss = 4.60 (1220.8 examples/sec; 0.105 sec/batch)
2017-03-21 15:42:09.762612: step 30, loss = 4.61 (1264.2 examples/sec; 0.101 sec/batch)
2017-03-21 15:42:10.769818: step 40, loss = 4.58 (1270.8 examples/sec; 0.101 sec/batch)
2017-03-21 15:42:11.768493: step 50, loss = 4.53 (1281.7 examples/sec; 0.100 sec/batch)
2017-03-21 15:42:12.769582: step 60, loss = 4.52 (1278.6 examples/sec; 0.100 sec/batch)
2017-03-21 15:42:13.769733: step 70, loss = 4.54 (1279.8 examples/sec; 0.100 sec/batch)

models/tutorials/image/cifar10$ export CUDA_VISIBLE_DEVICES="0,1,2,3"
models/tutorials/image/cifar10$ python cifar10_multi_gpu_train.py
2017-03-21 15:43:37.866128: I tensorflow/core/common_runtime/gpu/gpu_device.cc:887] Found device 0 with properties: 
name: Tesla K80
major: 3 minor: 7 memoryClockRate (GHz) 0.8235
pciBusID 0000:04:00.0
Total memory: 11.17GiB
Free memory: 11.11GiB
2017-03-21 15:43:37.866246: W tensorflow/stream_executor/cuda/cuda_driver.cc:485] creating context when one is currently active; existing: 0x364c2b0
2017-03-21 15:43:38.104892: I tensorflow/core/common_runtime/gpu/gpu_device.cc:887] Found device 1 with properties: 
name: Tesla K80
major: 3 minor: 7 memoryClockRate (GHz) 0.8235
pciBusID 0000:05:00.0
Total memory: 11.17GiB
Free memory: 11.11GiB
2017-03-21 15:43:38.104995: W tensorflow/stream_executor/cuda/cuda_driver.cc:485] creating context when one is currently active; existing: 0x36500f0
2017-03-21 15:43:38.349437: I tensorflow/core/common_runtime/gpu/gpu_device.cc:887] Found device 2 with properties: 
name: Tesla K80
major: 3 minor: 7 memoryClockRate (GHz) 0.8235
pciBusID 0000:84:00.0
Total memory: 11.17GiB
Free memory: 11.11GiB
2017-03-21 15:43:38.349535: W tensorflow/stream_executor/cuda/cuda_driver.cc:485] creating context when one is currently active; existing: 0x3653f60
2017-03-21 15:43:38.600657: I tensorflow/core/common_runtime/gpu/gpu_device.cc:887] Found device 3 with properties: 
name: Tesla K80
major: 3 minor: 7 memoryClockRate (GHz) 0.8235
pciBusID 0000:85:00.0
Total memory: 11.17GiB
Free memory: 11.11GiB
2017-03-21 15:43:38.602403: I tensorflow/core/common_runtime/gpu/gpu_device.cc:908] DMA: 0 1 2 3 
2017-03-21 15:43:38.602412: I tensorflow/core/common_runtime/gpu/gpu_device.cc:918] 0:   Y Y N N 
2017-03-21 15:43:38.602418: I tensorflow/core/common_runtime/gpu/gpu_device.cc:918] 1:   Y Y N N 
2017-03-21 15:43:38.602423: I tensorflow/core/common_runtime/gpu/gpu_device.cc:918] 2:   N N Y Y 
2017-03-21 15:43:38.602428: I tensorflow/core/common_runtime/gpu/gpu_device.cc:918] 3:   N N Y Y 
2017-03-21 15:43:38.602445: I tensorflow/core/common_runtime/gpu/gpu_device.cc:977] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla K80, pci bus id: 0000:04:00.0)
2017-03-21 15:43:38.602453: I tensorflow/core/common_runtime/gpu/gpu_device.cc:977] Creating TensorFlow device (/gpu:1) -> (device: 1, name: Tesla K80, pci bus id: 0000:05:00.0)
2017-03-21 15:43:38.602459: I tensorflow/core/common_runtime/gpu/gpu_device.cc:977] Creating TensorFlow device (/gpu:2) -> (device: 2, name: Tesla K80, pci bus id: 0000:84:00.0)
2017-03-21 15:43:38.602464: I tensorflow/core/common_runtime/gpu/gpu_device.cc:977] Creating TensorFlow device (/gpu:3) -> (device: 3, name: Tesla K80, pci bus id: 0000:85:00.0)
2017-03-21 15:43:54.086766: step 0, loss = 4.67 (47.9 examples/sec; 2.674 sec/batch)
2017-03-21 15:43:55.013200: step 10, loss = 4.66 (1381.7 examples/sec; 0.093 sec/batch)
2017-03-21 15:43:56.011015: step 20, loss = 4.65 (1282.8 examples/sec; 0.100 sec/batch)
2017-03-21 15:43:56.967307: step 30, loss = 4.60 (1338.5 examples/sec; 0.096 sec/batch)
2017-03-21 15:43:57.940303: step 40, loss = 4.57 (1315.5 examples/sec; 0.097 sec/batch)
2017-03-21 15:43:58.902810: step 50, loss = 4.54 (1329.9 examples/sec; 0.096 sec/batch)
2017-03-21 15:43:59.859618: step 60, loss = 4.48 (1337.8 examples/sec; 0.096 sec/batch)
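
To compare the three runs, the examples/sec figures can be averaged from the step lines. The following is a minimal sketch using the log lines shown above; the function and variable names are hypothetical, and skipping step 0 as warm-up is an assumption about how to average, not something the training script does.

```python
import re

# Matches the per-step lines printed during training, e.g.
# "... step 10, loss = 4.66 (781.8 examples/sec; 0.164 sec/batch)"
STEP_LINE = re.compile(r"step (\d+), loss = [\d.]+ \(([\d.]+) examples/sec;")


def mean_throughput(log_text, skip_steps=(0,)):
    """Average examples/sec over step lines, ignoring warm-up steps."""
    rates = [
        float(rate)
        for step, rate in STEP_LINE.findall(log_text)
        if int(step) not in skip_steps
    ]
    if not rates:
        raise ValueError("no step lines found")
    return sum(rates) / len(rates)


CPU_LOG = """
step 0, loss = 4.68 (29.7 examples/sec; 4.305 sec/batch)
step 10, loss = 4.66 (781.8 examples/sec; 0.164 sec/batch)
step 20, loss = 4.63 (771.2 examples/sec; 0.166 sec/batch)
step 30, loss = 4.60 (771.3 examples/sec; 0.166 sec/batch)
"""

ONE_GPU_LOG = """
step 0, loss = 4.68 (37.1 examples/sec; 3.448 sec/batch)
step 10, loss = 4.67 (1370.8 examples/sec; 0.093 sec/batch)
step 20, loss = 4.60 (1220.8 examples/sec; 0.105 sec/batch)
step 30, loss = 4.61 (1264.2 examples/sec; 0.101 sec/batch)
step 40, loss = 4.58 (1270.8 examples/sec; 0.101 sec/batch)
step 50, loss = 4.53 (1281.7 examples/sec; 0.100 sec/batch)
step 60, loss = 4.52 (1278.6 examples/sec; 0.100 sec/batch)
step 70, loss = 4.54 (1279.8 examples/sec; 0.100 sec/batch)
"""

FOUR_GPU_LOG = """
step 0, loss = 4.67 (47.9 examples/sec; 2.674 sec/batch)
step 10, loss = 4.66 (1381.7 examples/sec; 0.093 sec/batch)
step 20, loss = 4.65 (1282.8 examples/sec; 0.100 sec/batch)
step 30, loss = 4.60 (1338.5 examples/sec; 0.096 sec/batch)
step 40, loss = 4.57 (1315.5 examples/sec; 0.097 sec/batch)
step 50, loss = 4.54 (1329.9 examples/sec; 0.096 sec/batch)
step 60, loss = 4.48 (1337.8 examples/sec; 0.096 sec/batch)
"""

if __name__ == "__main__":
    cpu = mean_throughput(CPU_LOG)
    one = mean_throughput(ONE_GPU_LOG)
    four = mean_throughput(FOUR_GPU_LOG)
    print("CPU: %.0f, 1 GPU: %.0f, 4 GPUs: %.0f examples/sec" % (cpu, one, four))
    print("1 GPU vs CPU: %.2fx, 4 GPUs vs 1 GPU: %.2fx" % (one / cpu, four / one))
```

On these figures one GPU is about 1.65x faster than the CPU run, and four visible GPUs are about 1.04x faster than one.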
tensorflow.txt · Last modified: 2017/03/21 20:55 by root