TensorFlow

TensorFlow is an open-source deep learning software library for numerical computation using data flow graphs. Detailed information about the software is available on the project website:

https://www.tensorflow.org/
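
As a minimal illustration of the data flow graph model (a sketch using the classic TensorFlow graph-and-session API that matches the builds described below, not anything specific to the cluster install): operations are first assembled into a graph, and nothing is computed until the graph is run in a session.

import tensorflow as tf

# build a tiny data flow graph: two constants feeding a multiply node
a = tf.constant(3.0)
b = tf.constant(4.0)
c = a * b          # adds a multiplication op to the graph; no computation yet

# execute the graph in a session
with tf.Session() as sess:
    print(sess.run(c))   # prints 12.0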

The library is available as a Python package. The CPU version is installed for python/2.7.5 on both clusters and requires three additional modules: gcc/4.9.1, mkl/16.0.1, and java/sunjdk_1.8.0. The GPU version is installed for python/2.7.11 on Razor and requires gcc/4.9.1, mkl/16.0.1, java/sunjdk_1.8.0, and cuda/8.0.

tres0118:pwolinsk:$ module load gcc/4.9.1 python/2.7.5 mkl/16.0.1 java/sunjdk_1.8.0
tres0118:pwolinsk:$ python
Python 2.7.5 (default, Jul 10 2014, 16:10:08) 
[GCC 4.4.7 20120313 (Red Hat 4.4.7-4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow
>>> 
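
A quick way to check which build is active is to run a tiny graph with device placement logging turned on: with the CPU build the ops are placed on /cpu:0, while the GPU build on a GPU node also reports /gpu:N devices. This is just a sketch; the exact format of the placement log varies between TensorFlow versions.

import tensorflow as tf

# a trivial graph, run with log_device_placement so the session prints
# which device (/cpu:0 or /gpu:0) each op is assigned to
a = tf.constant([1.0, 2.0])
b = tf.constant([3.0, 4.0])
with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
    print(sess.run(a + b))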

The tensorflow package is installed on Razor in /share/apps/opt/rh/python27/root/usr/lib/python2.7/site-packages/tensorflow. The installation contains a few example models: image/alexnet, image/cifar10, image/imagenet, image/mnist, and embedding.
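
Because the examples live inside the installed package, their location can also be found without hard-coding the path above. A small sketch (it assumes only that the models directory ships inside the package, as described here):

import os
import tensorflow

# the bundled example models sit in the "models" subdirectory of the package
models_dir = os.path.join(os.path.dirname(tensorflow.__file__), 'models')
print(models_dir)
print(sorted(os.listdir(models_dir)))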

We will use the image/mnist training model to run a training session.

tres0118:pwolinsk:$ python -m tensorflow.models.image.mnist.convolutional
Successfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.
Successfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.
Successfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.
Successfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.
Extracting data/train-images-idx3-ubyte.gz
Extracting data/train-labels-idx1-ubyte.gz
Extracting data/t10k-images-idx3-ubyte.gz
Extracting data/t10k-labels-idx1-ubyte.gz
Initialized!
Step 0 (epoch 0.00), 5.1 ms
Minibatch loss: 12.054, learning rate: 0.010000
Minibatch error: 90.6%
Validation error: 84.6%
Step 100 (epoch 0.12), 203.7 ms
Minibatch loss: 3.282, learning rate: 0.010000
Minibatch error: 6.2%
Validation error: 7.1%
...

The -m option instructs python to locate the named module on the Python module search path (sys.path, which includes any directories listed in PYTHONPATH) and run it as a script. You could also specify the full path to the convolutional.py script:

python /share/apps/opt/rh/python27/root/usr/lib/python2.7/site-packages/tensorflow/models/image/mnist/convolutional.py
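
To see exactly which directories python -m searches (the site-packages path above should be among them), print the interpreter's module search path:

import sys

# sys.path is the search path used by "import" and "python -m";
# it is assembled from the script/current directory, PYTHONPATH, and site-packages
for directory in sys.path:
    print(directory)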

The cifar10 TensorFlow example has been tested with both the CPU and GPU versions. To get the example set:

git clone https://github.com/tensorflow/models
cd models/tutorials/image/cifar10
vi cifar10.py

and edit cifar10.py to change /tmp/ to an appropriate scratch directory such as /local_scratch/rfeynman/ (a sketch of that change appears after the CPU output below). Running on the CPU (this is a GPU node, but using the CPU version of tensorflow) will look like this:

$ python cifar10_multi_gpu_train.py
Filling queue with 20000 CIFAR images before starting to train. This will take a few minutes.
2017-03-17 14:13:46.533942: step 0, loss = 4.68 (29.7 examples/sec; 4.305 sec/batch)
2017-03-17 14:13:49.055265: step 10, loss = 4.66 (781.8 examples/sec; 0.164 sec/batch)
2017-03-17 14:13:50.697406: step 20, loss = 4.63 (771.2 examples/sec; 0.166 sec/batch)
2017-03-17 14:13:52.340482: step 30, loss = 4.60 (771.3 examples/sec; 0.166 sec/batch)
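
For reference, the /tmp path mentioned above is a flag default inside cifar10.py, so the edit amounts to changing roughly the following line (the flag name follows the tutorial source of that era; the scratch path is just an example and should be replaced with your own):

import tensorflow as tf   # already present at the top of cifar10.py

# CIFAR-10 download/extract location: point it at scratch instead of /tmp
tf.app.flags.DEFINE_string('data_dir', '/local_scratch/rfeynman/cifar10_data',
                           """Path to the CIFAR-10 data directory.""")

If the training scripts also default their train_dir flag to /tmp, the same kind of change applies there.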

Running on dual GPUs is only about 40-50% faster than the CPU run above for this example (compare the examples/sec figures in the two transcripts).

compute0805:dchaffin:/storage/dchaffin/models/tutorials/image/cifar10$ python cifar10_multi_gpu_train.py
Filling queue with 20000 CIFAR images before starting to train. This will take a few minutes.
2017-03-17 15:22:28.162660: I tensorflow/core/common_runtime/gpu/gpu_device.cc:887] Found device 0 with properties:
name: Tesla K40c
major: 3 minor: 5 memoryClockRate (GHz) 0.745
pciBusID 0000:81:00.0
Total memory: 11.17GiB
Free memory: 11.10GiB
2017-03-17 15:22:28.162838: W tensorflow/stream_executor/cuda/cuda_driver.cc:485] creating context when one is currently active; existing: 0x3b3caf0
2017-03-17 15:22:28.423756: I tensorflow/core/common_runtime/gpu/gpu_device.cc:887] Found device 1 with properties:
name: Tesla K40c
major: 3 minor: 5 memoryClockRate (GHz) 0.745
pciBusID 0000:82:00.0
Total memory: 11.17GiB
Free memory: 11.10GiB
2017-03-17 15:22:28.424610: I tensorflow/core/common_runtime/gpu/gpu_device.cc:908] DMA: 0 1
2017-03-17 15:22:28.424625: I tensorflow/core/common_runtime/gpu/gpu_device.cc:918] 0:   Y Y
2017-03-17 15:22:28.424631: I tensorflow/core/common_runtime/gpu/gpu_device.cc:918] 1:   Y Y
2017-03-17 15:22:28.424668: I tensorflow/core/common_runtime/gpu/gpu_device.cc:977] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla K40c, pci bus id: 0000:81:00.0)
2017-03-17 15:22:28.424699: I tensorflow/core/common_runtime/gpu/gpu_device.cc:977] Creating TensorFlow device (/gpu:1) -> (device: 1, name: Tesla K40c, pci bus id: 0000:82:00.0)
2017-03-17 15:22:47.016337: step 0, loss = 4.67 (7.9 examples/sec; 16.237 sec/batch)
2017-03-17 15:22:48.801909: step 10, loss = 4.65 (1078.0 examples/sec; 0.119 sec/batch)
2017-03-17 15:22:49.943175: step 20, loss = 4.63 (1115.5 examples/sec; 0.115 sec/batch)
2017-03-17 15:22:51.107477: step 30, loss = 4.60 (1092.9 examples/sec; 0.117 sec/batch)
2017-03-17 15:22:52.271683: step 40, loss = 4.58 (1073.0 examples/sec; 0.119 sec/batch)
2017-03-17 15:22:53.426386: step 50, loss = 4.51 (1080.8 examples/sec; 0.118 sec/batch)
2017-03-17 15:22:54.601170: step 60, loss = 4.49 (1060.7 examples/sec; 0.121 sec/batch)
2017-03-17 15:22:55.759457: step 70, loss = 4.49 (1091.8 examples/sec; 0.117 sec/batch)
2017-03-17 15:22:57.005849: step 80, loss = 4.48 (1111.7 examples/sec; 0.115 sec/batch)