Last modified: 2017/03/21 20:55 by root (previous revision: 2016/08/18 20:14 by pwolinsk)
==== TensorFlow ====
TensorFlow is an open source deep learning library for numerical computation using data flow graphs. Detailed information about the software is available on the project website:
https://www.tensorflow.org/

The library is available as a Python package. The CPU version is installed for **python/2.7.5** on both clusters and requires three additional modules: **gcc/4.9.1 mkl/16.0.1 java/sunjdk_1.8.0**. The GPU version is installed for **python/2.7.11** on razor and requires **gcc/4.9.1 mkl/16.0.1 java/sunjdk_1.8.0 cuda/8.0** in addition to the **python/2.7.11** module.
<code>
...
python /share/apps/opt/rh/python27/root/usr/lib/python2.7/site-packages/tensorflow/models/image/mnist/convolutional.py
</code>
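If you generate job scripts programmatically, you can keep both module sets in one place. The following is a minimal Python sketch and not a site-provided tool. ''MODULES'' and ''module_load_line'' are hypothetical names. The module lists are copied from the paragraph above, and the load order is our assumption. Note that the cifar10 commands below load ''gcc/5.2.1'' rather than ''gcc/4.9.1'', so confirm the versions with ''module avail'' before relying on either list.

<code python>
# Hypothetical helper: module sets for the two TensorFlow builds, copied from
# the text above. The CPU build is on both clusters, the GPU build on razor.
# The order in which modules are listed is an assumption.
MODULES = {
    "cpu": ["gcc/4.9.1", "mkl/16.0.1", "java/sunjdk_1.8.0", "python/2.7.5"],
    "gpu": ["gcc/4.9.1", "mkl/16.0.1", "java/sunjdk_1.8.0", "cuda/8.0", "python/2.7.11"],
}


def module_load_line(variant):
    """Return the 'module load' command for the 'cpu' or 'gpu' build."""
    if variant not in MODULES:
        raise ValueError("variant must be 'cpu' or 'gpu', got %r" % (variant,))
    return "module load " + " ".join(MODULES[variant])
</code>

For example, ''module_load_line("cpu")'' returns ''module load gcc/4.9.1 mkl/16.0.1 java/sunjdk_1.8.0 python/2.7.5''.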

The **cifar10** TensorFlow example has been tested on both CPU and GPU. To get and set up the example:
<code>
git clone https://github.com/tensorflow/models
cd models/tutorials/image/cifar10
vi cifar10.py    # change /tmp/ to a scratch directory (see below)
module load gcc/5.2.1 mkl/16.0.1 python/2.7.11 java/sunjdk_1.8.0 cuda/8.0
</code>
Then edit ''cifar10.py'' and change ''/tmp/'' to an appropriate scratch directory such as ''/local_scratch/rfeynman/''. In the runs below, ''CUDA_VISIBLE_DEVICES'' is set so that TensorFlow sees 0, 1 or 4 GPUs. Performance scaling is very modest. Throughput rises from about 775 examples/sec on the CPU to about 1,280 on one GPU. Four GPUs (two dual-GPU K80 boards) are essentially no faster than one, at about 1,330 examples/sec.
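You can repeat the three runs with a small Python driver. This is a sketch under stated assumptions. The names ''count_visible'', ''build_command'' and ''run_all'' are hypothetical. It also assumes that ''cifar10_multi_gpu_train.py'' accepts ''--num_gpus'' and ''--max_steps'' flags, which is our understanding of the upstream tutorial; check the flag definitions at the top of the script first. ''CUDA_VISIBLE_DEVICES'' only controls which GPUs TensorFlow can see. As we understand the script, the number of model towers it builds is set by ''--num_gpus''. The logged runs that follow were started without ''--num_gpus''. This driver instead requests one tower per visible GPU.

<code python>
# Hypothetical driver that repeats the runs on this page (no GPU, 1 GPU, 4 GPUs)
# by changing CUDA_VISIBLE_DEVICES for each child process.
# Run it from models/tutorials/image/cifar10 with the GPU modules loaded.
import os
import subprocess
import sys

# CUDA_VISIBLE_DEVICES values used on this page; "" hides every GPU.
DEVICE_SETTINGS = ("", "0", "0,1,2,3")


def count_visible(value):
    """Number of device indices listed in a CUDA_VISIBLE_DEVICES value."""
    return len([d for d in value.split(",") if d.strip()])


def build_command(value, max_steps=100, script="cifar10_multi_gpu_train.py"):
    """Command line for one run (assumes the --num_gpus and --max_steps flags)."""
    # One model tower per visible GPU; the CPU-only run keeps a single tower.
    towers = max(1, count_visible(value))
    return [sys.executable, script,
            "--num_gpus=%d" % towers, "--max_steps=%d" % max_steps]


def run_all(settings=DEVICE_SETTINGS):
    """Run the training script once per setting and return the exit codes."""
    codes = []
    for value in settings:
        # Copy the current environment and override only CUDA_VISIBLE_DEVICES.
        env = dict(os.environ, CUDA_VISIBLE_DEVICES=value)
        print("CUDA_VISIBLE_DEVICES=%r" % value)
        codes.append(subprocess.call(build_command(value), env=env))
    return codes
</code>

Call ''run_all()'' after the ''module load'' step. The 100-step limit is an arbitrary choice to keep each run short. Because the flags differ from the logged runs, results may not match the numbers reported here.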
<code>
models/tutorials/image/cifar10$ export CUDA_VISIBLE_DEVICES=""
models/tutorials/image/cifar10$ python cifar10_multi_gpu_train.py
Filling queue with 20000 CIFAR images before starting to train. This will take a few minutes.
2017-03-17 14:13:46.533942: step 0, loss = 4.68 (29.7 examples/sec; 4.305 sec/batch)
2017-03-17 14:13:49.055265: step 10, loss = 4.66 (781.8 examples/sec; 0.164 sec/batch)
2017-03-17 14:13:50.697406: step 20, loss = 4.63 (771.2 examples/sec; 0.166 sec/batch)
2017-03-17 14:13:52.340482: step 30, loss = 4.60 (771.3 examples/sec; 0.166 sec/batch)

models/tutorials/image/cifar10$ export CUDA_VISIBLE_DEVICES="0"
models/tutorials/image/cifar10$ python cifar10_multi_gpu_train.py
Filling queue with 20000 CIFAR images before starting to train. This will take a few minutes.

2017-03-21 15:41:42.767510: I tensorflow/core/common_runtime/gpu/gpu_device.cc:887] Found device 0 with properties:
name: Tesla K80
major: 3 minor: 7 memoryClockRate (GHz) 0.8235
pciBusID 0000:04:00.0
Total memory: 11.17GiB
Free memory: 11.11GiB
2017-03-21 15:41:42.767552: I tensorflow/core/common_runtime/gpu/gpu_device.cc:908] DMA: 0
2017-03-21 15:41:42.767559: I tensorflow/core/common_runtime/gpu/gpu_device.cc:918] 0: Y
2017-03-21 15:41:42.767571: I tensorflow/core/common_runtime/gpu/gpu_device.cc:977] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla K80, pci bus id: 0000:04:00.0)
2017-03-21 15:42:06.767827: step 0, loss = 4.68 (37.1 examples/sec; 3.448 sec/batch)
2017-03-21 15:42:07.701603: step 10, loss = 4.67 (1370.8 examples/sec; 0.093 sec/batch)
2017-03-21 15:42:08.750127: step 20, loss = 4.60 (1220.8 examples/sec; 0.105 sec/batch)
2017-03-21 15:42:09.762612: step 30, loss = 4.61 (1264.2 examples/sec; 0.101 sec/batch)
2017-03-21 15:42:10.769818: step 40, loss = 4.58 (1270.8 examples/sec; 0.101 sec/batch)
2017-03-21 15:42:11.768493: step 50, loss = 4.53 (1281.7 examples/sec; 0.100 sec/batch)
2017-03-21 15:42:12.769582: step 60, loss = 4.52 (1278.6 examples/sec; 0.100 sec/batch)
2017-03-21 15:42:13.769733: step 70, loss = 4.54 (1279.8 examples/sec; 0.100 sec/batch)

models/tutorials/image/cifar10$ export CUDA_VISIBLE_DEVICES="0,1,2,3"
models/tutorials/image/cifar10$ python cifar10_multi_gpu_train.py
2017-03-21 15:43:37.866128: I tensorflow/core/common_runtime/gpu/gpu_device.cc:887] Found device 0 with properties:
name: Tesla K80
major: 3 minor: 7 memoryClockRate (GHz) 0.8235
pciBusID 0000:04:00.0
Total memory: 11.17GiB
Free memory: 11.11GiB
2017-03-21 15:43:37.866246: W tensorflow/stream_executor/cuda/cuda_driver.cc:485] creating context when one is currently active; existing: 0x364c2b0
2017-03-21 15:43:38.104892: I tensorflow/core/common_runtime/gpu/gpu_device.cc:887] Found device 1 with properties:
name: Tesla K80
major: 3 minor: 7 memoryClockRate (GHz) 0.8235
pciBusID 0000:05:00.0
Total memory: 11.17GiB
Free memory: 11.11GiB
2017-03-21 15:43:38.104995: W tensorflow/stream_executor/cuda/cuda_driver.cc:485] creating context when one is currently active; existing: 0x36500f0
2017-03-21 15:43:38.349437: I tensorflow/core/common_runtime/gpu/gpu_device.cc:887] Found device 2 with properties:
name: Tesla K80
major: 3 minor: 7 memoryClockRate (GHz) 0.8235
pciBusID 0000:84:00.0
Total memory: 11.17GiB
Free memory: 11.11GiB
2017-03-21 15:43:38.349535: W tensorflow/stream_executor/cuda/cuda_driver.cc:485] creating context when one is currently active; existing: 0x3653f60
2017-03-21 15:43:38.600657: I tensorflow/core/common_runtime/gpu/gpu_device.cc:887] Found device 3 with properties:
name: Tesla K80
major: 3 minor: 7 memoryClockRate (GHz) 0.8235
pciBusID 0000:85:00.0
Total memory: 11.17GiB
Free memory: 11.11GiB
2017-03-21 15:43:38.602403: I tensorflow/core/common_runtime/gpu/gpu_device.cc:908] DMA: 0 1 2 3
2017-03-21 15:43:38.602412: I tensorflow/core/common_runtime/gpu/gpu_device.cc:918] 0: Y Y N N
2017-03-21 15:43:38.602418: I tensorflow/core/common_runtime/gpu/gpu_device.cc:918] 1: Y Y N N
2017-03-21 15:43:38.602423: I tensorflow/core/common_runtime/gpu/gpu_device.cc:918] 2: N N Y Y
2017-03-21 15:43:38.602428: I tensorflow/core/common_runtime/gpu/gpu_device.cc:918] 3: N N Y Y
2017-03-21 15:43:38.602445: I tensorflow/core/common_runtime/gpu/gpu_device.cc:977] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla K80, pci bus id: 0000:04:00.0)
2017-03-21 15:43:38.602453: I tensorflow/core/common_runtime/gpu/gpu_device.cc:977] Creating TensorFlow device (/gpu:1) -> (device: 1, name: Tesla K80, pci bus id: 0000:05:00.0)
2017-03-21 15:43:38.602459: I tensorflow/core/common_runtime/gpu/gpu_device.cc:977] Creating TensorFlow device (/gpu:2) -> (device: 2, name: Tesla K80, pci bus id: 0000:84:00.0)
2017-03-21 15:43:38.602464: I tensorflow/core/common_runtime/gpu/gpu_device.cc:977] Creating TensorFlow device (/gpu:3) -> (device: 3, name: Tesla K80, pci bus id: 0000:85:00.0)
2017-03-21 15:43:54.086766: step 0, loss = 4.67 (47.9 examples/sec; 2.674 sec/batch)
2017-03-21 15:43:55.013200: step 10, loss = 4.66 (1381.7 examples/sec; 0.093 sec/batch)
2017-03-21 15:43:56.011015: step 20, loss = 4.65 (1282.8 examples/sec; 0.100 sec/batch)
2017-03-21 15:43:56.967307: step 30, loss = 4.60 (1338.5 examples/sec; 0.096 sec/batch)
2017-03-21 15:43:57.940303: step 40, loss = 4.57 (1315.5 examples/sec; 0.097 sec/batch)
2017-03-21 15:43:58.902810: step 50, loss = 4.54 (1329.9 examples/sec; 0.096 sec/batch)
2017-03-21 15:43:59.859618: step 60, loss = 4.48 (1337.8 examples/sec; 0.096 sec/batch)
</code>
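To compare the runs numerically, you can summarise the step lines with a short Python sketch. It relies only on the log format shown above, and the function names are hypothetical. Step 0 is dropped by default because it includes start-up time: 2.7 to 4.3 sec/batch in these logs, compared with about 0.1 to 0.17 afterwards.

<code python>
# Summarise throughput from cifar10_multi_gpu_train.py log lines such as
# "...: step 10, loss = 4.66 (781.8 examples/sec; 0.164 sec/batch)".
import re

STEP_RE = re.compile(
    r"step (\d+), loss = ([\d.]+) \(([\d.]+) examples/sec; ([\d.]+) sec/batch\)"
)


def parse_steps(lines):
    """Return (step, loss, examples_per_sec, sec_per_batch) tuples for matching lines."""
    records = []
    for line in lines:
        m = STEP_RE.search(line)
        if m:  # device-info and queue messages are skipped
            records.append((int(m.group(1)), float(m.group(2)),
                            float(m.group(3)), float(m.group(4))))
    return records


def mean_throughput(lines, skip_first=True):
    """Mean examples/sec; by default step 0 (start-up cost) is excluded."""
    rates = [r[2] for r in parse_steps(lines) if not (skip_first and r[0] == 0)]
    if not rates:
        raise ValueError("no step lines found")
    return sum(rates) / len(rates)
</code>

Applied to the step lines of the three runs above, ''mean_throughput'' gives about 775 examples/sec on the CPU, about 1,281 on one GPU and about 1,331 on four GPUs.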