tensorflow

Differences

This shows you the differences between two versions of the page.

tensorflow [2016/09/02 16:33]
pwolinsk
tensorflow [2017/03/17 21:02]
root
Line 5: Line 5:
 https://www.tensorflow.org/
  
-The library is available as a python package.  It is installed for **python/2.7.5** and requires 3 additional dependencies **gcc/4.9.1 mkl/16.0.1 java/sunjdk_1.8.0**
+The library is available as a python package.  The cpu version is installed for **python/2.7.5** on both clusters and requires 3 additional dependencies **gcc/4.9.1 mkl/16.0.1 java/sunjdk_1.8.0**.  The gpu version is installed for **python 2.7.11** on razor and requires **gcc/4.9.1 mkl/16.0.1 java/sunjdk/1.8.0 cuda/8.0** as well as **python/2.7.11**.
  
 <code>
Line 48: Line 48:
 python /share/apps/opt/rh/python27/root/usr/lib/python2.7/site-packages/tensorflow/models/image/mnist/convolutional.py
 </code>
 +
 +The **cifar10** tensorflow example has been tested with cpu and gpu.  To get the example set:
 +<code>
 +git clone https://github.com/tensorflow/models
 +cd models/tutorials/image/cifar10
 +vi cifar10.py
 +</code>
 +and edit cifar10.py to change ''/tmp/'' to an appropriate scratch directory such as ''/local_scratch/rfeynman/''.  Running on the cpu (this is a gpu node, but using the cpu version of tensorflow) will look like
 +<code>
 +$ python cifar10_multi_gpu_train.py
 +Filling queue with 20000 CIFAR images before starting to train. This will take a few minutes.
 +2017-03-17 14:13:46.533942: step 0, loss = 4.68 (29.7 examples/sec; 4.305 sec/batch)
 +2017-03-17 14:13:49.055265: step 10, loss = 4.66 (781.8 examples/sec; 0.164 sec/batch)
 +2017-03-17 14:13:50.697406: step 20, loss = 4.63 (771.2 examples/sec; 0.166 sec/batch)
 +2017-03-17 14:13:52.340482: step 30, loss = 4.60 (771.3 examples/sec; 0.166 sec/batch)
 +</code>
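The manual edit of cifar10.py described above can also be scripted. The following is a minimal sketch, assuming every occurrence of `/tmp/` in the file should point to the scratch directory; the helper name `retarget_tmp` and the `.bak` backup suffix are illustrative choices, not part of the original instructions.

```shell
# Hypothetical helper: rewrite every "/tmp/" in FILE to SCRATCH,
# keeping the original as FILE.bak.
# '#' is the sed delimiter so the slashes in the paths need no escaping;
# SCRATCH must therefore not contain a '#' character.
retarget_tmp() {
    file="$1"
    scratch="$2"
    sed -i.bak "s#/tmp/#${scratch}#g" "$file"
}

# Usage (run inside models/tutorials/image/cifar10):
#   retarget_tmp cifar10.py /local_scratch/rfeynman/
```

Review the result with `diff cifar10.py.bak cifar10.py` before training, since a blanket substitution may touch paths you did not intend to change.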
 +Running on dual gpus is only about 40% faster than dual cpus for this example (roughly 1,090 versus 775 examples/sec in the logs shown here).
 +<code>
 +compute0805:dchaffin:/storage/dchaffin/models/tutorials/image/cifar10$ python cifar10_multi_gpu_train.py
 +Filling queue with 20000 CIFAR images before starting to train. This will take a few minutes.
 +2017-03-17 15:22:28.162660: I tensorflow/core/common_runtime/gpu/gpu_device.cc:887] Found device 0 with properties:
 +name: Tesla K40c
 +major: 3 minor: 5 memoryClockRate (GHz) 0.745
 +pciBusID 0000:81:00.0
 +Total memory: 11.17GiB
 +Free memory: 11.10GiB
 +2017-03-17 15:22:28.162838: W tensorflow/stream_executor/cuda/cuda_driver.cc:485] creating context when one is currently active; existing: 0x3b3caf0
 +2017-03-17 15:22:28.423756: I tensorflow/core/common_runtime/gpu/gpu_device.cc:887] Found device 1 with properties:
 +name: Tesla K40c
 +major: 3 minor: 5 memoryClockRate (GHz) 0.745
 +pciBusID 0000:82:00.0
 +Total memory: 11.17GiB
 +Free memory: 11.10GiB
 +2017-03-17 15:22:28.424610: I tensorflow/core/common_runtime/gpu/gpu_device.cc:908] DMA: 0 1
 +2017-03-17 15:22:28.424625: I tensorflow/core/common_runtime/gpu/gpu_device.cc:918] 0:   Y Y
 +2017-03-17 15:22:28.424631: I tensorflow/core/common_runtime/gpu/gpu_device.cc:918] 1:   Y Y
 +2017-03-17 15:22:28.424668: I tensorflow/core/common_runtime/gpu/gpu_device.cc:977] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla K40c, pci bus id: 0000:81:00.0)
 +2017-03-17 15:22:28.424699: I tensorflow/core/common_runtime/gpu/gpu_device.cc:977] Creating TensorFlow device (/gpu:1) -> (device: 1, name: Tesla K40c, pci bus id: 0000:82:00.0)
 +2017-03-17 15:22:47.016337: step 0, loss = 4.67 (7.9 examples/sec; 16.237 sec/batch)
 +2017-03-17 15:22:48.801909: step 10, loss = 4.65 (1078.0 examples/sec; 0.119 sec/batch)
 +2017-03-17 15:22:49.943175: step 20, loss = 4.63 (1115.5 examples/sec; 0.115 sec/batch)
 +2017-03-17 15:22:51.107477: step 30, loss = 4.60 (1092.9 examples/sec; 0.117 sec/batch)
 +2017-03-17 15:22:52.271683: step 40, loss = 4.58 (1073.0 examples/sec; 0.119 sec/batch)
 +2017-03-17 15:22:53.426386: step 50, loss = 4.51 (1080.8 examples/sec; 0.118 sec/batch)
 +2017-03-17 15:22:54.601170: step 60, loss = 4.49 (1060.7 examples/sec; 0.121 sec/batch)
 +2017-03-17 15:22:55.759457: step 70, loss = 4.49 (1091.8 examples/sec; 0.117 sec/batch)
 +2017-03-17 15:22:57.005849: step 80, loss = 4.48 (1111.7 examples/sec; 0.115 sec/batch)
 +</code>
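To compare cpu and gpu runs from their logs, the per-step rates can be averaged. The sketch below assumes the log format shown above and skips step 0, whose timing appears to be dominated by one-time startup; the function name `mean_throughput` and the log file names in the usage comment are hypothetical.

```shell
# Print the mean examples/sec of a training log read from stdin.
# Only "step N, loss = ..." lines are used, and step 0 is excluded.
mean_throughput() {
    awk '/ step [0-9]+, loss/ && !/ step 0,/ {
             for (i = 2; i <= NF; i++) {
                 if ($i ~ /^examples\/sec/) {
                     rate = $(i - 1)        # e.g. "(1078.0"
                     sub(/^\(/, "", rate)   # drop the opening parenthesis
                     sum += rate
                     n++
                 }
             }
         }
         END { if (n) printf "%.1f\n", sum / n }'
}

# Usage, after saving each run's output to a file:
#   python cifar10_multi_gpu_train.py 2>&1 | tee gpu.log
#   mean_throughput < cpu.log
#   mean_throughput < gpu.log
```

Applied to the three post-startup steps of the cpu run shown above (781.8, 771.2 and 771.3 examples/sec), this prints 774.8.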
 +
  
  
  
tensorflow.txt · Last modified: 2017/03/21 20:55 by root