tensorflow

Differences

This shows you the differences between two versions of the page.

tensorflow [2016/09/02 16:33]
pwolinsk
tensorflow [2017/03/21 20:55] (current)
root
Line 5: Line 5:
 https://www.tensorflow.org/
  
-The library is available as a python package.  It is installed for **python/2.7.5** and requires 3 additional dependencies **gcc/4.9.1 mkl/16.0.1 java/sunjdk_1.8.0**
+The library is available as a Python package. The CPU version is installed for **python/2.7.5** on both clusters and requires three additional modules: **gcc/4.9.1 mkl/16.0.1 java/sunjdk_1.8.0**. The GPU version is installed for **python/2.7.11** on razor and requires **gcc/4.9.1 mkl/16.0.1 java/sunjdk_1.8.0 cuda/8.0** in addition to **python/2.7.11**.
  
 <code>
Line 48: Line 48:
 python /share/apps/opt/rh/python27/root/usr/lib/python2.7/site-packages/tensorflow/models/image/mnist/convolutional.py
 </code>
 +
 +The **cifar10** TensorFlow example has been tested on both CPU and GPU. To get the example set:
 +<​code>​
 +git clone https://github.com/tensorflow/models
 +cd models/tutorials/image/cifar10
 +vi cifar10.py
 +module load gcc/5.2.1 mkl/16.0.1 python/2.7.11 java/sunjdk_1.8.0 cuda/8.0
 +</​code>​
 +Then edit cifar10.py to change ''/tmp/'' to an appropriate scratch directory such as ''/local_scratch/rfeynman/''. We set CUDA_VISIBLE_DEVICES to expose 0, 1, or 4 GPUs to the run. Scaling from CPU to 1 GPU to 4 GPUs (two dual-GPU K80 cards) is very modest: after the first step, throughput is about 775 examples/sec on CPU, about 1280 examples/sec on 1 GPU, and about 1330 examples/sec with 4 GPUs visible. There is essentially no difference between 1 GPU and 4.
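The edit to cifar10.py can also be scripted instead of done by hand in an editor. The following is a minimal sketch, not part of the TensorFlow models repository: ''retarget_tmp'' is a hypothetical helper, and the sample line is only an assumption about what a ''/tmp/'' default in cifar10.py looks like. It rewrites quoted paths that start with ''/tmp/'' so they point under a scratch directory.

```python
import re


def retarget_tmp(source_text, scratch_dir):
    """Return source_text with quoted /tmp/ paths moved under scratch_dir.

    Hypothetical helper for illustration; it only rewrites string literals
    that begin with /tmp/ and leaves everything else untouched.
    """
    prefix = scratch_dir.rstrip("/") + "/"
    # Match an opening quote followed by /tmp/ and keep the same quote character.
    return re.sub(r"""(['"])/tmp/""", lambda m: m.group(1) + prefix, source_text)


if __name__ == "__main__":
    # Assumed shape of a flag definition; check the real file before editing it.
    sample = "DEFINE_string('data_dir', '/tmp/cifar10_data', 'Path to data.')"
    print(retarget_tmp(sample, "/local_scratch/rfeynman/"))
```

To apply it to the real file, read cifar10.py into a string, pass it through ''retarget_tmp'', and write the result back after checking the changed lines.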
 +<​code>​
 +models/tutorials/image/cifar10$ export CUDA_VISIBLE_DEVICES=""
 +models/tutorials/image/cifar10$ python cifar10_multi_gpu_train.py
 +Filling queue with 20000 CIFAR images before starting to train. This will take a few minutes.
 +2017-03-17 14:​13:​46.533942:​ step 0, loss = 4.68 (29.7 examples/​sec;​ 4.305 sec/batch)
 +2017-03-17 14:​13:​49.055265:​ step 10, loss = 4.66 (781.8 examples/​sec;​ 0.164 sec/batch)
 +2017-03-17 14:​13:​50.697406:​ step 20, loss = 4.63 (771.2 examples/​sec;​ 0.166 sec/batch)
 +2017-03-17 14:​13:​52.340482:​ step 30, loss = 4.60 (771.3 examples/​sec;​ 0.166 sec/batch)
 +
 +models/tutorials/image/cifar10$ export CUDA_VISIBLE_DEVICES="0"
 +models/tutorials/image/cifar10$ python cifar10_multi_gpu_train.py
 +Filling queue with 20000 CIFAR images before starting to train. This will take a few minutes.
 +
 +2017-03-21 15:​41:​42.767510:​ I tensorflow/​core/​common_runtime/​gpu/​gpu_device.cc:​887] Found device 0 with properties: ​
 +name: Tesla K80
 +major: 3 minor: 7 memoryClockRate (GHz) 0.8235
 +pciBusID 0000:​04:​00.0
 +Total memory: 11.17GiB
 +Free memory: 11.11GiB
 +2017-03-21 15:​41:​42.767552:​ I tensorflow/​core/​common_runtime/​gpu/​gpu_device.cc:​908] DMA: 0 
 +2017-03-21 15:​41:​42.767559:​ I tensorflow/​core/​common_runtime/​gpu/​gpu_device.cc:​918] 0:   ​Y ​
 +2017-03-21 15:​41:​42.767571:​ I tensorflow/​core/​common_runtime/​gpu/​gpu_device.cc:​977] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla K80, pci bus id: 0000:​04:​00.0)
 +2017-03-21 15:​42:​06.767827:​ step 0, loss = 4.68 (37.1 examples/​sec;​ 3.448 sec/batch)
 +2017-03-21 15:​42:​07.701603:​ step 10, loss = 4.67 (1370.8 examples/​sec;​ 0.093 sec/batch)
 +2017-03-21 15:​42:​08.750127:​ step 20, loss = 4.60 (1220.8 examples/​sec;​ 0.105 sec/batch)
 +2017-03-21 15:​42:​09.762612:​ step 30, loss = 4.61 (1264.2 examples/​sec;​ 0.101 sec/batch)
 +2017-03-21 15:​42:​10.769818:​ step 40, loss = 4.58 (1270.8 examples/​sec;​ 0.101 sec/batch)
 +2017-03-21 15:​42:​11.768493:​ step 50, loss = 4.53 (1281.7 examples/​sec;​ 0.100 sec/batch)
 +2017-03-21 15:​42:​12.769582:​ step 60, loss = 4.52 (1278.6 examples/​sec;​ 0.100 sec/batch)
 +2017-03-21 15:​42:​13.769733:​ step 70, loss = 4.54 (1279.8 examples/​sec;​ 0.100 sec/batch)
 +
 +models/tutorials/image/cifar10$ export CUDA_VISIBLE_DEVICES="0,1,2,3"
 +models/tutorials/image/cifar10$ python cifar10_multi_gpu_train.py
 +2017-03-21 15:​43:​37.866128:​ I tensorflow/​core/​common_runtime/​gpu/​gpu_device.cc:​887] Found device 0 with properties: ​
 +name: Tesla K80
 +major: 3 minor: 7 memoryClockRate (GHz) 0.8235
 +pciBusID 0000:​04:​00.0
 +Total memory: 11.17GiB
 +Free memory: 11.11GiB
 +2017-03-21 15:​43:​37.866246:​ W tensorflow/​stream_executor/​cuda/​cuda_driver.cc:​485] creating context when one is currently active; existing: 0x364c2b0
 +2017-03-21 15:​43:​38.104892:​ I tensorflow/​core/​common_runtime/​gpu/​gpu_device.cc:​887] Found device 1 with properties: ​
 +name: Tesla K80
 +major: 3 minor: 7 memoryClockRate (GHz) 0.8235
 +pciBusID 0000:​05:​00.0
 +Total memory: 11.17GiB
 +Free memory: 11.11GiB
 +2017-03-21 15:​43:​38.104995:​ W tensorflow/​stream_executor/​cuda/​cuda_driver.cc:​485] creating context when one is currently active; existing: 0x36500f0
 +2017-03-21 15:​43:​38.349437:​ I tensorflow/​core/​common_runtime/​gpu/​gpu_device.cc:​887] Found device 2 with properties: ​
 +name: Tesla K80
 +major: 3 minor: 7 memoryClockRate (GHz) 0.8235
 +pciBusID 0000:​84:​00.0
 +Total memory: 11.17GiB
 +Free memory: 11.11GiB
 +2017-03-21 15:​43:​38.349535:​ W tensorflow/​stream_executor/​cuda/​cuda_driver.cc:​485] creating context when one is currently active; existing: 0x3653f60
 +2017-03-21 15:​43:​38.600657:​ I tensorflow/​core/​common_runtime/​gpu/​gpu_device.cc:​887] Found device 3 with properties: ​
 +name: Tesla K80
 +major: 3 minor: 7 memoryClockRate (GHz) 0.8235
 +pciBusID 0000:​85:​00.0
 +Total memory: 11.17GiB
 +Free memory: 11.11GiB
 +2017-03-21 15:​43:​38.602403:​ I tensorflow/​core/​common_runtime/​gpu/​gpu_device.cc:​908] DMA: 0 1 2 3 
 +2017-03-21 15:​43:​38.602412:​ I tensorflow/​core/​common_runtime/​gpu/​gpu_device.cc:​918] 0:   Y Y N N 
 +2017-03-21 15:​43:​38.602418:​ I tensorflow/​core/​common_runtime/​gpu/​gpu_device.cc:​918] 1:   Y Y N N 
 +2017-03-21 15:​43:​38.602423:​ I tensorflow/​core/​common_runtime/​gpu/​gpu_device.cc:​918] 2:   N N Y Y 
 +2017-03-21 15:​43:​38.602428:​ I tensorflow/​core/​common_runtime/​gpu/​gpu_device.cc:​918] 3:   N N Y Y 
 +2017-03-21 15:​43:​38.602445:​ I tensorflow/​core/​common_runtime/​gpu/​gpu_device.cc:​977] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla K80, pci bus id: 0000:​04:​00.0)
 +2017-03-21 15:​43:​38.602453:​ I tensorflow/​core/​common_runtime/​gpu/​gpu_device.cc:​977] Creating TensorFlow device (/gpu:1) -> (device: 1, name: Tesla K80, pci bus id: 0000:​05:​00.0)
 +2017-03-21 15:​43:​38.602459:​ I tensorflow/​core/​common_runtime/​gpu/​gpu_device.cc:​977] Creating TensorFlow device (/gpu:2) -> (device: 2, name: Tesla K80, pci bus id: 0000:​84:​00.0)
 +2017-03-21 15:​43:​38.602464:​ I tensorflow/​core/​common_runtime/​gpu/​gpu_device.cc:​977] Creating TensorFlow device (/gpu:3) -> (device: 3, name: Tesla K80, pci bus id: 0000:​85:​00.0)
 +2017-03-21 15:​43:​54.086766:​ step 0, loss = 4.67 (47.9 examples/​sec;​ 2.674 sec/batch)
 +2017-03-21 15:​43:​55.013200:​ step 10, loss = 4.66 (1381.7 examples/​sec;​ 0.093 sec/batch)
 +2017-03-21 15:​43:​56.011015:​ step 20, loss = 4.65 (1282.8 examples/​sec;​ 0.100 sec/batch)
 +2017-03-21 15:​43:​56.967307:​ step 30, loss = 4.60 (1338.5 examples/​sec;​ 0.096 sec/batch)
 +2017-03-21 15:​43:​57.940303:​ step 40, loss = 4.57 (1315.5 examples/​sec;​ 0.097 sec/batch)
 +2017-03-21 15:​43:​58.902810:​ step 50, loss = 4.54 (1329.9 examples/​sec;​ 0.096 sec/batch)
 +2017-03-21 15:​43:​59.859618:​ step 60, loss = 4.48 (1337.8 examples/​sec;​ 0.096 sec/batch)
 +</​code>​
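To compare the runs above numerically, the step lines can be averaged. This is a rough sketch under stated assumptions: ''mean_throughput'' is a hypothetical helper, the regular expression assumes the exact ''step N, loss = ... (... examples/sec; ... sec/batch)'' format shown in the logs, and step 0 is skipped because it includes start-up cost. The sample data is copied from the CPU and 1 GPU logs above.

```python
import re

# Matches the training progress lines printed by the cifar10 example above.
STEP_RE = re.compile(
    r"step (\d+), loss = [\d.]+ \(([\d.]+) examples/sec; [\d.]+ sec/batch\)"
)

CPU_LOG = """\
2017-03-17 14:13:46.533942: step 0, loss = 4.68 (29.7 examples/sec; 4.305 sec/batch)
2017-03-17 14:13:49.055265: step 10, loss = 4.66 (781.8 examples/sec; 0.164 sec/batch)
2017-03-17 14:13:50.697406: step 20, loss = 4.63 (771.2 examples/sec; 0.166 sec/batch)
2017-03-17 14:13:52.340482: step 30, loss = 4.60 (771.3 examples/sec; 0.166 sec/batch)
"""

ONE_GPU_LOG = """\
2017-03-21 15:42:06.767827: step 0, loss = 4.68 (37.1 examples/sec; 3.448 sec/batch)
2017-03-21 15:42:07.701603: step 10, loss = 4.67 (1370.8 examples/sec; 0.093 sec/batch)
2017-03-21 15:42:08.750127: step 20, loss = 4.60 (1220.8 examples/sec; 0.105 sec/batch)
2017-03-21 15:42:09.762612: step 30, loss = 4.61 (1264.2 examples/sec; 0.101 sec/batch)
"""


def mean_throughput(log_text, skip_steps=1):
    """Average examples/sec over the step lines, ignoring the first skip_steps."""
    rates = [float(rate) for _step, rate in STEP_RE.findall(log_text)]
    rates = rates[skip_steps:]
    if not rates:
        raise ValueError("no step lines left after skipping warm-up")
    return sum(rates) / len(rates)


if __name__ == "__main__":
    cpu = mean_throughput(CPU_LOG)
    gpu = mean_throughput(ONE_GPU_LOG)
    print("CPU:   %.1f examples/sec" % cpu)
    print("1 GPU: %.1f examples/sec" % gpu)
    print("speedup: %.2fx" % (gpu / cpu))
```

Lines that do not match the pattern, such as the device discovery messages, are ignored, so a full captured log can be passed in unchanged.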
 +
  
  
  
tensorflow.1472834024.txt.gz · Last modified: 2016/09/02 16:33 by pwolinsk