tensorflow

Differences

This shows you the differences between two versions of the page.

tensorflow [2017/03/17 21:02] by root
tensorflow [2017/03/21 20:55] (current) by root
Line 54: Line 54:
 cd models/tutorials/image/cifar10
 vi cifar10.py
+module load gcc/5.2.1 mkl/16.0.1 python/2.7.11 java/sunjdk_1.8.0 cuda/8.0
 </code>
-and edit cifar10.py to change ''/tmp/'' to an appropriate scratchdirectory such as ''/local_scratch/rfeynman/''.  Running on the cpu (this is gpu node, but using the cpu version of tensorflow) will look like
+and edit cifar10.py to change ''/tmp/'' to an appropriate scratch directory such as ''/local_scratch/rfeynman/''. We set CUDA_VISIBLE_DEVICES to expose 0, 1, or 4 GPUs to TensorFlow. Scaling from the CPU to 1 GPU to 4 GPUs (two dual-GPU K80 boards) is modest: after the first step, the CPU runs at roughly 770 examples/sec, 1 GPU at 1,220 to 1,370, and 4 GPUs at 1,280 to 1,380. There is essentially no difference between 1 GPU and 4.
 <code>
-$ python cifar10_multi_gpu_train.py
+models/tutorials/image/cifar10$ export CUDA_VISIBLE_DEVICES=""
+models/tutorials/image/cifar10$ python cifar10_multi_gpu_train.py
 Filling queue with 20000 CIFAR images before starting to train. This will take a few minutes.
 2017-03-17 14:13:46.533942: step 0, loss = 4.68 (29.7 examples/sec; 4.305 sec/batch)
Line 63: Line 65:
 2017-03-17 14:13:50.697406: step 20, loss = 4.63 (771.2 examples/sec; 0.166 sec/batch)
 2017-03-17 14:13:52.340482: step 30, loss = 4.60 (771.3 examples/sec; 0.166 sec/batch)
-</code>
-Running on dual gpus is only about 50% faster than dual cpus for this example.
-<code>
-compute0805:dchaffin:/storage/dchaffin/models/tutorials/image/cifar10$ python cifar10_multi_gpu_train.py
+
+models/tutorials/image/cifar10$ export CUDA_VISIBLE_DEVICES="0"
+models/tutorials/image/cifar10$ python cifar10_multi_gpu_train.py
+
 Filling queue with 20000 CIFAR images before starting to train. This will take a few minutes.
-2017-03-17 15:22:28.162660: I tensorflow/core/common_runtime/gpu/gpu_device.cc:887] Found device 0 with properties:
-name: Tesla K40c
-major: 3 minor: 5 memoryClockRate (GHz) 0.745
-pciBusID 0000:81:00.0
+
+2017-03-21 15:41:42.767510: I tensorflow/core/common_runtime/gpu/gpu_device.cc:887] Found device 0 with properties:
+name: Tesla K80
+major: 3 minor: 7 memoryClockRate (GHz) 0.8235
+pciBusID 0000:04:00.0
+Total memory: 11.17GiB
+Free memory: 11.11GiB
+2017-03-21 15:41:42.767552: I tensorflow/core/common_runtime/gpu/gpu_device.cc:908] DMA: 0
+2017-03-21 15:41:42.767559: I tensorflow/core/common_runtime/gpu/gpu_device.cc:918] 0:   Y
+2017-03-21 15:41:42.767571: I tensorflow/core/common_runtime/gpu/gpu_device.cc:977] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla K80, pci bus id: 0000:04:00.0)
+2017-03-21 15:42:06.767827: step 0, loss = 4.68 (37.1 examples/sec; 3.448 sec/batch)
+2017-03-21 15:42:07.701603: step 10, loss = 4.67 (1370.8 examples/sec; 0.093 sec/batch)
+2017-03-21 15:42:08.750127: step 20, loss = 4.60 (1220.8 examples/sec; 0.105 sec/batch)
+2017-03-21 15:42:09.762612: step 30, loss = 4.61 (1264.2 examples/sec; 0.101 sec/batch)
+2017-03-21 15:42:10.769818: step 40, loss = 4.58 (1270.8 examples/sec; 0.101 sec/batch)
+2017-03-21 15:42:11.768493: step 50, loss = 4.53 (1281.7 examples/sec; 0.100 sec/batch)
+2017-03-21 15:42:12.769582: step 60, loss = 4.52 (1278.6 examples/sec; 0.100 sec/batch)
+2017-03-21 15:42:13.769733: step 70, loss = 4.54 (1279.8 examples/sec; 0.100 sec/batch)
+
+models/tutorials/image/cifar10$ export CUDA_VISIBLE_DEVICES="0,1,2,3"
+models/tutorials/image/cifar10$ python cifar10_multi_gpu_train.py
+2017-03-21 15:43:37.866128: I tensorflow/core/common_runtime/gpu/gpu_device.cc:887] Found device 0 with properties:
+name: Tesla K80
+major: 3 minor: 7 memoryClockRate (GHz) 0.8235
+pciBusID 0000:04:00.0
+Total memory: 11.17GiB
+Free memory: 11.11GiB
+2017-03-21 15:43:37.866246: W tensorflow/stream_executor/cuda/cuda_driver.cc:485] creating context when one is currently active; existing: 0x364c2b0
+2017-03-21 15:43:38.104892: I tensorflow/core/common_runtime/gpu/gpu_device.cc:887] Found device 1 with properties:
+name: Tesla K80
+major: 3 minor: 7 memoryClockRate (GHz) 0.8235
+pciBusID 0000:05:00.0
+Total memory: 11.17GiB
+Free memory: 11.11GiB
+2017-03-21 15:43:38.104995: W tensorflow/stream_executor/cuda/cuda_driver.cc:485] creating context when one is currently active; existing: 0x36500f0
+2017-03-21 15:43:38.349437: I tensorflow/core/common_runtime/gpu/gpu_device.cc:887] Found device 2 with properties:
+name: Tesla K80
+major: 3 minor: 7 memoryClockRate (GHz) 0.8235
+pciBusID 0000:84:00.0
 Total memory: 11.17GiB
-Free memory: 11.10GiB
-2017-03-17 15:22:28.162838: W tensorflow/stream_executor/cuda/cuda_driver.cc:485] creating context when one is currently active; existing: 0x3b3caf0
-2017-03-17 15:22:28.423756: I tensorflow/core/common_runtime/gpu/gpu_device.cc:887] Found device 1 with properties:
-name: Tesla K40c
-major: 3 minor: 5 memoryClockRate (GHz) 0.745
-pciBusID 0000:82:00.0
+Free memory: 11.11GiB
+2017-03-21 15:43:38.349535: W tensorflow/stream_executor/cuda/cuda_driver.cc:485] creating context when one is currently active; existing: 0x3653f60
+2017-03-21 15:43:38.600657: I tensorflow/core/common_runtime/gpu/gpu_device.cc:887] Found device 3 with properties:
+name: Tesla K80
+major: 3 minor: 7 memoryClockRate (GHz) 0.8235
+pciBusID 0000:85:00.0
 Total memory: 11.17GiB
-Free memory: 11.10GiB
-2017-03-17 15:22:28.424610: I tensorflow/core/common_runtime/gpu/gpu_device.cc:908] DMA: 0 1
-2017-03-17 15:22:28.424625: I tensorflow/core/common_runtime/gpu/gpu_device.cc:918] 0:   Y Y
-2017-03-17 15:22:28.424631: I tensorflow/core/common_runtime/gpu/gpu_device.cc:918] 1:   Y Y
-2017-03-17 15:22:28.424668: I tensorflow/core/common_runtime/gpu/gpu_device.cc:977] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla K40c, pci bus id: 0000:81:00.0)
-2017-03-17 15:22:28.424699: I tensorflow/core/common_runtime/gpu/gpu_device.cc:977] Creating TensorFlow device (/gpu:1) -> (device: 1, name: Tesla K40c, pci bus id: 0000:82:00.0)
-2017-03-17 15:22:47.016337: step 0, loss = 4.67 (7.9 examples/sec; 16.237 sec/batch)
-2017-03-17 15:22:48.801909: step 10, loss = 4.65 (1078.0 examples/sec; 0.119 sec/batch)
-2017-03-17 15:22:49.943175: step 20, loss = 4.63 (1115.examples/sec; 0.115 sec/batch)
-2017-03-17 15:22:51.107477: step 30, loss = 4.60 (1092.examples/sec; 0.117 sec/batch)
-2017-03-17 15:22:52.271683: step 40, loss = 4.58 (1073.examples/sec; 0.119 sec/batch)
-2017-03-17 15:22:53.426386: step 50, loss = 4.51 (1080.examples/sec; 0.118 sec/batch)
-2017-03-17 15:22:54.601170: step 60, loss = 4.49 (1060.examples/sec; 0.121 sec/batch)
-2017-03-17 15:22:55.759457: step 70, loss = 4.49 (1091.examples/sec; 0.117 sec/batch)
-2017-03-17 15:22:57.005849: step 80, loss = 4.48 (1111.examples/sec; 0.115 sec/batch)
+Free memory: 11.11GiB
+2017-03-21 15:43:38.602403: I tensorflow/core/common_runtime/gpu/gpu_device.cc:908] DMA: 0 1 2 3
+2017-03-21 15:43:38.602412: I tensorflow/core/common_runtime/gpu/gpu_device.cc:918] 0:   Y Y N N
+2017-03-21 15:43:38.602418: I tensorflow/core/common_runtime/gpu/gpu_device.cc:918] 1:   Y Y N N
+2017-03-21 15:43:38.602423: I tensorflow/core/common_runtime/gpu/gpu_device.cc:918] 2:   N N Y Y
+2017-03-21 15:43:38.602428: I tensorflow/core/common_runtime/gpu/gpu_device.cc:918] 3:   N N Y Y
+2017-03-21 15:43:38.602445: I tensorflow/core/common_runtime/gpu/gpu_device.cc:977] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla K80, pci bus id: 0000:04:00.0)
+2017-03-21 15:43:38.602453: I tensorflow/core/common_runtime/gpu/gpu_device.cc:977] Creating TensorFlow device (/gpu:1) -> (device: 1, name: Tesla K80, pci bus id: 0000:05:00.0)
+2017-03-21 15:43:38.602459: I tensorflow/core/common_runtime/gpu/gpu_device.cc:977] Creating TensorFlow device (/gpu:2) -> (device: 2, name: Tesla K80, pci bus id: 0000:84:00.0)
+2017-03-21 15:43:38.602464: I tensorflow/core/common_runtime/gpu/gpu_device.cc:977] Creating TensorFlow device (/gpu:3) -> (device: 3, name: Tesla K80, pci bus id: 0000:85:00.0)
+2017-03-21 15:43:54.086766: step 0, loss = 4.67 (47.examples/sec; 2.674 sec/batch)
+2017-03-21 15:43:55.013200: step 10, loss = 4.66 (1381.examples/sec; 0.093 sec/batch)
+2017-03-21 15:43:56.011015: step 20, loss = 4.65 (1282.examples/sec; 0.100 sec/batch)
+2017-03-21 15:43:56.967307: step 30, loss = 4.60 (1338.examples/sec; 0.096 sec/batch)
+2017-03-21 15:43:57.940303: step 40, loss = 4.57 (1315.examples/sec; 0.097 sec/batch)
+2017-03-21 15:43:58.902810: step 50, loss = 4.54 (1329.examples/sec; 0.096 sec/batch)
+2017-03-21 15:43:59.859618: step 60, loss = 4.48 (1337.examples/sec; 0.096 sec/batch)
 </code>
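The ''vi cifar10.py'' step, which rewrites ''/tmp/'' to a scratch directory, can also be scripted. Below is a minimal sketch. It assumes GNU or BSD ''sed'', a scratch path containing no ''#'' or ''&'' characters, and that every ''/tmp/'' in the file should move. ''point_to_scratch'' is a hypothetical helper name, not part of the tutorial.

```shell
#!/usr/bin/env bash
# Sketch: scripted alternative to editing cifar10.py by hand in vi.
# point_to_scratch is a hypothetical helper, not part of the TensorFlow tutorial.
# Assumes the scratch path contains no '#' or '&' (both are special to sed here).
point_to_scratch() {
  local file="$1" scratch="$2"
  # Keep the trailing slash that "/tmp/" had, so "/tmp/x" becomes "<scratch>/x".
  case "$scratch" in
    */) ;;
    *) scratch="$scratch/" ;;
  esac
  mkdir -p "$scratch"
  # '#' as the delimiter avoids escaping the slashes in both paths;
  # -i.bak edits in place and keeps a backup copy (accepted by GNU and BSD sed).
  sed -i.bak "s#/tmp/#${scratch}#g" "$file"
}
```

For example, ''point_to_scratch cifar10.py /local_scratch/rfeynman/'', with your own user name in place of ''rfeynman''. The original file is left as ''cifar10.py.bak''.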
  
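To repeat the CPU / 1 GPU / 4 GPU comparison and reduce each log to one number, a sketch like the following could be used. It assumes the tutorial script accepts ''--num_gpus'' and ''--max_steps''. ''--num_gpus'' defaults to 1 in the tutorial versions we know of, and the runs above do not pass it, so exposing four devices may still build only one tower. That could account for the flat 1-versus-4 GPU numbers; treat this as a hypothesis to check. The function names and log file names are hypothetical. Step 0 is skipped when averaging because it includes start-up cost.

```shell
#!/usr/bin/env bash
# Sketch: rerun the CPU / 1 GPU / 4 GPU comparison and average the throughput.
# Assumptions: run from models/tutorials/image/cifar10, and the tutorial script
# accepts --num_gpus and --max_steps. Function and log names are hypothetical.

# Thin wrapper around the tutorial script, so it can be swapped out easily.
train() {
  python cifar10_multi_gpu_train.py "$@"
}

# $1: CUDA_VISIBLE_DEVICES value ("" hides every GPU), $2: towers, $3: log file.
run_config() {
  local devices="$1" towers="$2" log="$3"
  CUDA_VISIBLE_DEVICES="$devices" train --num_gpus="$towers" --max_steps=100 >"$log" 2>&1
}

# Mean "examples/sec" over the logged steps, skipping step 0 (start-up cost).
mean_throughput() {
  awk '/ step [0-9]+, loss/ && !/ step 0,/ {
         for (i = 1; i <= NF; i++)
           if ($i ~ /^\(/) { sum += substr($i, 2); n++ }
       }
       END { if (n > 0) printf "%.1f\n", sum / n }' "$@"
}

main() {
  # With no GPU visible, the single tower presumably falls back to the CPU,
  # as in the CPU run above.
  run_config ""        1 cpu.log
  run_config "0"       1 gpu1.log
  run_config "0,1,2,3" 4 gpu4.log
  local log
  for log in cpu.log gpu1.log gpu4.log; do
    printf '%-9s %s examples/sec\n' "$log" "$(mean_throughput "$log")"
  done
}
```

Source the file from ''models/tutorials/image/cifar10'' and run ''main''. It prints one averaged examples/sec line per configuration. Skipping step 0 matters here, since it took 3.448 sec/batch on one K80 versus about 0.1 sec/batch afterwards.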
tensorflow.txt · Last modified: 2017/03/21 20:55 by root