===== MXNet =====
MXNet is a deep learning library with support for Python, R, C++, Scala, Julia, MATLAB and JavaScript. It can run on CPUs, GPUs, as well as across multiple cluster nodes. Please see the project website for detailed information:
[[https://mxnet.readthedocs.io/en/latest/]]
On Razor, MXNet is compiled with GPU support as a Python package and installed in ''/share/apps/python/2.7.11/lib/python2.7/site-packages/mxnet-0.7.0-py2.7.egg/mxnet/''. It requires the following modules: gcc/4.9.1, python/2.7.11, opencv/2.4.13, mkl/16.0.1, and cuda/7.5.
As an example, below we will run an MXNet training session on the MNIST (Modified National Institute of Standards and Technology) handwritten digit image database. Because the package is precompiled with GPU support, the example below requires a node with GPU(s) and the CUDA drivers installed. GPU nodes are available in the following queues:
* gpu8core
* gpu16core
* qcondo (with gpu property, i.e. ''qsub -q qcondo -l nodes=1:k80gpu'')
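The interactive session below can also be wrapped in a PBS batch script for non-interactive runs. A sketch along these lines (the job name, queue, resource request, and walltime are placeholders to adapt to your job):

```shell
#PBS -N mxnet-mnist
#PBS -q gpu16core
#PBS -l nodes=1:ppn=16
#PBS -l walltime=1:00:00
#PBS -j oe

# Load the modules the MXNet build was compiled against
module load gcc/4.9.1 python/2.7.11 mkl/16.0.1 opencv/2.4.13 cuda/7.5

# Run from the directory the job was submitted from
cd $PBS_O_WORKDIR
python train_mnist.py --gpus 0
```

Submit with ''qsub'' as usual; for the condo queue, add the GPU property shown above (''-q qcondo -l nodes=1:k80gpu'').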
razor-l1:pwolinsk:$ qsub -I -q qcondo -l nodes=1:k80gpu
qsub: waiting for job 1758962.sched to start
qsub: job 1758962.sched ready
PBS_NODEFILE=/var/spool/torque/aux/1758962.sched PBS_NUM_NODES=1 PBS_PPN=neednodes=1:k80gpu PBS_PPA=24
Currently Loaded Modulefiles:
1) os/el6
compute3133:pwolinsk:$
compute3133:pwolinsk:$ module load gcc/4.9.1 python/2.7.11 mkl/16.0.1 opencv/2.4.13 cuda/7.5
compute3133:pwolinsk:$ cp -r /share/apps/python/2.7.11/lib/python2.7/site-packages/mxnet-0.7.0-py2.7.egg/mxnet/example/image-classification/ .
compute3133:pwolinsk:$ cd image-classification/
compute3133:pwolinsk:/image-classification$ python train_mnist.py --gpus 0
2016-09-02 12:39:07,821 Node[0] start with arguments Namespace(batch_size=128, data_dir='mnist/', gpus='0', kv_store='local', load_epoch=None, lr=0.1, lr_factor=1, lr_factor_epoch=1, model_prefix=None, network='mlp', num_epochs=10, num_examples=60000, save_model_prefix=None)
[12:39:08] src/io/iter_mnist.cc:91: MNISTIter: load 60000 images, shuffle=1, shape=(128,784)
[12:39:09] src/io/iter_mnist.cc:91: MNISTIter: load 10000 images, shuffle=1, shape=(128,784)
2016-09-02 12:39:09,292 Node[0] Start training with [gpu(0)]
2016-09-02 12:39:22,386 Node[0] Epoch[0] Batch [50] Speed: 54211.10 samples/sec Train-accuracy=0.686719
2016-09-02 12:39:22,386 Node[0] Epoch[0] Batch [50] Speed: 54211.10 samples/sec Train-top_k_accuracy_5=0.936562
2016-09-02 12:39:22,386 Node[0] Epoch[0] Batch [50] Speed: 54211.10 samples/sec Train-top_k_accuracy_10=1.000000
...
2016-09-02 12:39:33,259 Node[0] Epoch[9] Batch [450] Speed: 62463.31 samples/sec Train-top_k_accuracy_10=1.000000
2016-09-02 12:39:33,259 Node[0] Epoch[9] Batch [450] Speed: 62463.31 samples/sec Train-top_k_accuracy_20=1.000000
2016-09-02 12:39:33,296 Node[0] Epoch[9] Resetting Data Iterator
2016-09-02 12:39:33,296 Node[0] Epoch[9] Time cost=0.962
2016-09-02 12:39:33,404 Node[0] Epoch[9] Validation-accuracy=0.973858
2016-09-02 12:39:33,404 Node[0] Epoch[9] Validation-top_k_accuracy_5=0.999399
2016-09-02 12:39:33,404 Node[0] Epoch[9] Validation-top_k_accuracy_10=1.000000
2016-09-02 12:39:33,404 Node[0] Epoch[9] Validation-top_k_accuracy_20=1.000000
compute3133:pwolinsk:/image-classification$
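The ''--gpus'' flag takes a comma-separated list of device indices; inside ''train_mnist.py'' that string is turned into one MXNet device context per GPU for data-parallel training. Conceptually it works like the minimal sketch below (ours, not the script's exact code; the helper name ''parse_device_ids'' is hypothetical):

```python
def parse_device_ids(gpus):
    """Turn a --gpus string such as '0,1,2,3' into a list of device indices.

    An empty or missing value means no GPUs were requested, in which case
    MXNet falls back to CPU training.
    """
    if not gpus:
        return []
    return [int(i.strip()) for i in gpus.split(",")]

# In MXNet these indices become device contexts, e.g. mx.gpu(0), mx.gpu(1), ...
print(parse_device_ids("0,1,2,3"))  # [0, 1, 2, 3]
```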
These commands launched a training session using a single GPU. The NVIDIA driver installation provides the ''nvidia-smi'' tool, which monitors the use of the available GPUs on the node.
compute3133:pwolinsk:/image-classification$ nvidia-smi
Fri Sep 2 12:41:34 2016
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.35 Driver Version: 367.35 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla K80 Off | 0000:04:00.0 Off | 0 |
| N/A 37C P0 56W / 149W | 0MiB / 11439MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla K80 Off | 0000:05:00.0 Off | 0 |
| N/A 31C P0 72W / 149W | 0MiB / 11439MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 2 Tesla K80 Off | 0000:84:00.0 Off | 0 |
| N/A 31C P0 56W / 149W | 0MiB / 11439MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 3 Tesla K80 Off | 0000:85:00.0 Off | 0 |
| N/A 40C P0 72W / 149W | 0MiB / 11439MiB | 98% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
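Besides the table view, ''nvidia-smi'' offers a machine-readable query mode for scripted monitoring, e.g. ''nvidia-smi --query-gpu=index,utilization.gpu --format=csv,noheader,nounits''. A hedged sketch of parsing that output in Python (the sample string below mirrors the table above; on a live node you would capture the command's stdout, e.g. with ''subprocess'', instead of hard-coding it):

```python
import csv
import io

# Sample output in the shape produced by:
#   nvidia-smi --query-gpu=index,utilization.gpu --format=csv,noheader,nounits
# The numbers mirror the nvidia-smi table above (GPU 3 at 98% utilization).
sample = "0, 0\n1, 0\n2, 0\n3, 98\n"

def busiest_gpu(csv_text):
    """Return (gpu_index, utilization_percent) of the most-utilized GPU."""
    rows = [(int(idx), int(util))
            for idx, util in csv.reader(io.StringIO(csv_text))]
    return max(rows, key=lambda r: r[1])

print(busiest_gpu(sample))  # (3, 98)
```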
The output above shows four Tesla K80 GPU devices on the system. In reality there are only two physical K80 cards: each K80 board carries two GPUs, so ''nvidia-smi'' lists four devices. To launch the training session with multiple GPUs:
compute3133:pwolinsk:/image-classification$ python train_mnist.py --gpus 0,1,2,3