===== MXNet =====

MXNet is a deep learning library with support for Python, R, C++, Scala, Julia, Matlab and JavaScript. It can run on CPUs, GPUs, as well as across multiple cluster nodes. Please see the project website for detailed information: [[https://mxnet.readthedocs.io/en/latest/]]

On Razor, MXNet is compiled with GPU support as a Python package and installed in ''/share/apps/python/2.7.11/lib/python2.7/site-packages/mxnet-0.7.0-py2.7.egg/mxnet/''. It requires the following modules: gcc-4.9.1, python-2.7.11, opencv-2.4.13, mkl-16.0.1, and cuda-7.5.

As an example, below we run an MXNet training session on the MNIST (Mixed National Institute of Standards and Technology) handwritten digit image database. Because the package is precompiled with GPU support, the example requires a node with GPU(s) and the CUDA drivers installed. GPU nodes are available in the following queues:

  * gpu8core
  * gpu16core
  * qcondo (with the gpu property, i.e. ''qsub -q qcondo -l nodes=1:k80gpu'')

<code>
razor-l1:pwolinsk:$ qsub -I -q qcondo -l nodes=1:k80gpu
qsub: waiting for job 1758962.sched to start
qsub: job 1758962.sched ready

PBS_NODEFILE=/var/spool/torque/aux/1758962.sched
PBS_NUM_NODES=1
PBS_PPN=neednodes=1:k80gpu
PBS_PPA=24
Currently Loaded Modulefiles:
  1) os/el6
compute3133:pwolinsk:$
compute3133:pwolinsk:$ module load gcc/4.9.1 python/2.7.11 mkl/16.0.1 opencv/2.4.13 cuda/7.5
compute3133:pwolinsk:$ cp -r /share/apps/python/2.7.11/lib/python2.7/site-packages/mxnet-0.7.0-py2.7.egg/mxnet/example/image-classification/ .
compute3133:pwolinsk:$ cd image-classification/
compute3133:pwolinsk:/image-classification$ python train_mnist.py --gpus 0
2016-09-02 12:39:07,821 Node[0] start with arguments Namespace(batch_size=128, data_dir='mnist/', gpus='0', kv_store='local', load_epoch=None, lr=0.1, lr_factor=1, lr_factor_epoch=1, model_prefix=None, network='mlp', num_epochs=10, num_examples=60000, save_model_prefix=None)
[12:39:08] src/io/iter_mnist.cc:91: MNISTIter: load 60000 images, shuffle=1, shape=(128,784)
[12:39:09] src/io/iter_mnist.cc:91: MNISTIter: load 10000 images, shuffle=1, shape=(128,784)
2016-09-02 12:39:09,292 Node[0] Start training with [gpu(0)]
2016-09-02 12:39:22,386 Node[0] Epoch[0] Batch [50]  Speed: 54211.10 samples/sec  Train-accuracy=0.686719
2016-09-02 12:39:22,386 Node[0] Epoch[0] Batch [50]  Speed: 54211.10 samples/sec  Train-top_k_accuracy_5=0.936562
2016-09-02 12:39:22,386 Node[0] Epoch[0] Batch [50]  Speed: 54211.10 samples/sec  Train-top_k_accuracy_10=1.000000
...
2016-09-02 12:39:33,259 Node[0] Epoch[9] Batch [450]  Speed: 62463.31 samples/sec  Train-top_k_accuracy_10=1.000000
2016-09-02 12:39:33,259 Node[0] Epoch[9] Batch [450]  Speed: 62463.31 samples/sec  Train-top_k_accuracy_20=1.000000
2016-09-02 12:39:33,296 Node[0] Epoch[9] Resetting Data Iterator
2016-09-02 12:39:33,296 Node[0] Epoch[9] Time cost=0.962
2016-09-02 12:39:33,404 Node[0] Epoch[9] Validation-accuracy=0.973858
2016-09-02 12:39:33,404 Node[0] Epoch[9] Validation-top_k_accuracy_5=0.999399
2016-09-02 12:39:33,404 Node[0] Epoch[9] Validation-top_k_accuracy_10=1.000000
2016-09-02 12:39:33,404 Node[0] Epoch[9] Validation-top_k_accuracy_20=1.000000
compute3133:pwolinsk:/image-classification$
</code>

These commands launched a training session using a single GPU.
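Before starting a longer job, it can be useful to confirm from Python that the GPU-enabled build actually reaches a device. The short sketch below is not part of the MXNet example scripts; it simply allocates a small array on GPU 0 and copies the result back to the host. Run it on a GPU node with the modules listed above loaded.

<code python>
# quick_gpu_check.py -- a minimal sketch, not part of the MXNet examples.
# Assumes the gcc, python, mkl, opencv and cuda modules listed above are
# loaded and that the session is running on a GPU node.
import mxnet as mx

# Allocate a small array directly on GPU 0; this fails if the CUDA build
# or the driver is not working.
a = mx.nd.ones((2, 3), ctx=mx.gpu(0))
b = a * 2.0

# Copy the result back to host memory and print it.
print(b.asnumpy())
</code>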
The NVIDIA driver installation provides the ''nvidia-smi'' tool, which monitors the use of the available GPUs on the node:

<code>
compute3133:pwolinsk:/image-classification$ nvidia-smi
Fri Sep  2 12:41:34 2016
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.35                 Driver Version: 367.35                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | 0000:04:00.0     Off |                    0 |
| N/A   37C    P0    56W / 149W |      0MiB / 11439MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K80           Off  | 0000:05:00.0     Off |                    0 |
| N/A   31C    P0    72W / 149W |      0MiB / 11439MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla K80           Off  | 0000:84:00.0     Off |                    0 |
| N/A   31C    P0    56W / 149W |      0MiB / 11439MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla K80           Off  | 0000:85:00.0     Off |                    0 |
| N/A   40C    P0    72W / 149W |      0MiB / 11439MiB |     98%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
</code>

The output above lists four Tesla K80 GPU devices. The node physically contains only two K80 cards, but each K80 card holds two GPUs, so ''nvidia-smi'' reports four devices. To launch the training model with multiple GPU cards:

<code>
compute3133:pwolinsk:/image-classification$ python train_mnist.py --gpus 0,1,2,3
</code>
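For reference, the sketch below illustrates how the ''--gpus 0,1,2,3'' option corresponds to a list of GPU contexts in the MXNet 0.7 Python API. It is not the code inside ''train_mnist.py''; the tiny network and synthetic data are made up for illustration, and the ''mx.model.FeedForward'' interface shown here is the older API that the 0.7 example scripts were built on.

<code python>
# multi_gpu_sketch.py -- a minimal data-parallel training sketch,
# NOT the train_mnist.py script.  Network shape and data are invented.
import numpy as np
import mxnet as mx

# A tiny multilayer perceptron, similar in spirit to the 'mlp' network
# reported in the train_mnist.py output above.
data = mx.symbol.Variable('data')
fc1  = mx.symbol.FullyConnected(data=data, name='fc1', num_hidden=64)
act1 = mx.symbol.Activation(data=fc1, name='relu1', act_type='relu')
fc2  = mx.symbol.FullyConnected(data=act1, name='fc2', num_hidden=10)
net  = mx.symbol.SoftmaxOutput(data=fc2, name='softmax')

# Synthetic data so the sketch is self-contained.
x = np.random.rand(1000, 100).astype('float32')
y = np.random.randint(0, 10, (1000,)).astype('float32')
train_iter = mx.io.NDArrayIter(x, y, batch_size=128, shuffle=True)

# '--gpus 0,1,2,3' on the command line amounts to passing a list of GPU
# contexts; MXNet splits each batch across the listed devices.
ctx = [mx.gpu(i) for i in range(4)]

model = mx.model.FeedForward(symbol=net, ctx=ctx, num_epoch=2,
                             learning_rate=0.1)
model.fit(X=train_iter)
</code>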