===== Virtual Machines on Pinnacle =====
One of the new features introduced in the Pinnacle cluster is the ability to spin up virtual machines. Virtual machines allow users to:
* run operating system versions other than the one installed on the Pinnacle compute nodes
* have root access to the virtual machine, including the ability to install any software
* mount the underlying Pinnacle Lustre file system directly on the VM (/home/$USER and /scratch)
* suspend the virtual machine with its current memory state (executing programs) and restart it at a later time (or in another job)
* access the VM through a remote desktop
Because all virtual machines have to be created and started by root, we developed a set of scripts that users can run, with the following functionality:
vm-clone.sh - clone a VM template and save in user VM library
vm-delete.sh - delete a VM from the user VM library
vm-list.sh - list VMs in the user library
vm-get-ip.sh - retrieve the IP address of a currently running VM
vm-log.sh - show the startup log of a currently running VM
vm-bootup-info.sh -
=== User Virtual Machine Library ===
The user VM library is stored at **/storage/$USER/.virtual-machines**. This directory is owned by root and can only be modified by the user through the vm-*.sh scripts. Each VM is defined by two files in that directory:
.qcow2 - the hard drive file of the VM
.xml - the XML definition file of the VM
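As a rough illustration of the two-file layout, the sketch below checks whether both files exist for a given VM name. It assumes the files are named after the VM (as in 'ubuntu-18.04-pwolinsk.qcow2'), that the XML file follows the same pattern, and that users are allowed to read the library directory. `vm_files_present` is a hypothetical helper, not one of the vm-*.sh scripts.

```shell
# Hypothetical helper: succeed only if both library files exist for a VM.
# Assumption: the files are named <vm-name>.qcow2 and <vm-name>.xml.
vm_files_present() {
    dir=$1
    name=$2
    [ -f "$dir/$name.qcow2" ] && [ -f "$dir/$name.xml" ]
}

# Example call against the library path given above.
user=${USER:-$(id -un)}
if vm_files_present "/storage/$user/.virtual-machines" "ubuntu-18.04-$user"; then
    echo "VM definition is complete"
else
    echo "VM is missing a .qcow2 or .xml file (or does not exist)"
fi
```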
The **vm-list.sh** script will list the contents of the library:
pinnacle-l1:pwolinsk:~$ vm-list.sh
pwolinsk's VMS STATE VM IP HOST
======================================================================
centos7.6-desktop-pwolinsk SHUT OFF
centos7.6-lustre-pwolinsk SHUT OFF
centos7.6-lustre-pwolinsk-1 SHUT OFF
library-pwolinsk SHUT OFF
ubuntu-18.04-desktop-pwolinsk SHUT OFF
ubuntu-18.04-lustre-pwolinsk RUNNING 172.16.254.127 c1329
Virtual machines are stored in /storage/pwolinsk/.virtual-machines.
Total storage on disk: 35G total
pinnacle-l1:pwolinsk:~$
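If only the running VMs are of interest, the listing can be filtered with awk. This is a minimal sketch that assumes the column layout of the sample output above (name, state, IP, host, with SHUT OFF rows carrying no IP or host). `running_vms` is a hypothetical helper name.

```shell
# Print "name ip host" for every VM whose state column reads RUNNING.
# Assumption: whitespace-separated columns as in the sample vm-list.sh output.
running_vms() {
    awk '$2 == "RUNNING" { print $1, $3, $4 }'
}

# Example with a captured copy of two rows from the listing.
# On the cluster this would be: vm-list.sh | running_vms
printf '%s\n' \
    'centos7.6-lustre-pwolinsk SHUT OFF' \
    'ubuntu-18.04-lustre-pwolinsk RUNNING 172.16.254.127 c1329' |
    running_vms
```

The header and footer lines of the listing are skipped automatically because their second field is never RUNNING.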
=== Creating a new VM ===
VMs are created from VM templates using the **vm-clone.sh** script. To get a listing of available templates run vm-clone.sh without any arguments:
pinnacle-l1:pwolinsk:~$ vm-clone.sh
Usage: vm-clone.sh
where is one of:
tmpl-centos7.6
tmpl-centos7.6-desktop
tmpl-centos7.6-lustre
tmpl-ubuntu-18.04
tmpl-ubuntu-18.04-desktop
tmpl-ubuntu-18.04-lustre
pinnacle-l1:pwolinsk:~$
Currently there are 6 templates covering two operating systems, CentOS 7.6 and Ubuntu 18.04. For each OS there are 3 options:
* tmpl-centos7.6, tmpl-ubuntu-18.04 - full root access, base-level packages without desktop or Lustre support
* tmpl-centos7.6-desktop, tmpl-ubuntu-18.04-desktop - full root access, desktop support
* tmpl-centos7.6-lustre, tmpl-ubuntu-18.04-lustre - no root access, local file system mounted
Additional VM templates will be added on request.
To clone a template, pass its name to **vm-clone.sh**:
pinnacle-l1:pwolinsk:~$ vm-clone.sh tmpl-ubuntu-18.04
Cloning tmpl-ubuntu-18.04 for pwolinsk as ubuntu-18.04-pwolinsk.....
Found tmpl-ubuntu-18.04 defined on c1329. Cloning....
Allocating 'ubuntu-18.04-pwolinsk.qcow2' | 10 GB 00:00:07
Clone 'ubuntu-18.04-pwolinsk' created successfully.
pinnacle-l1:pwolinsk:~$
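The clone name in the output above appears to be the template name with the 'tmpl-' prefix removed and the user name appended. The sketch below reproduces that rule. It is inferred from the sample output only, and the real script may behave differently, for example by adding a numeric suffix such as '-1' when the name is already taken. `clone_name` is a hypothetical helper.

```shell
# Derive the expected clone name from a template name and a user name.
# Assumption: the rule is "strip the tmpl- prefix, append -<user>".
clone_name() {
    template=$1
    user=$2
    printf '%s-%s\n' "${template#tmpl-}" "$user"
}

clone_name tmpl-ubuntu-18.04 pwolinsk
```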
=== Starting the VM ===
A special job queue named **cloud72** has been set up on Pinnacle to run all user VM jobs. The name of the VM to be started has to be given as the job name (the -J option of sbatch). The VM is started in the job prolog and destroyed in the job epilog.
You can start the VM by submitting a job to the **cloud72** queue. Because the actual job launch command has multiple flags, we created a script, **vm-job-launch.sh**, which takes 3 arguments: the VM name, the number of cores, and the run time in hours. In this example we are using 4 cores for the VM and specifying a 1 hour run time:
pinnacle-l1:pwolinsk:~$ vm-job-launch.sh ubuntu-18.04-pwolinsk 4 1
Submitting job to the queue with command:
sbatch -N1 -n4 -p cloud72 -C cloud -t 1:00:00 -J ubuntu-18.04-pwolinsk waitforvm.sh ubuntu-18.04-pwolinsk
Submitted batch job 91063
Found job #91063
Waiting for log file /scrfs/storage/pwolinsk/home/cloud-91063.log ................
--------/scrfs/storage/pwolinsk/home/cloud-91063.log-----------------------------------------
Starting ubuntu-18.04-pwolinsk for pwolinsk
Redirecting console output to /scrfs/storage/pwolinsk/home/console-91063.log.
Domain ubuntu-18.04-pwolinsk created from /scrfs/storage/pwolinsk/home/vmdef-91063.xml
ubuntu-18.04-pwolinsk booting up.......IP assigned 172.16.254.149 ... Waiting for SSH access ...done.
ubuntu-18.04-pwolinsk is Ready. "ssh ubuntu@172.16.254.149" password: ubuntu
Starting waitforvm script on host c1331
pinnacle-l1:pwolinsk:~$
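To make the mapping from the three arguments to the sbatch flags explicit, the sketch below rebuilds the command line printed in the session above. It only prints the command and does not submit anything. The actual vm-job-launch.sh may construct the command differently; `build_vm_job_cmd` is a hypothetical name.

```shell
# Print the sbatch command for a VM job without submitting it.
# Arguments: VM name, number of cores, run time in whole hours.
# Assumption: flags and their order are copied from the sample session.
build_vm_job_cmd() {
    vm=$1
    cores=$2
    hours=$3
    printf 'sbatch -N1 -n%s -p cloud72 -C cloud -t %s:00:00 -J %s waitforvm.sh %s\n' \
        "$cores" "$hours" "$vm" "$vm"
}

build_vm_job_cmd ubuntu-18.04-pwolinsk 4 1
```

The VM name appears twice: once as the job name (-J) and once as the argument to waitforvm.sh.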
To log into the VM:
pinnacle-l1:pwolinsk:~$ ssh ubuntu@172.16.254.149
Warning: Permanently added '172.16.254.149' (ECDSA) to the list of known hosts.
ubuntu@172.16.254.149's password:
Welcome to Ubuntu 18.04.3 LTS (GNU/Linux 4.15.0-48-generic x86_64)
* Documentation: https://help.ubuntu.com
* Management: https://landscape.canonical.com
* Support: https://ubuntu.com/advantage
Failed to connect to https://changelogs.ubuntu.com/meta-release-lts. Check your Internet connection or proxy settings
Last login: Wed Oct 2 14:58:31 2019
ubuntu@vm-ubuntu-18:~$ sudo /bin/bash
[sudo] password for ubuntu:
root@vm-ubuntu-18:~# cat /proc/cpuinfo |grep processor
processor : 0
processor : 1
processor : 2
processor : 3
root@vm-ubuntu-18:~#
=== Stopping the VM ===
The **vm-job-launch.sh** script continues to monitor the VM state and, when it detects that the VM has been shut down, the job is terminated automatically. Simply logging out of the VM therefore does not end the VM job; the job continues to run until the VM is shut down or the walltime of the job expires:
root@vm-ubuntu-18:~# exit
exit
ubuntu@vm-ubuntu-18:~$ exit
logout
Connection to 172.16.254.149 closed.
pinnacle-l1:pwolinsk:~$
pinnacle-l1:pwolinsk:~$ vm-list.sh
pwolinsk's VMS (Pinnacle Cluster) STATE VM IP HOST
================================================================================
centos6.10-pwolinsk SHUT OFF
centos7.6-desktop-pwolinsk-1 SHUT OFF
centos7.6-lustre-pwolinsk SHUT OFF
centos7.6-pwolinsk SHUT OFF
centos7.6-pwolinsk-1 SHUT OFF
centos7.6-pwolinsk-2 SHUT OFF
library-pwolinsk SHUT OFF
pqs-devel SHUT OFF
ubuntu-18.04-desktop-pwolinsk SHUT OFF
ubuntu-18.04-pwolinsk RUNNING 172.16.254.149 c1331 (52:54:00:59:ae:74)
Virtual machines are stored in /storage/pwolinsk/.virtual-machines.
Total storage on disk: 42G total
pinnacle-l1:pwolinsk:~$ squeue -u pwolinsk
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
91063 cloud72 ubuntu-1 pwolinsk R 10:19 1 c1331
pinnacle-l1:pwolinsk:~$
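The check above can also be scripted. The following sketch tests whether a given job ID still appears in squeue output, assuming the default column layout shown above with the job ID in the first column. `job_in_queue` is a hypothetical helper.

```shell
# Succeed (exit status 0) if job ID $1 appears in the first column of
# squeue output read from stdin; the header row is skipped.
job_in_queue() {
    awk -v id="$1" 'NR > 1 && $1 == id { found = 1 } END { exit !found }'
}

# Example with a captured copy of the squeue output shown above.
# On the cluster this would be: squeue -u "$USER" | job_in_queue 91063
if printf '%s\n' \
    'JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)' \
    '91063 cloud72 ubuntu-1 pwolinsk R 10:19 1 c1331' |
    job_in_queue 91063
then
    echo "VM job 91063 is still in the queue"
fi
```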
To end the job before its walltime limit, log into the VM and shut it down:
pinnacle-l1:pwolinsk:~$ ssh ubuntu@172.16.254.149
ubuntu@172.16.254.149's password:
Welcome to Ubuntu 18.04.3 LTS (GNU/Linux 4.15.0-48-generic x86_64)
* Documentation: https://help.ubuntu.com
* Management: https://landscape.canonical.com
* Support: https://ubuntu.com/advantage
Failed to connect to https://changelogs.ubuntu.com/meta-release-lts. Check your Internet connection or proxy settings
Last login: Fri May 22 09:37:50 2020 from 172.16.16.51
ubuntu@vm-ubuntu-18:~$ sudo /bin/bash
[sudo] password for ubuntu:
root@vm-ubuntu-18:~# shutdown -h now
Connection to 172.16.254.149 closed by remote host.
Connection to 172.16.254.149 closed.
pinnacle-l1:pwolinsk:~$ vm-list.sh
pwolinsk's VMS (Pinnacle Cluster) STATE VM IP HOST
================================================================================
centos6.10-pwolinsk SHUT OFF
centos7.6-desktop-pwolinsk-1 SHUT OFF
centos7.6-lustre-pwolinsk SHUT OFF
centos7.6-pwolinsk SHUT OFF
centos7.6-pwolinsk-1 SHUT OFF
centos7.6-pwolinsk-2 SHUT OFF
library-pwolinsk SHUT OFF
pqs-devel SHUT OFF
ubuntu-18.04-desktop-pwolinsk SHUT OFF
ubuntu-18.04-pwolinsk SHUT OFF
Virtual machines are stored in /storage/pwolinsk/.virtual-machines.
Total storage on disk: 42G total
pinnacle-l1:pwolinsk:~$ squeue -u pwolinsk
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
pinnacle-l1:pwolinsk:~$
The job is terminated as soon as the VM stops. The **cloud72** queue has a walltime limit of 72 hours, and VM jobs are treated just like any other job in the queue: they cannot run indefinitely.
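A requested run time can be checked against the 72 hour limit before a job is submitted. This is a minimal sketch, assuming the run time is given in whole hours as in the vm-job-launch.sh example and that at least 1 hour is requested. `check_walltime` is a hypothetical helper, and the queue itself remains the authority on what is accepted.

```shell
# Return 0 if the requested run time (whole hours) fits the cloud72 limit.
# Assumptions: whole hours only, minimum of 1, maximum of 72.
check_walltime() {
    hours=$1
    case $hours in
        ''|*[!0-9]*)
            echo "run time must be a whole number of hours" >&2
            return 2
            ;;
    esac
    if [ "$hours" -lt 1 ] || [ "$hours" -gt 72 ]; then
        echo "run time must be between 1 and 72 hours" >&2
        return 1
    fi
    return 0
}

if check_walltime 72; then
    echo "72 hours fits within the cloud72 limit"
fi
```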