Cluster reopening 05/20/26

See hpcportal1_login and ssh_2fa for the REQUIRED STEPS to use two-factor authentication when logging in with ssh. All passwords have been invalidated and replaced by Cirrus logins, and ssh keys must be regenerated from the OOD portal.

Users from outside UA Fayetteville and the UA system may have access delayed by a day or two; initially, the UAF VPN and a UA system login are required.

Slurm environment

With the login nodes now all running Rocky 9 and the compute nodes split between Rocky 9 and CentOS 7, Slurm jobs may need some modification. We recommend that you not load modules, run conda init, or activate venvs in .bashrc on the login nodes, and that you test functionality in an srun job on the cloud partition instead of on the login node.
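A minimal sketch of such a test, assuming a script saved as check_node.sh (a hypothetical name): it prints the host, the OS read from the standard /etc/os-release file, and the job's LD_LIBRARY_PATH, so you can see which OS the job landed on and whether the login environment came along.

```shell
# check_node.sh (hypothetical name): show where a job landed and what it inherited.

# Print "NAME VERSION_ID" from an os-release file (default /etc/os-release,
# which Rocky 9 and CentOS 7 both provide). The subshell keeps the sourced
# variables out of the caller's environment; prints "unknown" if unreadable.
os_name() {
  ( . "${1:-/etc/os-release}" 2>/dev/null && echo "$NAME $VERSION_ID" ) || echo "unknown"
}

echo "host: $(uname -n)"
echo "os: $(os_name)"
# Compare with your login shell: a matching value was probably exported from there.
echo "LD_LIBRARY_PATH: ${LD_LIBRARY_PATH:-<unset>}"
```

From a login node, something like srun --partition=cloud --qos=cloud --nodes=1 --ntasks=1 --time=00:10:00 bash check_node.sh runs it on a cloud node; the --qos value and time limit are assumptions to adjust for your account.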

By default, Slurm exports the submitting shell's environment ($PATH, $LD_LIBRARY_PATH, etc.) to submitted jobs (the default --export=ALL). Modules and conda environments loaded on a Rocky 9 login node may not work for a batch job running on a CentOS 7 compute node. Running sbatch --export=NONE [sbatch script] avoids this by not exporting the login environment to the batch or srun job. We recommend always setting up the module/conda/python environment in the batch job itself, especially if you run different jobs with different setups. Once all compute nodes have been moved and everything runs Rocky 9, these effects should be reduced.
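One way to follow this advice, sketched under assumptions: the file name clean_env_job.sh, the job name, the module gcc, the executable ./my_program, and --qos=comp are placeholders, not site-confirmed values. The snippet writes a batch script that loads its own environment on the compute node; submitting it with --export=NONE keeps the login-node environment out of the job.

```shell
# Write a batch script that sets up its own environment on whatever compute
# node it lands on. Job name, module, and program below are placeholders.
cat > clean_env_job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=clean-env
#SBATCH --partition=comp32
#SBATCH --qos=comp
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --time=01:00:00

# Build the environment here rather than inheriting it from the login node.
module purge        # start from an empty module set
module load gcc     # placeholder: load the modules this job actually needs
./my_program        # placeholder executable
EOF

# On the cluster, submit without exporting the login-node environment:
#   sbatch --export=NONE clean_env_job.sh
```

Putting #SBATCH --export=NONE in the script instead of on the sbatch command line has the same effect.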

All accounts should be able to use the Rocky 9 cluster. To run a Rocky 9 job, start a “Magazine desktop” from OOD, or ssh from pinnacle-l[1-3] to pinnacle-l12 and submit from there to the separate Slurm instance that runs Rocky 9.

There are minor changes to the Slurm partitions (queues) to accommodate newer hardware. The time limit in hours is no longer encoded in the partition names; where relevant, the core count is, so that jobs can be submitted to a node with the optimum core count. Old partitions map to new ones as follows:

  comp[01|06|72]   ->  comp32 (32 cores) or comp64 (64 cores)
  cloud[01|06|72]  ->  cloud
  himem[01|06|72]  ->  himem
  gpu[01|06|72]    ->  vgpu (V100)
  agpu[01|06|72]   ->  agpu
  qgpu[01|06|72]   ->  qgpu

QOS names are unchanged: comp, cloud, gpu, himem.
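To update many existing job scripts, the renaming rules for partitions can be captured in a small helper. This is a sketch: the function name new_partition is hypothetical, and because the old comp names did not encode a core count, you must supply 32 or 64 yourself.

```shell
# Map an old partition name to its new equivalent (hypothetical helper).
# Usage: new_partition OLD_NAME [CORES]; CORES (32 or 64) is required for comp*.
new_partition() {
  local old="$1" cores="${2:-}"
  case "$old" in
    comp01|comp06|comp72)
      case "$cores" in
        32|64) echo "comp${cores}" ;;
        *) echo "comp partitions need a core count: 32 or 64" >&2; return 1 ;;
      esac ;;
    cloud01|cloud06|cloud72) echo "cloud" ;;
    himem01|himem06|himem72) echo "himem" ;;
    gpu01|gpu06|gpu72)       echo "vgpu" ;;   # V100 nodes
    agpu01|agpu06|agpu72)    echo "agpu" ;;
    qgpu01|qgpu06|qgpu72)    echo "qgpu" ;;
    *) echo "unknown old partition: $old" >&2; return 1 ;;
  esac
}
```

For example, new_partition comp72 64 prints comp64 and new_partition gpu06 prints vgpu; the --qos lines in existing scripts can stay as they are.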