===== How to Use the Trestles Cluster =====
This is a brief "how to" summary of Trestles for users familiar with the Razor cluster.

== Equipment ==

Trestles has about 240 usable identical nodes, each with four AMD 6136 8-core 2.4 GHz processors. Each node has 64GB of memory and a flash drive with about 90GB usable for temporary space in ''/local_scratch/$USER/'': much less local storage than a Razor node, but faster. Nodes are interconnected with Mellanox QDR Infiniband. Each 32-core node has about the same computational power as a 16-core Intel node in the Razor cluster, so an AMD core delivers roughly half the performance of an Intel core for these workloads. Because Trestles has more cores that are individually less powerful, it is most suitable for highly scalable codes (that is, codes with good parallel performance) and for codes that require more than the 24 to 32 GB of memory on most Razor nodes. There is no serial queue on Trestles; use Razor's serial queue. If you have a serial job that needs more than 32GB of memory, you can use a full Trestles node (''nodes=1:ppn=32''), though a Razor 96GB node would be faster.

== Usage ==

The login node is ''trestles.uark.edu'', a load balancer in front of identical login nodes with local names ''tres-l1'' and ''tres-l2''. You can also reach Trestles from Razor with ''ssh tbridge ; ssh tres-l2''.
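
A minimal login sketch, assuming your AHPCC username is the same on both clusters (''username'' below is a placeholder; the ''tbridge'' hop mirrors the command above, see the rename note in the File Transfer section):
<code>
# from your own machine
ssh username@trestles.uark.edu

# or from a Razor login node, hopping through the bridge node
ssh tbridge
ssh tres-l2
</code>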

Initially there are three queues with maximum runtimes of 30 minutes, 6 hours, and 72 hours: ''q30m32c'', ''q06h32c'', ''q72h32c''. All nodes are identical, with a few reserved for shorter jobs. Only whole-node access (''nodes=N:ppn=32'') is supported initially. Serial (single-core) jobs should be run on the Razor cluster unless you specifically need the 64GB memory capacity of Trestles; in that case allocate the whole node with ''ppn=32''.
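
As a sketch of a whole-node request (queue and walltime chosen for illustration; full queue details and more examples appear in the Queues section below):
<code>
# one whole 32-core node, interactively, for 30 minutes in the short queue
qsub -I -q q30m32c -l nodes=1:ppn=32 -l walltime=30:00
</code>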

The user environment is as similar as possible to the Razor cluster. Most codes will need to be recompiled because (a) the different Infiniband network may affect the low-level links for some MPI versions, and (b) anything compiled with the Intel compilers using processor vectorization newer than ''SSE2'' won't run on an AMD processor. When recompiling, the compiler modules will handle the low-level links, and ''-xSSE2'' is the highest Intel compiler vectorization setting that will run on Trestles.
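
A hedged recompile sketch (the module versions follow the Quantum ESPRESSO example later on this page; ''mycode.f90'' is a placeholder source file):
<code>
# load an Intel + MPI toolchain, then restrict vectorization to SSE2 for the AMD nodes
module load intel/14.0.3 mkl/14.0.3 openmpi/1.8.8
mpif90 -O2 -xSSE2 -o mycode mycode.f90
</code>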

== File Systems ==

The parallel file systems are Lustre ''/storage/$USER'' and ''/scratch''. Your home area is located at ''/storage/$USER/home'' and is symlinked to ''/home/$USER''. For most applications this home setup is transparent compared with the slightly different "autohome" setup of the Razor cluster. There are also additional reserved condo storage areas: ''/storaged'' (Douglas group), ''/storageb'' (Bellaiche group), and ''/storage2'' (Track II).
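
You can confirm this layout yourself with standard Linux commands (expected output is inferred from the description above):
<code>
# the home directory is a symlink into /storage
readlink /home/$USER      # expected: /storage/$USER/home
df -h /storage /scratch   # show the Lustre file systems
</code>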

**UPDATE** Trestles ''/scratch'' and ''/local_scratch'' are very small, and there will be no directories corresponding to your userid. Existing directories ''/scratch/$USER'' with data will be wiped Thursday 3/17/16. There will be only per-job directories, created by the job prologue script, that expire 14 days after each job ends. In your batch scripts you can use the defined environment variables, or reconstruct the names of the already-created scratch areas at runtime as ''**/scratch/$PBS_JOBID**'' and ''**/local_scratch/$PBS_JOBID**'' on the head compute node. Here is an example for a particular job:
<code>
cd $PBS_O_WORKDIR
cp *.in *UPF /scratch/$PBS_JOBID
cd /scratch/$PBS_JOBID
</code>

The Trestles ''/storage'' quota is 900 GiB soft / 1000 GiB hard. Condo storage partitions ''/storage[x]'' don't have quotas. Unlike Razor, your Trestles ''/home'' area is part of ''/storage'' (''/home/username'' is a symlink to ''/storage/username/home'') and does not have its own quota. Future backup schemes that pull from ''/home'' may limit that size. At this time **nothing** on Trestles is backed up.
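
To see where you stand against the ''/storage'' quota, a sketch using the standard Lustre quota tool (assuming ''lfs'' is available on the login nodes):
<code>
# report your usage and limits on the Lustre /storage file system
lfs quota -u $USER /storage
</code>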

To use this effectively, you need to know what your job's input files are, and which output files you need to keep. A [[quantum_espresso]] job might look like:
<code>
#PBS -N espresso
#PBS -j oe
#PBS -m ae
#PBS -o zzz.$PBS_JOBID
#PBS -l nodes=4:ppn=32,walltime=6:00:00
#PBS -q q06h32c
cd $PBS_O_WORKDIR
cp *.in *UPF /scratch/$PBS_JOBID
cd /scratch/$PBS_JOBID
NP=$(wc -l < $PBS_NODEFILE)
module load intel/14.0.3 mkl/14.0.3 openmpi/1.8.8
mpirun -np $NP -machinefile $PBS_NODEFILE -x LD_LIBRARY_PATH \
/share/apps/espresso/espresso-5.1-intel-openmpi/bin/pw.x -npools 1 <ausurf.in >ausurf.log
mv ausurf.log *mix* *wfc* *igk* $PBS_O_WORKDIR/
</code>

We expect to port these ''/scratch'' changes back to Razor to help with the space crisis there.

== File Transfer to/from Razor ==

**UPDATE** ''tbridge''/''rbridge'' have been renamed ''bridge'' on both sides.

There is an interface node ''bridge'' for transferring files between Razor and Trestles. The interface node is only for moving data; it can't submit jobs. Please log in to the interface node and use ''cp''/''mv'' to move files between the parallel file systems over Infiniband instead of using ''scp'', which sends files over the slower ethernet network. You may also use ''rsync'', which on a single node defaults to a local copy like ''cp''. On the bridge node, Trestles file systems are mounted at ''/storage'', ''/scratch'', and ''/home'', and Razor file systems are mounted at ''/razor/storage'', ''/razor/scratch'', and ''/razor/home''. Trestles file systems are also mounted at ''/trestles/storage'' and so on. To copy ''/storage/$USER/mydir'' from Razor to Trestles, starting on Razor:

<code>
ssh bridge
cd /razor/storage/$USER
rsync -av mydir /storage/$USER/
</code>

== File Transfer to/from World ==

The Trestles network doesn't yet have a file transfer node to the outside world; one is on the to-do list. Until then, please stage transfers through Razor, using ''tgv'' for external transfers and ''bridge'' to move the data on to Trestles. Please avoid sending huge files to the login nodes on either Trestles or Razor.
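
A hedged sketch of the two-hop route (names are illustrative: ''tgv.uark.edu'' is assumed to be the externally reachable name of Razor's transfer node, ''username'' is your account, and ''mydata'' is a placeholder directory):
<code>
# 1. from your own machine, copy data to Razor storage via the tgv transfer node
scp -r mydata username@tgv.uark.edu:/storage/username/

# 2. then, from a Razor login node, move it to Trestles over Infiniband
ssh bridge
rsync -av /razor/storage/$USER/mydata /storage/$USER/
</code>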

For more information about moving files, please visit [[moving_data|Data Transfer to and from AHPCC Clusters]].

== Queues ==

Public queues start with q##. ''q06h'' is the shared instance of unused private condo nodes. Please note that shared jobs that request memory far in excess of their requirements may be terminated.
<code>
Queue     Time Limit   Cores(ppn=)   Nodes
q10m32c     10 min        32         trestles (formerly qtraining)
q30m32c     30 min        32         trestles
q06h32c      6 hr         32         trestles
q72h32c     72 hr         32         trestles
q06h         6 hr        16-48       shared condo, select by ppn and memory properties below
</code>
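To check the queues and their limits on the system itself, a sketch using the standard Torque/PBS client (assuming ''qstat'' is available on the login nodes, as the ''qsub'' examples below imply):
<code>
# list all queues with their walltime limits and current job counts
qstat -q
# show the full definition of one queue
qstat -Qf q06h32c
</code>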
Condo nodes are selected in queue ''qcondo'' by PBS node properties. Only a sufficient property is required (e.g. ''m256gb'' is unique over the currently installed nodes). Nodes with Intel Broadwell E5-26xx v4 CPUs have the property ''v4''.
<code>
Node                Cores(ppn=)  Number  Properties
ABI 3072 GB             48          1    abi:m3072gb
Bellaiche 64 GB         16          3    laurent:v4:m64gb
Douglas 768 GB          32          1    douglas:m768gb
Douglas 256 GB          16          3    douglas:v4:m256gb
</code>
**Public queues**: examples in single-line interactive form
<code>
# shared 6-hour
$ qsub -I -q q06h32c -l nodes=1:ppn=12 -l walltime=6:00:00
# shared 72-hour
$ qsub -I -q q72h32c -l nodes=1:ppn=12 -l walltime=72:00:00
</code>
**Condo queues**: examples in single-line interactive form
<code>
# condo 256gb
$ qsub -I -q qcondo -l nodes=3:ppn=16:m256gb -l walltime=8:00:00
# condo 768gb
$ qsub -I -q qcondo -l nodes=1:ppn=32:m768gb -l walltime=8:00:00
# condo 768gb, equivalent
$ qsub -I -q qcondo -l nodes=1:ppn=32:douglas -l walltime=8:00:00
# condo 64gb
$ qsub -I -q qcondo -l nodes=1:ppn=16:m64gb -l walltime=10:00:00
# shared 3072gb
$ qsub -I -q q06h -l nodes=1:ppn=48:m3072gb -l walltime=6:00:00
</code>