Equipment/Selecting Resources

This page describes the resources available at AHPCC and how to select the best one for your computing job. Computing resources are presently divided into four clusters that use separate schedulers. This will be consolidated in the future: all logins will move to pinnacle.uark.edu and all schedulers will be migrated to Slurm, with the number of Slurm schedulers (one or several) still to be determined.

Pinnacle Cluster

Pinnacle, the newest resource at AHPCC, has 100 compute nodes, with 12 more on order. GPU and GPU-ready nodes are Dell R740s and non-GPU nodes are Dell R640s; the packaging differs but the software interface is the same.

There are 49 public standard compute nodes, each with two Xeon Gold 6130 processors (32 cores total) and 192 GB of memory, which use the queues comp01/comp06/comp72. There are 7 public high-memory compute nodes, each with two Xeon Gold 6126 processors (24 cores total) and 768 GB of memory, which use the queues himem06/himem72. These have fewer cores at higher frequency because they are used largely for bioinformatics, where many codes are not efficiently threaded. There are 19 public GPU nodes, identical to standard compute nodes but with a single NVIDIA V100 Tesla GPU, which use the queues gpu06/gpu72.
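As a sketch of how a whole-node job might be submitted to one of these queues under Slurm (the queue names are from this page; the job name, module, and program are illustrative placeholders):

```shell
#!/bin/bash
# Illustrative Slurm script for one full standard pinnacle node.
# Queue/partition names come from this page; everything else is an example.
#SBATCH --job-name=example
#SBATCH --partition=comp72        # pick comp01/comp06/comp72 by walltime needed
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=32      # use the node's full 32-core complement
#SBATCH --time=72:00:00

# Load and run your application here, for example:
# module load mycode
# mpirun -np 32 ./mycode
```

For a high-memory or GPU job, the same pattern applies with the himem06/himem72 or gpu06/gpu72 queues (and 24 tasks per node on the high-memory nodes).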

There are 25 condo nodes: 20 Wang nodes (standard compute nodes plus NVMe local drives); one Alverson standard compute node; one Alverson high-memory node; one Kaman high-memory node with two V100 GPUs; and two Bernhard nodes with two AMD 7351 processors and 256 GB of memory. These use the queue condo for system owners and pcon06 for the public, with appropriate modifiers to select the right nodes; see queues. Please do not submit condo or pcon06 jobs without modifiers: such jobs will be assigned to random nodes by the scheduler and will be killed.
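The required node-selection modifier depends on which condo nodes you need and is documented on the queues page; a submission might look like the sketch below, where the `--constraint` value is a placeholder, not an actual AHPCC modifier:

```shell
#!/bin/bash
# Illustrative pcon06 submission. The constraint value below is a
# placeholder -- look up the real node-selection modifier on the queues page.
#SBATCH --partition=pcon06
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=32
#SBATCH --constraint=<node-type>   # hypothetical; without a modifier the
                                   # scheduler picks a random condo node
                                   # and the job will be killed
```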

Efficient and Best Use

Public pinnacle and public condo usage is in general reserved for jobs that can use at least one of: 1) the entire complement of 32 or 24 cores per node; 2) the V100 GPUs; 3) the 192 GB of memory in standard nodes (more than the 64 GB of trestles) or the 768 GB of memory in high-memory nodes (more than the 192 GB of standard nodes); or 4) the 100 Gb/s Infiniband for multi-node computing. This excludes smaller jobs that can run adequately within the core and memory footprint of a single trestles or razor node, that is: razor-1, 12 cores and 24 GB; razor-2, 16 cores and 32 GB; trestles, 32 cores and 64 GB.

In addition, the gpu queues and nodes are reserved for jobs that actually use the GPU (the GPU costs almost as much as the rest of the node, and filling the CPU makes the GPU unavailable). Similarly, the large-memory himem queues and nodes are reserved for jobs that use more than 192 GB of shared memory, that is, jobs that cannot run on the standard nodes. If you don't know the memory usage of your job, test jobs are allowed on the himem queues; production jobs that don't use more than 192 GB are not.
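One way to find a finished test job's peak memory under Slurm is the `sacct` accounting command (or the `seff` summary script, where installed); the job ID below is only an example:

```shell
# Report peak resident memory (MaxRSS) for a completed job, e.g. job 123456:
sacct -j 123456 --format=JobID,JobName,MaxRSS,Elapsed

# Or, where the seff utility is installed, print a one-line efficiency summary:
seff 123456
```

If MaxRSS stays under 192 GB, subsequent production runs belong on the standard comp queues rather than himem.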

These efficient-use requirements do not apply to condo owners on their own nodes.

Trestles Cluster

trestles includes about 200 public nodes, each with four AMD 6136 8-core 2.4 GHz processors, 64 GB of memory, and about 90 GB of temporary space in '/local_scratch/'. This generation of AMD processors has about half the per-core compute capacity of recent Intel processors, so for scalable programs a 32-core trestles node is roughly equivalent in per-node computational power to a 16-core razor-2 node described below; however, the trestles nodes have twice the memory. The trestles cluster also includes a number of Intel condo nodes. The trestles nodes are connected by slow but reasonably reliable Mellanox QDR Infiniband.

Efficient and Best Use

trestles works well for scalable programs that fit into one node: 32 cores and 64 GB of memory. Multi-node programs will run better on pinnacle, where one node performs about as well as two to five trestles nodes and the network is much better for larger jobs. Single-core to 12-core jobs will run better on razor-1 or razor-2 if their memory footprint fits in 24 or 32 GB, as the Intel cores are much faster.

If your job fits into one node of 32 cores and 64 GB, please use trestles to allow pinnacle to run larger jobs.

Razor Clusters

The razor-1 cluster has about 100 nodes, each with dual Intel X5670 processors (12 cores) and 24 GB of memory. The razor-2 cluster has about 80 nodes, each with dual Intel E5-2670v1 processors (16 cores) and 32 GB of memory. The razor clusters have large local disks of about 900 GB capacity. They are connected by QLogic QDR Infiniband, which has become troublesome.

Efficient and Best Use

The razor clusters also work well for programs that fit within one node's core and memory footprint: 12 cores and 24 GB on razor-1, or 16 cores and 32 GB on razor-2. The QLogic Infiniband on razor has become unreliable, so multi-node jobs are not recommended; jobs that require multiple razor nodes are better run on pinnacle, in one node or across multiple nodes with EDR Infiniband. The razor clusters are also recommended for single-core jobs; razor-1 is generally less crowded if the single-core job uses less than 24 GB of memory.

Overall Recommendations

We recommend the following clusters depending on the needs of your program and system load. These are rules of thumb, not covering every possible situation; contact hpc-support@listserv.uark.edu with questions. Here “memory” refers to the shared memory of one node.

  • GPU-capable
    • use pinnacle GPU queues
  • not GPU-capable
    • 1 to 12 cores and up to 24 GB memory: use razor-1
    • 1 to 16 cores and up to 32 GB memory: use razor-2
    • Up to 32 cores and up to 64 GB memory: use trestles, though low-core-count jobs will be slow compared with the Intel clusters
    • more than 64 GB shared memory, or all 32 cores: use pinnacle standard comp01/comp06/comp72
    • more than 192 GB shared memory: use pinnacle himem06/himem72 or high-memory razor/trestles nodes
    • more than 32 cores: use pinnacle multiple nodes standard comp01/comp06/comp72

Discretionary cases:

  • anything requiring two or more razor/trestles nodes: pinnacle standard comp01/comp06/comp72 will run much faster but probably start the job more slowly because of the job queue.
  • 1 node, 32 cores, and less than 192 GB memory: use pinnacle standard, or trestles if memory is less than 64 GB. pinnacle will run much faster but will probably start the job more slowly because of the job queue.
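The main rules of thumb above can be sketched as a small shell helper. This is illustrative only: the function name and argument convention are not AHPCC tools, and it encodes only the bullet list, not the discretionary cases.

```shell
#!/usr/bin/env bash
# Hypothetical helper encoding the recommendation list above.
# Usage: choose_cluster CORES MEM_GB GPU(yes|no)
choose_cluster() {
  local cores=$1 mem_gb=$2 gpu=$3
  if [ "$gpu" = yes ]; then
    echo "pinnacle gpu06/gpu72"               # GPU-capable jobs
  elif [ "$mem_gb" -gt 192 ]; then
    echo "pinnacle himem06/himem72"           # more than 192 GB shared memory
  elif [ "$cores" -le 12 ] && [ "$mem_gb" -le 24 ]; then
    echo "razor-1"                            # fits a razor-1 node
  elif [ "$cores" -le 16 ] && [ "$mem_gb" -le 32 ]; then
    echo "razor-2"                            # fits a razor-2 node
  elif [ "$cores" -lt 32 ] && [ "$mem_gb" -le 64 ]; then
    echo "trestles"                           # fits a trestles node
  else
    echo "pinnacle comp01/comp06/comp72"      # all 32 cores, >64 GB, or >32 cores
  fi
}
```

For example, `choose_cluster 8 16 no` prints `razor-1`, and `choose_cluster 32 64 no` prints `pinnacle comp01/comp06/comp72`, matching the list above.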
equipment.txt · Last modified: 2020/02/04 18:42 by root