All jobs on AHPCC clusters which require a significant amount of CPU or memory should be submitted through the queueing system. In general, two types of jobs may be passed into the queue:
A compute node is an individual computer which can be used to execute jobs. Compute nodes are grouped into queues. All nodes assigned to a particular queue are identical. The queues differ from each other by the following factors:
All compute nodes are divided into groups called partitions. A node can belong to only one partition. A queue is made up of a collection of partitions, and a given partition can be assigned to multiple queues. As a result, most nodes are not exclusively assigned to a single queue but are shared between several queues. This configuration improves queue flexibility, but it complicates the user's view of the queueing system: it is difficult to predict how many free nodes there are for a given queue. To help determine the number of available nodes per queue, the script max_job_size is available:

    tres-l1:pwolinsk:$ max_job_size
    Maximum jobs size in number of nodes for immediate start per queue:
    q30m32c: 26 nodes (max in partition: 26 queue cap: 64)
    q06h32c: 26 nodes (max in partition: 26 queue cap: 64)
    q72h32c: 8 nodes (max in partition: 8 queue cap: 32)
    qcDouglas: 0 nodes (max in partition: 0 queue cap: 1)
    qcABI: 0 nodes (max in partition: 0 queue cap: 1)
    qcondo: 0 nodes (max in partition: 0 queue cap: 1)
    qtraining: 2 nodes (max in partition: 2 queue cap: 64)
    tres-l1:pwolinsk:$

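In the output above, the first number on each line is the largest job, in nodes, that can start immediately in that queue. For example, a job requesting up to 26 nodes in the queue q06h32c should start immediately, while a job in q72h32c would have to request 8 nodes or fewer.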
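
The check described above can also be scripted. The following is a minimal sketch, not part of the AHPCC tooling: it assumes the output of max_job_size keeps exactly the line format shown in the example, and the names parse_max_job_size and queues_for are hypothetical helpers introduced only for illustration. It parses the text and lists the queues in which a job of a given size should start immediately.

```python
import re

# Sample text copied from the max_job_size output shown above.
# On a login node the text could instead be captured with, for example,
# subprocess.run(["max_job_size"], capture_output=True, text=True).stdout
# (assumption: the script is on PATH and prints the same format).
SAMPLE_OUTPUT = """\
Maximum jobs size in number of nodes for immediate start per queue:
q30m32c: 26 nodes (max in partition: 26 queue cap: 64)
q06h32c: 26 nodes (max in partition: 26 queue cap: 64)
q72h32c: 8 nodes (max in partition: 8 queue cap: 32)
qcDouglas: 0 nodes (max in partition: 0 queue cap: 1)
qcABI: 0 nodes (max in partition: 0 queue cap: 1)
qcondo: 0 nodes (max in partition: 0 queue cap: 1)
qtraining: 2 nodes (max in partition: 2 queue cap: 64)
"""

# One queue per line: name, nodes free now, partition maximum, queue cap.
LINE_RE = re.compile(
    r"^\s*(\S+):\s+(\d+)\s+nodes\s+"
    r"\(max in partition:\s*(\d+)\s+queue cap:\s*(\d+)\)"
)


def parse_max_job_size(text):
    """Return {queue: (free_nodes, max_in_partition, queue_cap)}.

    Lines that do not match the expected format (such as the header
    line) are skipped.
    """
    queues = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            queues[match.group(1)] = tuple(int(g) for g in match.group(2, 3, 4))
    return queues


def queues_for(text, nodes_needed):
    """List queues whose immediate-start size covers nodes_needed."""
    parsed = parse_max_job_size(text)
    return [name for name, (free, _, _) in parsed.items() if free >= nodes_needed]


if __name__ == "__main__":
    print(queues_for(SAMPLE_OUTPUT, 8))
```

With the sample text, queues_for(SAMPLE_OUTPUT, 8) returns q30m32c, q06h32c and q72h32c. The result only reflects the moment the script was run; since nodes are shared between queues, the numbers can change as soon as another job starts.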
Queues - summary of public queues