queueing_system

Differences

This shows you the differences between two versions of the page.

queueing_system [2016/03/07 20:32]
pwolinsk
queueing_system [2017/04/19 15:31] (current)
pwolinsk
Line 1:
 ===== Queueing System =====
 +All jobs on AHPCC clusters that require a significant amount of CPU or memory should be submitted through the queueing system.  In general, two types of jobs may be submitted to a queue:
  
-[[queues|Queues]]
 +  * A //**batch job**// - a specific command is executed on the node(s) assigned to the job, with no user interaction required.  The vast majority of jobs run on the HPC clusters are batch jobs.
 +  * An //**interactive job**// - a login shell is started on the first node assigned to the job.  The user then enters the commands to execute at the command prompt.
  
-[[batch|Batch Jobs]
 +A //**compute node**// is an individual computer that can be used to execute jobs.  Compute nodes are grouped into //**queues**//.  All nodes assigned to a particular queue are identical.  Queues differ from one another in the following respects:
 + 
 +  * type of CPU and number of cores on each node
 +  * number of nodes assigned 
 +  * the maximum number of nodes allowed to be used by a single job 
 +  * amount of memory 
 +  * walltime - the maximum amount of execution time for a single job   
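
The factors above can be pictured as a small record per queue. The following is only an illustrative sketch: the class, the field names, the queue name ''qexample'' and all numbers are hypothetical and do not describe a real AHPCC queue. Treating the memory figure as per-node is also an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Queue:
    """Hypothetical record of the factors that distinguish one queue from another."""
    name: str
    cpu_type: str            # type of CPU in each node
    cores_per_node: int      # number of cores on each node
    node_count: int          # number of nodes assigned to the queue
    max_nodes_per_job: int   # most nodes a single job may use
    memory_gb: int           # amount of memory (assumed here to be per node, in GB)
    walltime_hours: float    # maximum execution time for a single job


def fits(queue: Queue, nodes: int, hours: float) -> bool:
    """Return True if a request stays within the queue's per-job limits."""
    return 1 <= nodes <= queue.max_nodes_per_job and 0 < hours <= queue.walltime_hours


# Made-up values, for illustration only.
example = Queue("qexample", "generic-x86", 32, 100, 64, 64, 6.0)

if __name__ == "__main__":
    print(fits(example, nodes=16, hours=2.0))   # request within both limits
    print(fits(example, nodes=65, hours=2.0))   # exceeds the per-job node limit
```

A check of this kind only says whether a request is permitted by the queue's limits, not whether enough nodes are free at the moment.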
 + 
 +=== Node to Queue Assignment === 
 +All compute nodes are divided into groups called partitions.  A node can belong to only one partition.  A queue is made up of a collection of partitions, and a given partition can be assigned to multiple queues.  As a result, most nodes are not assigned exclusively to a single queue but are shared among several.  This configuration improves queue flexibility, but it makes the queueing system harder for the user to reason about: it is difficult to predict how many free nodes a given queue has.  To help determine the number of available nodes per queue, a script //**max_job_size**// is available:
 + 
 +<code>
 +tres-l1:pwolinsk:$ max_job_size
 +Maximum jobs size in number of nodes for immediate start per queue:
 +
 +        q30m32c:  26 nodes     (max in partition: 26  queue cap:  64)
 +        q06h32c:  26 nodes     (max in partition: 26  queue cap:  64)
 +        q72h32c:   8 nodes     (max in partition:  8  queue cap:  32)
 +      qcDouglas:   0 nodes     (max in partition:  0  queue cap:   1)
 +          qcABI:   0 nodes     (max in partition:  0  queue cap:   1)
 +         qcondo:   0 nodes     (max in partition:  0  queue cap:   1)
 +      qtraining:   2 nodes     (max in partition:  2  queue cap:  64)
 +tres-l1:pwolinsk:$
 +</code>
 + 
 +The output of the script above shows that a job requesting up to 26 nodes in the queue //**q06h32c**// should start immediately.
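
To use these figures from a script, the output can be parsed into a lookup table. The sketch below assumes the line layout shown in the sample output and nothing more. The function names ''parse_max_job_size'' and ''starts_immediately'' are made up for this example and are not part of the AHPCC tooling. In the sample, each reported figure equals the smaller of the two values in parentheses; whether the script computes it that way is not stated here.

```python
import re

# A few lines copied from the sample output above.
SAMPLE = """\
Maximum jobs size in number of nodes for immediate start per queue:

        q30m32c:  26 nodes     (max in partition: 26  queue cap:  64)
        q06h32c:  26 nodes     (max in partition: 26  queue cap:  64)
        q72h32c:   8 nodes     (max in partition:  8  queue cap:  32)
      qcDouglas:   0 nodes     (max in partition:  0  queue cap:   1)
"""

# Assumed line layout: "<queue>: <n> nodes (max in partition: <p>  queue cap: <c>)"
LINE = re.compile(
    r"^\s*(?P<queue>\S+):\s+(?P<nodes>\d+) nodes\s+"
    r"\(max in partition:\s*(?P<part>\d+)\s+queue cap:\s*(?P<cap>\d+)\)"
)


def parse_max_job_size(text):
    """Map each queue name to the three numbers reported on its line."""
    sizes = {}
    for line in text.splitlines():
        match = LINE.match(line)
        if match:  # header, blank, and prompt lines do not match and are skipped
            sizes[match["queue"]] = {
                "nodes": int(match["nodes"]),
                "max_in_partition": int(match["part"]),
                "queue_cap": int(match["cap"]),
            }
    return sizes


def starts_immediately(sizes, queue, nodes):
    """True if a job of `nodes` nodes is within the reported immediate-start size."""
    info = sizes.get(queue)
    return info is not None and 0 < nodes <= info["nodes"]


if __name__ == "__main__":
    sizes = parse_max_job_size(SAMPLE)
    print(starts_immediately(sizes, "q06h32c", 26))
    print(starts_immediately(sizes, "q72h32c", 9))
```

The figures are a snapshot taken when the script runs, so a result parsed earlier may be stale by the time a job is actually submitted.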
 + 
 +[[queues|Queues]] - summary of public queues 
 + 
 +[[batch|Batch Jobs]]
  
 [[interactive|Interactive Jobs]]
 +
 +[[condo queues|Condo Queues]]
 +
 +[[walltime extensions|Job Walltime Extensions]]
queueing_system.1457382773.txt.gz · Last modified: 2016/03/07 20:32 by pwolinsk