sub_node_jobs

Differences

This shows you the differences between two versions of the page.

sub_node_jobs [2024/03/01 17:23]
root
sub_node_jobs [2024/03/01 18:13] (current)
root
Line 14:
   - the multiple sub-jobs should be relatively uniform in workload, so that one doesn't take 10 times as long as the others; that would leave a single core occupying a full compute node for a long period, which is what we are trying to avoid here
   - the ``wait`` at the end of the submit script is necessary to keep Slurm from ending the job prematurely: the backgrounded jobs return the console to Slurm, which will think the overall job is done, log out, and either kill the background jobs or leave zombies running, either of which is bad.
 +  - in this example, the 8 jobs running the same script with different parameters (1-8) simulate running 8 different sub-jobs with different data in the same directory. If your process always uses the same input and output file names, the sub-jobs will need to run in different directories, which should be handled by the script, here called ``bench.sh``. The 8 ``at now`` statements could also be 8 totally different scripts, though in that case it would be hard to ensure that they take about the same time to run.
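
The points above can be illustrated with a minimal sketch. It is not the submit script from this page: it assumes plain ``&`` backgrounding, the ``#SBATCH`` values are placeholders for your site, and the ``subjob`` function is a hypothetical stand-in for what ``bench.sh`` might do when every sub-job writes the same output file name and therefore needs its own directory (``run_1`` to ``run_8``, also hypothetical names).

```shell
#!/bin/bash
#SBATCH --nodes=1     # assumption: all sub-jobs share one compute node
#SBATCH --ntasks=8    # assumption: one task slot per sub-job

# Hypothetical stand-in for bench.sh: each sub-job works in its own
# directory so that identical output file names do not collide.
subjob() {
    local id="$1"
    mkdir -p "run_${id}"
    ( cd "run_${id}" && echo "result for parameter ${id}" > output.txt )
}

# Start the 8 sub-jobs in the background, one per parameter value.
for i in 1 2 3 4 5 6 7 8; do
    subjob "$i" &
done

# Block until every backgrounded sub-job has exited; without this the
# script would return immediately and Slurm would treat the job as done.
wait
echo "all sub-jobs finished"
```

In a real submit script the body of ``subjob`` would be replaced by the call to your own program, and the sub-jobs should still be chosen so that they take roughly the same time to run.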
  
 <code> <code>
sub_node_jobs.txt · Last modified: 2024/03/01 18:13 by root