scratch_output
There are two temporary file systems on each cluster: (1) ''/scratch/'', a parallel networked file system (GPFS or Lustre), and (2) ''/local_scratch/'', a local disk array on each compute node. The parallel file system ''/scratch/'' is better for (1) MPI parallel output (rare even with MPI programs), (2) very large output files that exceed the capacity of the local disk array (see example below), and (3) high-bandwidth I/O in general, since, depending on the compute node and system state, the parallel file system should have higher bandwidth than the local disk. The local array ''/local_scratch/'' is better for (1) very small writes/reads at high rates, since the latency of each operation is lower than on the networked system, or (2) any write/read when the parallel file system is overloaded.
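As an illustration, here is a minimal sketch of a Torque/PBS batch script that stages work through ''/local_scratch/''. The job name, resource requests, program name ''my_program'', and output file ''output.dat'' are placeholders, not site defaults; this is not the example referenced above, just a sketch of the general pattern.

<code bash>
#!/bin/bash
#PBS -N local_scratch_demo
#PBS -l nodes=1:ppn=1,walltime=01:00:00

# Per-job directory on the node-local disk; Torque sets $PBS_JOBID
# (e.g. 532889.torque), matching the per-job directory naming
# described on this page.
WORKDIR=/local_scratch/$PBS_JOBID
mkdir -p "$WORKDIR"
cd "$WORKDIR"

# my_program is a placeholder: run it here so its temporary and
# output files land on the local disk instead of /scratch.
"$PBS_O_WORKDIR/my_program" > output.dat

# Copy results back to permanent storage before the job exits.
cp output.dat "$PBS_O_WORKDIR/"
</code>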
  
NOTE: As of summer 2017 the (now old, pending upgrade) parallel file systems are chronically overloaded, so we recommend **using ''/local_scratch/'' whenever feasible**. Exception: on Razor-1 12-core nodes, the local disks are very slow (tens of MB/s). Standard Trestles node local disks are both moderately slow (~100 MB/s) and moderately small (~90 GB available size).
  
PENDING UPGRADE: We expect a complete overhaul of storage by late summer 2017. Several changes will be made: (1) the razor and trestles clusters will be combined with common storage; (2) there will be no user directories on ''/scratch/'' or ''/local_scratch/'', only per-job directories. Per-job scratch directories are already being created for each job but are not yet required. For job ''532889.torque'' these directories will be ''/scratch/532889.torque/'' and, on the first compute node, ''/local_scratch/532889.torque/''. To facilitate data recovery, the directories will be retained for 10 days after the end of the job, unless they fill a significant fraction of the disk, in which case they may be purged after as little as 1 day. We recommend purging the temporary directory at the end of the job; see the example below.
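A minimal sketch of the recommended cleanup step, assuming the per-job directory naming above (''$PBS_JOBID'' expands to something like ''532889.torque''; ''output.dat'' is a placeholder):

<code bash>
# End of job script: save what you need, then purge the per-job
# scratch directory so it does not linger on a crowded disk.
cd "$PBS_O_WORKDIR"
cp /local_scratch/$PBS_JOBID/output.dat .
rm -rf /local_scratch/$PBS_JOBID
</code>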