====Storage/Filesystems/Quotas/Backup====

The Pinnacle, Trestles, and Razor clusters share the same Lustre storage. The Karpinski cluster has separate storage because it lacks the InfiniBand network over which the Lustre storage runs.

==storage==

The main bulk storage area for user ''rfeynman'' is ''/storage/rfeynman/'', which is a symlink to ''/scrfs/storage/rfeynman/''. In most contexts either name may be used. The home directory is ''/home/rfeynman/'', which is a symlink to ''/scrfs/storage/rfeynman/home'', so home is a subsection of your ''/storage'' area. Home is the directory that ''cd'' with no arguments takes you to, and it contains important configuration files and directories such as the file ''.bashrc'' and the directory ''.ssh''. You can organize the files inside your storage area however you like, with one exception: running jobs across many different nodes requires working ssh keys, which requires ''/home/rfeynman/.ssh'' to be present and correctly configured, which in turn requires ''/scrfs/storage/rfeynman/home'' to remain in its default setup. Don't move or rename your home directory.

Your storage directory has a quota of 10 TB. At this time quotas are non-enforcing, so you can still write files while over quota, but you will be reminded. You can check your own quota like so:

<code>
tres-l1:rfeynman $ lfs quota /scrfs
Disk quotas for usr rfeynman (uid 9594):
     Filesystem       kbytes   quota   limit   grace   files   quota   limit   grace
         /scrfs  10870257212       0       0       -   14450       0       0       -
Disk quotas for grp oppenhiemer (gid 572):
     Filesystem       kbytes   quota   limit   grace   files   quota   limit   grace
         /scrfs            0       0       0       -       0       0       0       -
tres-l1:rfeynman $
</code>

This shows usage of 10.87 TB (over quota) and 14,450 files (no file quota set). Group quotas are shown but are not currently active.

**Your storage area is not backed up.** The most common data-loss issue is a user accidentally deleting their own files, often through a typo with ''rm -r'', which is a dangerous command. Instead of deleting with ''rm'' directly when you want to rearrange your storage, we suggest making a trash folder, moving files into it, checking that everything is as intended, and only then deleting the trash folder:

<code>
mkdir /scrfs/storage/rfeynman/trash
cd (to anywhere under /scrfs/storage/rfeynman)
mv some-directory /scrfs/storage/rfeynman/trash/
cd /scrfs/storage/rfeynman
(check trash again)
rm -rf trash
</code>

==home backup==

We do attempt to make disk-to-disk backups of the home area subset of your storage directory. Because backup capacity is limited, we make this backup only if the size of the home area is less than 150 GB. You can check the size of your home area like so:

<code>
tres-l1:rfeynman:$ cd
tres-l1:rfeynman:$ du -sh ../home
627M    ../home
tres-l1:rfeynman:$
</code>

Here the home area is 627 MB, less than 150 GB, so it should be backed up. The ''du'' command has to query every file, so it can take a while if there are many (such as millions of) files. The purpose of the home area is to store important files such as source code and configuration data; large data collections should go elsewhere in your storage area. If your home area fails the 150 GB size test, **nothing at all will be backed up**.

==Lab Storage==

Some labs have purchased auxiliary storage, and if you are part of the corresponding lab group you can have a directory on it. These have names such as ''/storageb/''. If your lab has such storage, it is suitable for holding over-quota files or for backing up your ''/scrfs/storage/'' area, for instance with a copy like the sketch below.
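As an illustration only, here is a minimal sketch of such a backup copy with ''rsync''. The lab directory ''/storageb/mylab/rfeynman/'' and the project folder ''projectX'' are hypothetical placeholders; substitute the directory your lab actually owns.

<code bash>
# Sketch only: /storageb/mylab/rfeynman/ and projectX are placeholder names.
# -a preserves permissions and timestamps, -v lists files,
# --dry-run previews what would be copied without copying anything.
rsync -av --dry-run /scrfs/storage/rfeynman/projectX/ /storageb/mylab/rfeynman/projectX/

# Re-run without --dry-run once the file list looks right:
rsync -av /scrfs/storage/rfeynman/projectX/ /storageb/mylab/rfeynman/projectX/
</code>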
==scratch and local_scratch==

There is a dedicated, small, high-speed temporary storage area called ''/scratch/''. This is intended for large inputs (and especially outputs) read and written directly by computational jobs. There is also local disk storage on each compute node called ''/local_scratch/''. For each job, the queueing system creates temporary job directories ''/scratch/$SLURM_JOB_ID/'' and ''/local_scratch/$SLURM_JOB_ID/'' on the first compute node of the job. On torque systems ''$PBS_JOBID'' is used instead. If your job creates more than 500 MB of output, please route that output to the job's scratch or local_scratch directory. There are no quotas on ''/scratch/'' or ''/local_scratch/'', but ''/scratch/'' has a total size of 19 TB and ''/local_scratch/'' varies by node and may be as small as 90 GB.

The purpose of this rerouting is performance. The main storage ''/scrfs/'' is composed of fairly large and slow 8 TB SATA drives that do not handle hundreds of concurrent data streams well, particularly streams of small data blocks. ''/scrfs/'' can handle a fairly large throughput of efficiently large-blocked data, but that is rare in application programs. The NVMe drives of ''/scratch/'' and the mostly SSD drives of ''/local_scratch/'' are better suited to the typically inefficient, small data blocks put out by programs. At the end of your job, copy the files you want to keep from ''/scratch/'' or ''/local_scratch/'' back to main storage ''/scrfs/''. There are no per-user directories such as ''/scratch/rfeynman/'', since we found that such directories soon filled the small ''/scratch/'' partition. Each job directory is normally retained until a week after the job ends, unless space becomes critical. See [[ torque_slurm_scripts ]] for some hints on moving data into and out of ''/scratch/'' areas during jobs.
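For orientation, here is a minimal single-node SLURM batch-script sketch of this pattern. The program ''my_program'', the project path, and the file names are placeholders, not site-provided examples; see [[ torque_slurm_scripts ]] for the recommended scripts.

<code bash>
#!/bin/bash
#SBATCH --job-name=scratch-example
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --time=01:00:00

# Job directory created by the queueing system on the first compute node.
SCRATCHDIR=/scratch/$SLURM_JOB_ID

# Stage input from main storage, run in scratch, then copy results back.
cp /scrfs/storage/rfeynman/projectX/input.dat $SCRATCHDIR/
cd $SCRATCHDIR
my_program input.dat > output.dat      # placeholder program and files
cp output.dat /scrfs/storage/rfeynman/projectX/
</code>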