blast

Differences

This shows you the differences between two versions of the page.

blast [2016/07/21 21:29]
root
blast [2016/07/22 21:41]
root
Line 1: Line 1:
 ==== blast+ , blastall ====
  
 -NCBI Blast+ is a shared-memory program that runs on a single node with multiple threads. The Intel processors on the razor cluster will run blast about three times as fast as the AMD processors on trestles (but trestles has twice as many per node). Razor 12-core nodes are sufficient since blast+ scales to about 8 threads as shown by user/real time, but the number of cores present is used in each example. Blast works better with a database located on a local file system, so if doing a number of runs, it may be worth the couple of minutes to copy the database to your area of the local scratch disk, as shown. For a single run it is probably faster overall to specify to blast the parallel filesystem database. If copying the database please remember to remove it at the end of the job.
 +NCBI Blast+ is a shared-memory program that runs on a single node with multiple threads. The Intel processors on the razor cluster will run blast about three times as fast as the AMD processors on trestles (but trestles has twice as many per node). Razor 12-core nodes are sufficient since blast+ scales to about 8 threads as shown by user/real time, but the number of cores actually present is used as the threads variable in each example.
  
 -trestles
 +Blast works better with a database located on a local file system, so if doing a number of runs, it may be worth the couple of minutes to copy the database to your area of the local scratch disk, as shown. For a single run it is probably faster overall to blast directly from the parallel filesystem database. If copying the database please remember to remove it at the end of the job.
 + 
 +==trestles== 
 +Unfortunately, the 2.4.0+ version has a significant performance regression on AMD, and blast+ overall runs better on Intel. Time-to-solution may still depend on cluster load.
  
 <code>
Line 15: Line 18:
 user    1m33.556s
 sys     0m10.938s
 +/local_scratch/rfeynman$ module purge; module load blast/2.4.0+
 +/local_scratch/rfeynman$ time blastn -num_threads 32 -db nt/nt -query \
 +/home/rfeynman/NM_001005648 >blast-2.4.0.out
 +
 +real 0m18.026s
 +user 1m48.788s
 +sys 0m11.792s
 +
 /local_scratch/rfeynman$ rm -rf ./nt
 </code>
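
The copy, run, and remove pattern described for the local scratch disk can be sketched as a small bash script. This is only an illustration under stated assumptions: `DB_SRC` is a hypothetical location for the shared copy of the database, `stage_in`, `clean_up`, and `run_job` are made-up helper names, and the database directory is assumed to be named `nt` as in the examples. Only the `blastn` options already shown on this page are used.

```shell
#!/bin/bash
# Sketch only. DB_SRC is a hypothetical path; SCRATCH follows the
# /local_scratch/<user> layout used in the examples on this page.
DB_SRC="${DB_SRC:-/path/to/shared/blastdb/nt}"
SCRATCH="${SCRATCH:-/local_scratch/$USER}"

stage_in() {
    # Copy the database directory from the shared filesystem to local scratch.
    mkdir -p "$SCRATCH" && cp -r "$DB_SRC" "$SCRATCH/"
}

clean_up() {
    # Remove only the local copy of the database; ${SCRATCH:?} aborts
    # instead of expanding to "/" if SCRATCH is empty.
    rm -rf "${SCRATCH:?}/$(basename "$DB_SRC")"
}

run_job() {
    # Arguments: query files, given as absolute paths.
    trap clean_up EXIT          # remove the database copy even if a run fails
    stage_in || return 1
    cd "$SCRATCH" || return 1
    for q in "$@"; do
        # nproc reports the cores present on the node.
        blastn -num_threads "$(nproc)" -db nt/nt -query "$q" \
            > "$(basename "$q").out"
    done
}
```

A job script would then end with something like `run_job /home/rfeynman/NM_001005648`. The staging cost is paid once per job, so this arrangement is mainly worthwhile when several queries are run against the same local copy.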
  
 -razor Examples are shown on a 16-core node for the last 3 versions of blastn and for blastall used with qiime. In this case 2.2.29 and 2.3.0 give the same output while 2.2.28 and blastall are different.
 +==razor==
 +Examples are shown on a 16-core node for the last 3 versions of blastn and for blastall used with qiime. In this case 2.2.29 and 2.3.0 give the same output while 2.2.28 and blastall are different.
  
 <code>
Line 65: Line 77:
 </code>
  
 +On Intel razor, 2.4.0+ ran as fast as or faster than 2.3.0+. These runs use a longer query whose timings are more repeatable:
 +<​code>​
 +/local_scratch/rfeynman$ module purge; module load blast/2.3.0+
 +/local_scratch/rfeynman$ time blastn -num_threads 16 -db nt/nt \
 +-query /share/apps/blast/queries/blastn/NM_010585 >blast-2.3.0.out
 +
 +real 0m7.097s
 +user 0m25.746s
 +sys 0m2.325s
 +
 +/local_scratch/rfeynman$ module purge; module load blast/2.4.0+
 +/local_scratch/rfeynman$ time blastn -num_threads 16 -db nt/nt \
 +-query /share/apps/blast/queries/blastn/NM_010585 >blast-2.4.0.out
 +
 +real 0m6.667s
 +user 0m22.723s
 +sys 0m2.219s
 +</​code>​
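
The user/real ratio mentioned at the top of the page can be computed directly from `time` output. The sketch below is one way to do it; the helper names `to_seconds` and `busy_threads` are invented for illustration, and the inputs are the razor timings shown above.

```shell
#!/bin/bash
to_seconds() {
    # Convert bash `time` output such as 1m48.788s into seconds.
    awk -v t="$1" 'BEGIN {
        split(t, p, "m")        # p[1] = minutes, p[2] = seconds plus "s"
        sub(/s$/, "", p[2])
        printf "%.3f\n", p[1] * 60 + p[2]
    }'
}

busy_threads() {
    # user time / real time: the average number of threads kept busy.
    local real user
    real=$(to_seconds "$1")
    user=$(to_seconds "$2")
    awk -v r="$real" -v u="$user" 'BEGIN { printf "%.1f\n", u / r }'
}

busy_threads 0m7.097s 0m25.746s   # 2.3.0+ on razor, prints 3.6
busy_threads 0m6.667s 0m22.723s   # 2.4.0+ on razor, prints 3.4
```

Applied to the 32-thread trestles run of 2.4.0+ (real 0m18.026s, user 1m48.788s), the same calculation gives about 6.0, so in these examples far fewer threads are kept busy than were requested.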
 +
 +==Disk considerations==
 +Recall that the shared parallel scratch disks on both systems have ~5,000 MB/s of bandwidth, while local scratch disks have ~150 MB/s (razor, hard disks) or ~300 MB/s (trestles, flash drives). A single Blast job may therefore run faster on the shared disk, depending on load. Distributed Blast runs across all 256 trestles nodes, however, get about 15 times more aggregate bandwidth from the local disks (256*300 = 76,800 MB/s).
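
The aggregate-bandwidth comparison can be reproduced with a few lines of shell arithmetic. The variable names are illustrative; the figures are the approximate ones quoted in the paragraph above.

```shell
#!/bin/bash
nodes=256            # trestles nodes, each with its own local flash drive
local_mb_s=300       # approximate local scratch bandwidth per trestles node
shared_mb_s=5000     # approximate shared parallel scratch bandwidth

aggregate=$((nodes * local_mb_s))
ratio=$(awk -v a="$aggregate" -v s="$shared_mb_s" 'BEGIN { printf "%.1f", a / s }')

echo "aggregate local bandwidth: ${aggregate} MB/s"   # 76800 MB/s
echo "ratio to shared scratch:   ${ratio}x"           # 15.4x
```

The advantage only appears when many nodes read their local copies at once; a single node still sees just its own ~300 MB/s.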
blast.txt · Last modified: 2016/07/22 21:41 by root