==== blast+, blastall ====
NCBI Blast+ is a shared-memory program that runs on a single node with multiple threads. The Intel processors on the razor cluster will run blast about three times as fast as the AMD processors on trestles (but trestles has twice as many per node). Razor 12-core nodes are sufficient since blast+ scales to about 8 threads as shown by user/real time, but the number of cores actually present is used as the threads variable in each example.

Blast works better with a database located on a local file system, so if doing a number of runs, it may be worth the couple of minutes to copy the database to your area of the local scratch disk, as shown. For a single run it is probably faster overall to blast directly from the parallel filesystem database. If copying the database, please remember to remove it at the end of the job.

Unfortunately the 2.4.0+ version has a significant performance regression on AMD, and blast+ overall runs better on Intel. Time-to-solution may still depend on cluster load.
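The copy-and-clean-up pattern described above can be sketched as a shell fragment. This is only a sketch: `SCRATCH_DIR` is a hypothetical placeholder for your `/local_scratch` directory, and the actual `rsync` and `blastn` invocations (shown elsewhere on this page) are left as comments because the database paths are site-specific.

```shell
# Sketch of staging a BLAST database on local scratch with guaranteed cleanup.
# SCRATCH_DIR is a placeholder; the real copy/run commands are in comments.
SCRATCH_DIR=${SCRATCH_DIR:-$(mktemp -d)}   # stands in for /local_scratch/$USER
cd "$SCRATCH_DIR" || exit 1
trap 'rm -rf "$SCRATCH_DIR/nt"' EXIT       # remove the copy even if the job fails
mkdir -p nt    # rsync -a /share/apps/bioinformatics/blast/db20150912/nt .
# blastn -num_threads "$(nproc)" -db nt/nt -query "$QUERY" > blast.out
echo "database staged in $SCRATCH_DIR/nt"
```

The `trap … EXIT` line is the important part: it removes the copied database even when the job script fails partway through.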
<code>
/home/rfeynman$ cd /local_scratch/rfeynman
/local_scratch/rfeynman$ rsync -a /share/apps/bioinformatics/blast/db20150912/nt .
/local_scratch/rfeynman$ module purge;module load blast/2.3.0+
/local_scratch/rfeynman$ time blastn -num_threads 32 -db nt/nt -query \
/home/rfeynman/NM_001005648 >blast-2.3.0.out
real    0m11.146s
user    1m33.556s
sys     0m10.938s
/local_scratch/rfeynman$ module purge;module load blast/2.4.0+
/local_scratch/rfeynman$ time blastn -num_threads 32 -db nt/nt -query \
/home/rfeynman/NM_001005648 >blast-2.4.0.out
real    0m18.026s
user    1m48.788s
sys     0m11.792s
/local_scratch/rfeynman$ rm -rf ./nt
</code>
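The user/real-time scaling mentioned above can be probed with a loop like the following sketch. The `nt/nt` database and `query.fa` names are placeholders for your own database and query, and the loop falls back to a message when `blastn` is not on the `PATH`.

```shell
# Probe blast+ thread scaling: watch real vs. user time as threads grow.
# nt/nt and query.fa are placeholder names; blastn is site-installed.
for t in 1 2 4 8 12 16; do
  if command -v blastn >/dev/null 2>&1; then
    /usr/bin/time -f "threads=$t real=%es user=%Us" \
      blastn -num_threads "$t" -db nt/nt -query query.fa >/dev/null
  else
    echo "threads=$t: blastn not found, skipping"
  fi
done
```

When real time stops shrinking while user time keeps growing (the ~8-thread plateau noted above), the extra threads are wasted and a smaller `-num_threads` is just as fast.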
Examples are shown on a 16-core node for the last 3 versions of blastn and for blastall used with qiime. In this case 2.2.29 and 2.3.0 give the same output, while 2.2.28 and blastall are different.
<code>
/home/rfeynman$ cd /local_scratch/rfeynman
/local_scratch/rfeynman$ rsync -a /share/apps/blast/db/nt .
/local_scratch/rfeynman$ module purge;module load blast/2.2.28+
/local_scratch/rfeynman$ time blastn -num_threads 16 -db nt/nt \
-query /share/apps/blast/queries/blastn/NM_001005648 >blast-2.2.28.out
real    0m3.273s
user    0m2.204s
sys     0m1.530s
/local_scratch/rfeynman$ module purge;module load blast/2.2.29+
/local_scratch/rfeynman$ time blastn -num_threads 16 -db nt/nt \
-query /share/apps/blast/queries/blastn/NM_001005648 >blast-2.2.29.out
real    0m3.817s
user    0m38.202s
sys     0m3.077s
/local_scratch/rfeynman$ module purge;module load blast/2.3.0+
/local_scratch/rfeynman$ time blastn -num_threads 16 -db nt/nt \
-query /share/apps/blast/queries/blastn/NM_001005648 >blast-2.3.0.out
real    0m3.327s
user    0m33.666s
sys     0m3.284s
/local_scratch/rfeynman$ module purge
/local_scratch/rfeynman$ module load gcc/4.6.3 mkl/13.1.0 python/2.7.5 R/3.1.2-mkl qiime/1.9.1
/local_scratch/rfeynman$ time blastall -p blastn -a 16 -d nt/nt \
-i /share/apps/blast/queries/blastn/NM_001005648 -o blastall.out
real    0m6.231s
user    1m20.847s
sys     0m3.195s
/local_scratch/rfeynman$ ls -al *out
-rw-r--r-- 1 rfeynman rfeynman  119520 Feb 29 13:10 blast-2.2.28.out
-rw-r--r-- 1 rfeynman rfeynman  498945 Feb 29 13:11 blast-2.2.29.out
-rw-r--r-- 1 rfeynman rfeynman  498538 Feb 29 13:11 blast-2.3.0.out
-rw-r--r-- 1 rfeynman rfeynman 1422859 Feb 29 13:14 blastall.out
/local_scratch/rfeynman$ diff -ibw blast-2.2.29.out blast-2.3.0.out
< BLASTN 2.2.29+
---
> BLASTN 2.3.0+
/local_scratch/rfeynman$ rm -rf ./nt
</code>
On Intel razor, 2.4.0+ timed comparably or better to 2.3.0+. This is timed with a longer query that times more repeatably:
<code>
/local_scratch/rfeynman$ module purge;module load blast/2.3.0+
/local_scratch/rfeynman$ time blastn -num_threads 16 -db nt/nt \
-query /share/apps/blast/queries/blastn/NM_010585 >blast-2.3.0.out
real    0m7.097s
user    0m25.746s
sys     0m2.325s
/local_scratch/rfeynman$ module purge;module load blast/2.4.0+
/local_scratch/rfeynman$ time blastn -num_threads 16 -db nt/nt \
-query /share/apps/blast/queries/blastn/NM_010585 >blast-2.4.0.out
real    0m6.667s
user    0m22.723s
sys     0m2.219s
</code>
==Disk considerations==
Please recall that the shared parallel scratch disks on both systems have ~5,000 MB/s bandwidth, and local scratch disks have a bandwidth of ~150 MB/s (razor, hard disks) or ~300 MB/s (trestles, flash drives). So a single Blast job may run faster on the shared disk, depending on load. But distributed Blast runs on every trestles node will have about 15 times more aggregate bandwidth (256*300 = 76,800 MB/s) if using the local disks.
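A quick way to sanity-check the local-disk figure is a sequential write with `dd`; the 256 MB size and `testfile` name are arbitrary choices, and `conv=fdatasync` makes dd include the final flush to disk in its timing. Run it inside your own local scratch directory.

```shell
# Rough sequential write-bandwidth check (256 MB); dd prints MB/s on stderr.
dd if=/dev/zero of=testfile bs=1M count=256 conv=fdatasync 2>&1 | tail -n 1
rm -f testfile
```

This only measures sequential writes; Blast's database reads are also largely sequential, so it is a reasonable first approximation of the bandwidth the job will see.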
blast.1455575106.txt.gz · Last modified: 2016/02/15 22:25 by root