mpiBlast is a freely available, open-source, parallel implementation of NCBI BLAST. mpiBlast takes advantage of shared parallel computing resources, i.e. a cluster. This gives it access to more resources than NCBI BLAST, which can only take advantage of shared-memory multiprocessors (SMPs).
More information is available here.
Edit the $HOME/.bashrc file to contain these modules.
module load gcc/4.5.0
module load openmpi/1.5.1
module load mpiblast/1.6.0
You may have to log out and log back in for the modules to load. You can check with the command module list; the loaded modules should also be displayed on login.
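If you prefer not to edit the startup file by hand, the module lines can be appended from the shell. The following is a minimal sketch, not part of the original instructions: the helper names add_line_once and add_mpiblast_modules are hypothetical, and the target file defaults to $HOME/.bashrc but can be pointed at a scratch file to try it out first.

```shell
#!/bin/bash
# Hypothetical helper: append a line to a file only if that exact
# line is not already present, so repeated runs add no duplicates.
add_line_once() {
    local line="$1" file="$2"
    touch "$file"
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Hypothetical helper: add the three module lines from this page.
# The target defaults to $HOME/.bashrc; pass another path to test safely.
add_mpiblast_modules() {
    local rc="${1:-$HOME/.bashrc}"
    add_line_once "module load gcc/4.5.0" "$rc"
    add_line_once "module load openmpi/1.5.1" "$rc"
    add_line_once "module load mpiblast/1.6.0" "$rc"
}
```

Calling add_mpiblast_modules with no argument would edit $HOME/.bashrc; running it a second time leaves the file unchanged.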
Make a directory to contain the FASTA database that will be fragmented. Download the database and decompress it.
razor-l1:jokinsey:~$ mkdir db
razor-l1:jokinsey:~$ cd db
razor-l1:jokinsey:~/db$ wget ftp://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/mito.nt.gz
razor-l1:jokinsey:~/db$ gunzip mito.nt.gz
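Create a $HOME/.ncbirc file with these values. The Shared path tells mpiBlast where to access the database; the Local path is scratch space on each node where mpiBlast keeps its working copies of the database fragments.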
[mpiBLAST]
Shared=/home/YourUserName/db
Local=/local_scratch/YourUserName
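The same three lines can be generated from the shell so that the paths follow your own account. This is a sketch under assumptions: the function name mpiblast_config is hypothetical, and it assumes the $USER environment variable matches the YourUserName part of the paths above.

```shell
#!/bin/bash
# Hypothetical helper: print an [mpiBLAST] configuration section
# for the given shared and local directories.
mpiblast_config() {
    local shared="$1" local_dir="$2"
    printf '[mpiBLAST]\nShared=%s\nLocal=%s\n' "$shared" "$local_dir"
}
```

For example, mpiblast_config "$HOME/db" "/local_scratch/$USER" prints the section with your paths filled in; once it looks right, redirect the output into the configuration file.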
Format the database for parallel use by fragmenting it, one fragment for each processor. We will be using a node with 12 processors, so we include the option --nfrags=12.
razor-l1:jokinsey:~$ mpiformatdb -i ~/db/mito.nt --nfrags=12
Reading input file
Done, read 2605891 lines
Database type unspecified, assuming nucleotide
Breaking mito.nt into 12 fragments
Executing: formatdb -i /home/jokinsey/db/mito.nt -p F -N 12 -o T
Created 12 fragments.
<<< Please make sure the formatted database fragments are placed in /home/jokinsey/db/ before executing mpiblast. >>>
Create a directory $HOME/SCHED_PLACE to store the PBS schedule files. Create another directory to store the PBS input scripts, along with the FASTA input and output files.
razor-l1:jokinsey:~$ mkdir SCHED_PLACE
razor-l1:jokinsey:~$ mkdir testing
razor-l1:jokinsey:~$ cd testing
Create an input FASTA search file and name it input.
>gi|45238842|gb|AY563103.1| Homo sapiens interleukin 2 receptor, alpha (IL2RA) gene, complete cds
GTCCATCTCAGAACCAAGAGTTGGGCCTCTTATTTACCAGAAAAATTGTGGGGGCTTTGTGATATGGCTT
TAAAAAAATCTTGTAATTGCCAGGCGTGGTGGCTCACACCTGTAATCCCAGCACTTTGGGAGGCCGAGGT
GGGTGAATCGCCTAAGGTCAGGAGTTCGAGACCAGCCTGACCAACATGGTGAAACTCCGTCTCTACTAAA
AATACAAAAACTAGCTGGATGTGGTGACGCGTGCCTGTAATCCTAGCTACTCAGGAGGCTGACGCAGGAG
AATCACTTGAACCTGGGAGGCAGAGGTTGCAGTGAGCCAAGATTGTGCCATTGCGCTCCAAAAAAAAAAA
AAAAAAGACATTAACATAAATTTAAATATTTTATAATGACAATCCACATTAACTACTTAAAGCATAAGCT
ATTTTCCAGGAGAGGCAGCAAGTGCATTCTACTCCCATGCCCAAGAAGAAAGGAGCGTGACTTTGGTGGG
AGTACTAGGAGTTTCTACTGGAGCACTTGCCCGCAGAGTGAGAAACGTTCCTAGAGAGGAAGTTATACCT
GCTGTGGAATTTAAGAGAATCTTGTCATATTTTGACAAGTTTTTTGAGATGGAAGTCTCACTCTGTCGCC
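Before submitting a job, it can help to confirm that the input file is well-formed FASTA, meaning a header line starting with > followed by sequence lines. The sketch below is an optional check, not part of the original procedure; the function names count_fasta_records and count_fasta_residues are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: count FASTA records. Every record starts
# with a header line whose first character is '>'.
count_fasta_records() {
    grep -c '^>' "$1"
}

# Hypothetical helper: count sequence characters, i.e. everything
# on non-header lines with whitespace and line breaks removed.
count_fasta_residues() {
    grep -v '^>' "$1" | tr -d '\n\r ' | wc -c | tr -d ' '
}
```

For the single-sequence input file above, count_fasta_records input should report 1; a result of 0 usually means the header line was lost when the file was pasted.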
Create a PBS input script and name it mpiBlastTest.pbs.
#!/bin/bash
#PBS -N MPIBLAST
#PBS -q tiny12core
#PBS -j oe
#PBS -m abe
#PBS -M jokinsey@uark.edu
#PBS -o MPIBLAST.$PBS_JOBID
#PBS -l nodes=1:ppn=12
#PBS -l walltime=02:00:00
cd "$PBS_O_WORKDIR"
cp input /scratch/$PBS_JOBID
cd /scratch/$PBS_JOBID
mpirun -np 12 mpiblast -p blastn -d mito.nt -i input -o $HOME/testing/output
In the PBS script, we first copy our input file to the directory we will be working in, /scratch/$PBS_JOBID. Then we change into that directory to run the computation and send the output to $HOME/testing/output.
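The copy step fails silently if the input file is missing, and the job then runs without a query. A more defensive version of the staging step might look like the sketch below. It is an illustration only: the function name stage_input is hypothetical, and the mkdir -p call assumes that creating the working directory is harmless if the scheduler has already created it.

```shell
#!/bin/bash
# Hypothetical helper: copy an input file into a working directory,
# failing early with a message if the file does not exist.
stage_input() {
    local src="$1" workdir="$2"
    if [ ! -f "$src" ]; then
        echo "missing input file: $src" >&2
        return 1
    fi
    # Assumption: creating the directory is safe if it already exists.
    mkdir -p "$workdir" && cp "$src" "$workdir"/
}
```

Inside a PBS script this could replace the plain cp line, for example stage_input input "/scratch/$PBS_JOBID" || exit 1, so that the job stops before mpirun if staging fails.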
Then submit the job.
razor-l3:jokinsey:~/testing$ qsub mpiBlastTest.pbs
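Note which directory the job is submitted from: because the script runs from the submission directory ($PBS_O_WORKDIR), the schedule file is saved there. The session above submits from $HOME/testing; to collect schedule files in $HOME/SCHED_PLACE instead, submit from that directory and make sure the script can still find the input file. The output will be in $HOME/testing/output.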