==== mpiBLAST ====
{{ :mpiblast.png?nolink&200|}}
mpiBLAST is a freely available, open-source, parallel implementation of NCBI BLAST. mpiBLAST takes advantage of shared parallel computing resources such as a cluster. This gives it access to more resources than NCBI BLAST, which can only take advantage of shared-memory multiprocessors (SMPs).
More information is available [[http://www.mpiblast.org/|here]].
==== Environment Setup ====
Edit the ''$HOME/.bashrc'' file so that it loads these modules.
module load gcc/4.5.0
module load openmpi/1.5.1
module load mpiblast/1.6.0
You may have to log out and log back in for the modules to load. You can check which modules are loaded with the command ''module list''; the list should also be displayed at login.
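As a further check, the sketch below reports any program that cannot be found on the ''PATH''. The function name ''missing_tools'' is made up for this example, and it assumes that the modules above are what put ''mpirun'', ''mpiblast'' and ''mpiformatdb'' on the ''PATH''.

```shell
# Print the name of every requested command that is NOT found on PATH.
# Prints nothing when all of them are available.
# (missing_tools is a hypothetical helper, not part of mpiBLAST.)
missing_tools() {
    for mt_tool in "$@"; do
        command -v "$mt_tool" >/dev/null 2>&1 || printf '%s\n' "$mt_tool"
    done
}

# Example: check the programs used on this page.
# missing_tools mpirun mpiblast mpiformatdb
```

If the example call prints nothing, all three programs were found.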
Make a directory to contain the FASTA database that will be fragmented. Download the database and decompress it.
razor-l1:jokinsey:~$ mkdir db
razor-l1:jokinsey:~$ cd db
razor-l1:jokinsey:~/db$ wget ftp://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/mito.nt.gz
razor-l1:jokinsey:~/db$ gunzip mito.nt.gz
Create a ''$HOME/.ncbirc'' file with these values, replacing ''YourUserName'' with your own user name. The ''Shared'' path tells mpiBLAST where to access the FASTA database.
[mpiBLAST]
Shared=/home/YourUserName/db
Local=/local_scratch/YourUserName
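If you prefer to generate the configuration file described above from the command line, a minimal sketch follows. The function name ''write_mpiblast_rc'' and its argument order are made up for this example; it writes only the three lines shown above and overwrites the target file.

```shell
# Write an [mpiBLAST] section to a configuration file.
# (write_mpiblast_rc is a hypothetical helper, not part of mpiBLAST.)
#   $1 = configuration file to create (overwritten if it exists)
#   $2 = shared directory holding the database
#   $3 = node-local scratch directory
write_mpiblast_rc() {
    cat > "$1" <<EOF
[mpiBLAST]
Shared=$2
Local=$3
EOF
}

# Example (rc_file stands for the configuration file in your home directory):
# write_mpiblast_rc "$rc_file" "$HOME/db" "/local_scratch/$USER"
```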
Format the database for parallel use by splitting it into fragments, one per processor. We will be using a node with 12 processors, so we will include the option ''%%--nfrags=12%%''.
razor-l1:jokinsey:~$ mpiformatdb -i ~/db/mito.nt --nfrags=12
Reading input file
Done, read 2605891 lines
Database type unspecified, assuming nucleotide
Breaking mito.nt into 12 fragments
Executing: formatdb -i /home/jokinsey/db/mito.nt -p F -N 12 -o T
Created 12 fragments.
<<< Please make sure the formatted database fragments are placed in /home/jokinsey/db/ before executing mpiblast. >>>
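To confirm that the expected number of fragments ended up in the shared directory, something like the sketch below can be used. It assumes that each nucleotide fragment has a sequence file named ''<database>.NNN.nsq'' with a three-digit fragment number; that naming is an assumption here, so compare it against a listing of your own ''db'' directory and adjust the pattern if needed. The function name ''count_fragments'' is made up for this example.

```shell
# Count database fragments in a directory.
# (count_fragments is a hypothetical helper, not part of mpiBLAST.)
#   $1 = directory holding the formatted database
#   $2 = database name, e.g. mito.nt
# ASSUMPTION: one file named <db>.NNN.nsq exists per fragment.
count_fragments() {
    cf_n=0
    for cf_f in "$1/$2".[0-9][0-9][0-9].nsq; do
        # When nothing matches, the pattern stays literal and -e fails.
        [ -e "$cf_f" ] && cf_n=$((cf_n + 1))
    done
    echo "$cf_n"
}

# Example:
# count_fragments "$HOME/db" mito.nt
```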
==== Example Job ====
Create a directory ''$HOME/SCHED_PLACE'' to store the PBS schedule files. Create another directory to store the PBS input scripts, along with the FASTA input and output files.
razor-l1:jokinsey:~$ mkdir SCHED_PLACE
razor-l1:jokinsey:~$ mkdir testing
razor-l1:jokinsey:~$ cd testing
Create an input FASTA search file and name it ''input''.
>gi|45238842|gb|AY563103.1| Homo sapiens interleukin 2 receptor, alpha (IL2RA) gene, complete cds
GTCCATCTCAGAACCAAGAGTTGGGCCTCTTATTTACCAGAAAAATTGTGGGGGCTTTGTGATATGGCTT
TAAAAAAATCTTGTAATTGCCAGGCGTGGTGGCTCACACCTGTAATCCCAGCACTTTGGGAGGCCGAGGT
GGGTGAATCGCCTAAGGTCAGGAGTTCGAGACCAGCCTGACCAACATGGTGAAACTCCGTCTCTACTAAA
AATACAAAAACTAGCTGGATGTGGTGACGCGTGCCTGTAATCCTAGCTACTCAGGAGGCTGACGCAGGAG
AATCACTTGAACCTGGGAGGCAGAGGTTGCAGTGAGCCAAGATTGTGCCATTGCGCTCCAAAAAAAAAAA
AAAAAAGACATTAACATAAATTTAAATATTTTATAATGACAATCCACATTAACTACTTAAAGCATAAGCT
ATTTTCCAGGAGAGGCAGCAAGTGCATTCTACTCCCATGCCCAAGAAGAAAGGAGCGTGACTTTGGTGGG
AGTACTAGGAGTTTCTACTGGAGCACTTGCCCGCAGAGTGAGAAACGTTCCTAGAGAGGAAGTTATACCT
GCTGTGGAATTTAAGAGAATCTTGTCATATTTTGACAAGTTTTTTGAGATGGAAGTCTCACTCTGTCGCC
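Before submitting a job, it can be worth checking that the query file is well-formed FASTA. The sketch below counts header lines (those starting with ''>'') and sequence characters using ''awk''. The function name ''fasta_stats'' is made up for this example.

```shell
# Print "<number of sequences> <number of residues>" for a FASTA file.
# (fasta_stats is a hypothetical helper.)
#   $1 = FASTA file to inspect
fasta_stats() {
    awk '/^>/ { seqs++; next }                      # header line: one more sequence
         { gsub(/[ \t\r]/, ""); residues += length($0) }  # sequence line: count characters
         END { printf "%d %d\n", seqs, residues }' "$1"
}

# Example:
# fasta_stats input
```

A sequence count of 0 means the file has no header line and is not valid FASTA.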
Create a PBS input script and name it ''mpiBlastTest.pbs''.
#!/bin/bash
#PBS -N MPIBLAST
#PBS -q tiny12core
#PBS -j oe
#PBS -m abe
#PBS -M jokinsey@uark.edu
#PBS -o MPIBLAST.$PBS_JOBID
#PBS -l nodes=1:ppn=12
#PBS -l walltime=02:00:00
cd "$PBS_O_WORKDIR"
cp input /scratch/$PBS_JOBID
cd /scratch/$PBS_JOBID
mpirun -np 12 mpiblast -p blastn -d mito.nt -i input -o $HOME/testing/output
In the PBS script, we first copy the input file to the directory we will be working in, ''/scratch/$PBS_JOBID''. We then change into that directory, run the computation, and send the output to ''$HOME/testing/output''.
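The copy step can be made a little more defensive. The sketch below refuses to continue when the input file is missing and creates the working directory if it is not already there. The script above assumes that ''/scratch/$PBS_JOBID'' already exists when the job starts; the ''mkdir -p'' here is only a safeguard. The function name ''stage_input'' is made up for this example.

```shell
# Copy an input file into a working directory, failing early if it is missing.
# (stage_input is a hypothetical helper.)
#   $1 = input file
#   $2 = working directory (created if it does not exist)
stage_input() {
    if [ ! -f "$1" ]; then
        echo "input file not found: $1" >&2
        return 1
    fi
    mkdir -p "$2" && cp "$1" "$2"/
}

# Example, inside a PBS script:
# stage_input input "/scratch/$PBS_JOBID" || exit 1
# cd "/scratch/$PBS_JOBID"
```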
Then submit the job.
razor-l3:jokinsey:~/testing$ qsub mpiBlastTest.pbs
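The schedule file (''MPIBLAST.$PBS_JOBID'') is saved in the directory the job is submitted from, because the script gives a relative path for it. To collect the schedule files in ''$HOME/SCHED_PLACE'', submit the job from that directory instead; note that the script copies ''input'' from the submission directory, so ''input'' must be present there. The output will be in ''$HOME/testing/output''.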