
mpiBLAST

mpiBLAST is a freely available, open-source, parallel implementation of NCBI BLAST. It splits the database into fragments and searches them in parallel across the nodes of a cluster. This gives it access to more resources than NCBI BLAST, which can only take advantage of a single shared-memory multiprocessor (SMP) machine.

More information is available here.

Environment Setup

Add these module commands to your $HOME/.bashrc file so that the modules are loaded at every login.

module load gcc/4.5.0
module load openmpi/1.5.1
module load mpiblast/1.6.0

You may have to log out and log back in for the modules to load. You can confirm that they are loaded with the command module list; the loaded modules should also be displayed at login.
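If you run the setup more than once, it is easy to end up with duplicate module lines in $HOME/.bashrc. The sketch below is one way to append the three lines only when they are missing; add_module_lines is a hypothetical helper written for this page, not part of the module system.

```bash
#!/bin/bash
# Hypothetical helper: append the mpiBLAST module lines to a startup file,
# skipping any line that is already present so repeated runs are harmless.
add_module_lines() {
  local rc_file=$1
  local line
  touch "$rc_file"
  # If the file does not end with a newline, add one so the first appended
  # line is not glued onto the existing last line.
  if [ -s "$rc_file" ] && [ -n "$(tail -c 1 "$rc_file")" ]; then
    echo >> "$rc_file"
  fi
  for line in \
    "module load gcc/4.5.0" \
    "module load openmpi/1.5.1" \
    "module load mpiblast/1.6.0"; do
    # -x matches whole lines only, -F treats the text as a literal string
    if ! grep -qxF "$line" "$rc_file"; then
      printf '%s\n' "$line" >> "$rc_file"
    fi
  done
}
```

Usage: run add_module_lines "$HOME/.bashrc", then source "$HOME/.bashrc" to load the modules in the current shell instead of logging out, and check the result with module list.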

Make a directory to contain the FASTA database that will be fragmented. Download the database and decompress it.

razor-l1:jokinsey:~$ mkdir db
razor-l1:jokinsey:~$ cd db
razor-l1:jokinsey:~/db$ wget ftp://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/mito.nt.gz
razor-l1:jokinsey:~/db$ gunzip mito.nt.gz
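Before formatting, it can be worth confirming that the download decompressed into a real FASTA file. The sketch below counts header lines (those starting with ">") and sequence residues with awk; fasta_summary is a hypothetical name used only for this example.

```bash
#!/bin/bash
# Hypothetical helper: print the number of FASTA records and the total
# number of sequence characters in a file.
fasta_summary() {
  awk '
    /^>/ { records++; next }                          # a header starts a new record
    { gsub(/[ \t\r]/, ""); bases += length($0) }      # count residues, ignoring whitespace
    END { printf "%d records, %d residues\n", records, bases }
  ' "$1"
}
```

Usage: fasta_summary ~/db/mito.nt. A result of 0 records usually means the file is not FASTA (for example, a failed or truncated download).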

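Create a $HOME/.ncbirc file with these values, replacing YourUserName with your own user name. The Shared path tells mpiBLAST where to find the formatted database fragments, and the Local path is node-local scratch space to which mpiBLAST copies fragments during a search.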

[mpiBLAST]
Shared=/home/YourUserName/db
Local=/local_scratch/YourUserName
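mpiBLAST reads these settings from the file .ncbirc in your home directory. The sketch below writes the same [mpiBLAST] section with the user name filled in, assuming the /home/<user>/db and /local_scratch/<user> layout used above; write_ncbirc is a hypothetical helper, and it refuses to overwrite an existing file.

```bash
#!/bin/bash
# Hypothetical helper: write the [mpiBLAST] section shown above.
# Assumes the database lives in /home/<user>/db and node-local scratch is
# /local_scratch/<user>; adjust both paths for your account.
write_ncbirc() {
  local target=${1:-$HOME/.ncbirc}
  local user=${2:-$USER}
  if [ -e "$target" ]; then
    echo "$target already exists; not overwriting it" >&2
    return 1
  fi
  cat > "$target" <<EOF
[mpiBLAST]
Shared=/home/$user/db
Local=/local_scratch/$user
EOF
}
```

Usage: write_ncbirc with no arguments writes $HOME/.ncbirc for the current user. If you already have a .ncbirc (for example one with an [NCBI] section), edit it by hand instead.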

Format the database for parallel use by splitting it into fragments that the worker processes search in parallel. We will be using a node with 12 processors, so we include the option --nfrags=12.

razor-l1:jokinsey:~$ mpiformatdb -i ~/db/mito.nt --nfrags=12
Reading input file
Done, read 2605891 lines
Database type unspecified, assuming nucleotide
Breaking mito.nt into 12 fragments
Executing: formatdb -i /home/jokinsey/db/mito.nt -p F -N 12 -o T 
Created 12 fragments.
<<< Please make sure the formatted database fragments are placed in /home/jokinsey/db/ before executing mpiblast. >>> 
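The last line of the mpiformatdb output is a reminder that the fragments must be in the directory named by Shared in .ncbirc. The sketch below reads that path and counts the files there whose names begin with the database name. Both helpers are hypothetical, and the check assumes the fragment files share the database name as a prefix (for example mito.nt.000...); confirm the actual names with ls.

```bash
#!/bin/bash
# Hypothetical helper: print the value of a key (e.g. Shared or Local)
# from the [mpiBLAST] section of an .ncbirc-style file.
ncbirc_get() {
  local file=$1 key=$2
  awk -F= -v key="$key" '
    /^\[/ { in_section = ($0 == "[mpiBLAST]"); next }   # track the current section
    in_section && $1 == key { print substr($0, length(key) + 2); exit }
  ' "$file"
}

# Hypothetical helper: count files named "<db>.*" in the Shared directory.
# Assumes fragments are named with the database name as a prefix.
check_fragments() {
  local file=$1 db=$2
  local shared n
  shared=$(ncbirc_get "$file" Shared)
  if [ -z "$shared" ]; then
    echo "no Shared= entry in $file" >&2
    return 1
  fi
  n=$(find "$shared" -maxdepth 1 -name "$db.*" 2>/dev/null | wc -l | tr -d ' ')
  echo "$n files matching $db.* in $shared"
  [ "$n" -gt 0 ]   # succeed only if at least one fragment file was found
}
```

Usage: check_fragments ~/.ncbirc mito.nt. A count of zero suggests the fragments were written somewhere other than the Shared directory.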

Example Job

Create a directory $HOME/SCHED_PLACE to store the PBS job log files. Create another directory, testing, to hold the PBS job script along with the FASTA input and output files.

razor-l1:jokinsey:~$ mkdir SCHED_PLACE
razor-l1:jokinsey:~$ mkdir testing
razor-l1:jokinsey:~$ cd testing

Create a FASTA query file named input containing the following sequence.

>gi|45238842|gb|AY563103.1| Homo sapiens interleukin 2 receptor, alpha (IL2RA) gene, complete cds
GTCCATCTCAGAACCAAGAGTTGGGCCTCTTATTTACCAGAAAAATTGTGGGGGCTTTGTGATATGGCTT
TAAAAAAATCTTGTAATTGCCAGGCGTGGTGGCTCACACCTGTAATCCCAGCACTTTGGGAGGCCGAGGT
GGGTGAATCGCCTAAGGTCAGGAGTTCGAGACCAGCCTGACCAACATGGTGAAACTCCGTCTCTACTAAA
AATACAAAAACTAGCTGGATGTGGTGACGCGTGCCTGTAATCCTAGCTACTCAGGAGGCTGACGCAGGAG
AATCACTTGAACCTGGGAGGCAGAGGTTGCAGTGAGCCAAGATTGTGCCATTGCGCTCCAAAAAAAAAAA
AAAAAAGACATTAACATAAATTTAAATATTTTATAATGACAATCCACATTAACTACTTAAAGCATAAGCT
ATTTTCCAGGAGAGGCAGCAAGTGCATTCTACTCCCATGCCCAAGAAGAAAGGAGCGTGACTTTGGTGGG
AGTACTAGGAGTTTCTACTGGAGCACTTGCCCGCAGAGTGAGAAACGTTCCTAGAGAGGAAGTTATACCT
GCTGTGGAATTTAAGAGAATCTTGTCATATTTTGACAAGTTTTTTGAGATGGAAGTCTCACTCTGTCGCC
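A malformed query file is an easy way to waste a queued job. The sketch below is a rough pre-submission check for a nucleotide query: the first line must be a ">" header, and every other line may contain only IUPAC nucleotide codes (and "-"). It also catches stray Windows line endings, since a carriage return is not an allowed character. check_nucleotide_fasta is a hypothetical helper, not an mpiBLAST tool.

```bash
#!/bin/bash
# Hypothetical helper: rough validity check for a nucleotide FASTA query.
# Returns 0 if the file looks valid, 1 otherwise (with a message on stderr).
check_nucleotide_fasta() {
  local file=$1 bad
  if ! head -n 1 "$file" | grep -q '^>'; then
    echo "$file: first line is not a '>' header" >&2
    return 1
  fi
  # First sequence line containing anything other than IUPAC codes or '-'
  bad=$(grep -v '^>' "$file" | grep -v -i -E '^[ACGTURYKMSWBDHVN-]*$' | head -n 1)
  if [ -n "$bad" ]; then
    echo "$file: unexpected characters in sequence line: $bad" >&2
    return 1
  fi
}
```

Usage: run check_nucleotide_fasta input in $HOME/testing before submitting.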

Create a PBS job script named mpiBlastTest.pbs.

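#!/bin/bash
#PBS -N MPIBLAST
#PBS -q tiny12core
#PBS -j oe
#PBS -m abe
#PBS -M jokinsey@uark.edu
#PBS -o MPIBLAST.$PBS_JOBID
#PBS -l nodes=1:ppn=12
#PBS -l walltime=02:00:00

# Start in the directory the job was submitted from (it must contain "input")
cd "$PBS_O_WORKDIR"
# Copy the query file to this job's scratch directory and work there
cp input "/scratch/$PBS_JOBID"
cd "/scratch/$PBS_JOBID"

# Search "input" against the fragmented mito.nt database with blastn on
# 12 MPI processes; results are written to $HOME/testing/output
mpirun -np 12 mpiblast -p blastn -d mito.nt -i input -o "$HOME/testing/output"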

The script first changes to the directory the job was submitted from ($PBS_O_WORKDIR) and copies the input file to the job's working directory, /scratch/$PBS_JOBID. It then changes into that directory to run the search and writes the results to $HOME/testing/output.
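The staging steps in the job script do not check for errors, so a missing input file only shows up later as an mpiBLAST failure. The sketch below is one way to make the copy fail early; stage_input is a hypothetical helper whose arguments stand in for "$PBS_O_WORKDIR", "/scratch/$PBS_JOBID" and input, so it can be tried outside PBS. It assumes the scratch directory is writable by your job; mkdir -p is harmless if the scheduler has already created it.

```bash
#!/bin/bash
# Hypothetical helper: copy a query file from the submission directory into
# a scratch directory and print the scratch path; return 1 on any failure.
stage_input() {
  local workdir=$1 scratch=$2 query=$3
  if [ ! -f "$workdir/$query" ]; then
    echo "query file $workdir/$query not found" >&2
    return 1
  fi
  mkdir -p "$scratch" || return 1          # no-op if it already exists
  cp "$workdir/$query" "$scratch/" || return 1
  echo "$scratch"
}
```

In the job script, the cd and cp lines could then become scratch=$(stage_input "$PBS_O_WORKDIR" "/scratch/$PBS_JOBID" input) || exit 1 followed by cd "$scratch", so the job stops immediately if input is missing.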

Then submit the job.

razor-l3:jokinsey:~/testing$ qsub mpiBlastTest.pbs 

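Note that the job is submitted from $HOME/testing. Because the script begins with cd "$PBS_O_WORKDIR" (the directory qsub was run from), the job must be submitted from the directory that contains input, and the PBS log file named by -o is written to that directory as well. To keep log files in $HOME/SCHED_PLACE instead, give -o an absolute path inside that directory. The search results will be in $HOME/testing/output.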
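Once the job finishes, a quick way to tell whether the query matched anything is to look for the markers in the text report. The sketch below assumes the default blastall-style pairwise text output, where hits are listed under "Sequences producing significant alignments" and a query without hits produces a "No hits found" line; blast_hit_status is a hypothetical helper, and other output formats will be reported as unrecognized.

```bash
#!/bin/bash
# Hypothetical helper: summarize a BLAST pairwise text report.
# Exit status: 0 = hits found, 1 = no hits, 2 = missing/unrecognized report.
blast_hit_status() {
  local report=$1
  if [ ! -s "$report" ]; then
    echo "missing or empty: $report"
    return 2
  elif grep -q 'Sequences producing significant alignments' "$report"; then
    echo "hits found"
  elif grep -q 'No hits found' "$report"; then
    echo "no hits"
    return 1
  else
    echo "unrecognized report format"
    return 2
  fi
}
```

Usage: blast_hit_status $HOME/testing/output. A missing or empty report usually means the job failed; check the PBS log file for errors.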

mpiblast.txt · Last modified: 2020/09/21 21:58 by root