==== mpiBLAST ====

{{ :mpiblast.png?nolink&200|}}

mpiBLAST is a freely available, open-source, parallel implementation of NCBI BLAST. mpiBLAST takes advantage of shared parallel computing resources such as a cluster, which gives it access to more resources than NCBI BLAST, which can only take advantage of shared-memory multiprocessors (SMPs). More information is available [[http://www.mpiblast.org/|here]].

==== Environment Setup ====

Edit the ''$HOME/.bashrc'' file to load these modules.

  module load gcc/4.5.0
  module load openmpi/1.5.1
  module load mpiblast/1.6.0

You may have to log out and log back in for the modules to load. You can check with the command ''module list''; the loaded modules should also be displayed at login.

Make a directory to contain the FASTA database that will be fragmented. Download the database and decompress it.

  razor-l1:jokinsey:~$ mkdir db
  razor-l1:jokinsey:~$ cd db
  razor-l1:jokinsey:~/db$ wget ftp://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/mito.nt.gz
  razor-l1:jokinsey:~/db$ gunzip mito.nt.gz

Create a ''$HOME/.ncbirc'' file with these values. The ''Shared'' path tells mpiBLAST where to access the FASTA database.

  [mpiBLAST]
  Shared=/home/YourUserName/db
  Local=/local_scratch/YourUserName

Format the database for parallel use by fragmenting it, one fragment per processor. We will be using a node with 12 processors, so we include the option ''%%--nfrags=12%%''.

  razor-l1:jokinsey:~$ mpiformatdb -i ~/db/mito.nt --nfrags=12
  Reading input file
  Done, read 2605891 lines
  Database type unspecified, assuming nucleotide
  Breaking mito.nt into 12 fragments
  Executing: formatdb -i /home/jokinsey/db/mito.nt -p F -N 12 -o T
  Created 12 fragments.
  <<< Please make sure the formatted database fragments are placed in /home/jokinsey/db/ before executing mpiblast. >>>

==== Example Job ====

Create a directory ''$HOME/SCHED_PLACE'' to store the PBS schedule files. Create another directory to store the PBS input scripts, along with the FASTA input and output files.
  razor-l1:jokinsey:~$ mkdir SCHED_PLACE
  razor-l1:jokinsey:~$ mkdir testing
  razor-l1:jokinsey:~$ cd testing

Create an input FASTA search file and name it ''input''.

  >gi|45238842|gb|AY563103.1| Homo sapiens interleukin 2 receptor, alpha (IL2RA) gene, complete cds
  GTCCATCTCAGAACCAAGAGTTGGGCCTCTTATTTACCAGAAAAATTGTGGGGGCTTTGTGATATGGCTT
  TAAAAAAATCTTGTAATTGCCAGGCGTGGTGGCTCACACCTGTAATCCCAGCACTTTGGGAGGCCGAGGT
  GGGTGAATCGCCTAAGGTCAGGAGTTCGAGACCAGCCTGACCAACATGGTGAAACTCCGTCTCTACTAAA
  AATACAAAAACTAGCTGGATGTGGTGACGCGTGCCTGTAATCCTAGCTACTCAGGAGGCTGACGCAGGAG
  AATCACTTGAACCTGGGAGGCAGAGGTTGCAGTGAGCCAAGATTGTGCCATTGCGCTCCAAAAAAAAAAA
  AAAAAAGACATTAACATAAATTTAAATATTTTATAATGACAATCCACATTAACTACTTAAAGCATAAGCT
  ATTTTCCAGGAGAGGCAGCAAGTGCATTCTACTCCCATGCCCAAGAAGAAAGGAGCGTGACTTTGGTGGG
  AGTACTAGGAGTTTCTACTGGAGCACTTGCCCGCAGAGTGAGAAACGTTCCTAGAGAGGAAGTTATACCT
  GCTGTGGAATTTAAGAGAATCTTGTCATATTTTGACAAGTTTTTTGAGATGGAAGTCTCACTCTGTCGCC

Create a PBS input script and name it ''mpiBlastTest.pbs''.

  #!/bin/bash
  #PBS -N MPIBLAST
  #PBS -q tiny12core
  #PBS -j oe
  #PBS -m abe
  #PBS -M jokinsey@uark.edu
  #PBS -o MPIBLAST.$PBS_JOBID
  #PBS -l nodes=1:ppn=12
  #PBS -l walltime=02:00:00
  # Start in the directory the job was submitted from
  cd "$PBS_O_WORKDIR"
  # Copy the query to the per-job scratch directory and work there
  cp input /scratch/$PBS_JOBID
  cd /scratch/$PBS_JOBID
  # Run the search and write the results back to the home directory
  mpirun -np 12 mpiblast -p blastn -d mito.nt -i input -o $HOME/testing/output

In the PBS script, we first copy the input file to the directory we will be working in, ''/scratch/$PBS_JOBID''. Then we change into that directory to run the computation and send the output to ''$HOME/testing/output''.

Then submit the job.

  razor-l3:jokinsey:~/testing$ qsub mpiBlastTest.pbs

Note that the job is submitted from ''$HOME/testing'', the directory that contains ''input''. Because the script starts with ''cd "$PBS_O_WORKDIR"'', the job runs from the submission directory, and the schedule file ''MPIBLAST.$PBS_JOBID'' is saved there as well; move it to ''$HOME/SCHED_PLACE'' afterwards to keep the schedule files together. The output will be in ''$HOME/testing/output''.
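Once the job has left the queue, it can be useful to confirm that the results file was actually written. The following is a minimal sketch, not part of mpiBLAST: the function name ''check_blast_output'' is hypothetical, and it only checks that the file exists and is non-empty, not that the search itself succeeded.

```shell
#!/bin/bash
# Sketch only: check_blast_output is a hypothetical helper, not an mpiBLAST tool.
# It reports whether a results file exists and is non-empty.
check_blast_output() {
    local result_file="$1"
    if [ -s "$result_file" ]; then
        # "wc -l < file" prints only the count; tr removes any padding spaces
        echo "OK: $result_file has $(wc -l < "$result_file" | tr -d ' ') lines"
    else
        echo "Missing or empty: $result_file" >&2
        return 1
    fi
}

# Example use after the job has finished (check the queue with: qstat -u "$USER"):
#   check_blast_output "$HOME/testing/output"
```

If the function reports a missing or empty file, the schedule file ''MPIBLAST.$PBS_JOBID'' is the first place to look for error messages from the job.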