JELLYFISH is a tool for fast, memory-efficient counting of k-mers in DNA. It was developed by Guillaume Marçais and Carl Kingsford. You can find more information on JELLYFISH here.
To work with jellyfish, we first need to load the module. The easiest way to do this is to add the following line to the .bashrc file in your $HOME directory.
module load jellyfish
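If you would rather not edit .bashrc by hand, the line can be appended from the command line. The sketch below is one possible way to do it, not a site-provided tool: add_line_once is a hypothetical helper name, and it appends the line only when the file does not already contain it, so running it twice does no harm.

```shell
# Hypothetical helper (not part of jellyfish or the module system):
# append line $1 to file $2 unless an identical line is already present.
add_line_once() {
    # Create the file if it does not exist yet.
    touch "$2"
    # -q: quiet, -x: match the whole line, -F: treat the pattern as a fixed string.
    grep -qxF -- "$1" "$2" || printf '%s\n' "$1" >> "$2"
}
```

A call such as add_line_once 'module load jellyfish' "$HOME/.bashrc" would then set up the module for future logins. The change takes effect in new shells, or in the current one after sourcing .bashrc.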
Then, in your $HOME directory, create another directory from which to submit your jellyfish jobs, and copy the fasta data file we will be using for the test job into the new directory.
razor-l3:jokinsey:~$ mkdir JELLYFISH-JOBS
razor-l3:jokinsey:~$ cp /share/apps/jellyfish/jellyfish-1.1.10/tests/seq10m.fa JELLYFISH-JOBS/
To run the example job, create a PBS script (named jellyfish.pbs here) in the directory from which you will submit the job. This must be the directory that contains the fasta file. The script will look like the one below.
#!/bin/bash
#PBS -N jellyfish
#PBS -q tiny12core
#PBS -j oe
#PBS -o jellyfish.$PBS_JOBID
#PBS -l nodes=1:ppn=12
#PBS -l walltime=1:00:00

cd $PBS_O_WORKDIR
cp seq10m.fa /scratch/$PBS_JOBID
cd /scratch/$PBS_JOBID
jellyfish count -m 10 -o output -c 3 -s 10000000 -t 12 seq10m.fa
cp output $PBS_O_WORKDIR
This will give us an output hash file that we can process with jellyfish. For more information on the flags we used, see the manual here.
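As a rough aid for reading the -m and -s values in the script, the sketch below computes how many different k-mers can exist for a given k over the four DNA bases, which is 4^k. It assumes that the hash size given with -s should be large enough to hold the distinct k-mers you expect to see; check the manual for the exact meaning on your jellyfish version. The name max_kmers is a hypothetical helper, not a jellyfish command.

```shell
# Hypothetical helper: print the number of possible DNA k-mers, 4^k,
# for the k-mer length given as $1 (the value passed to -m).
max_kmers() {
    k="$1"
    n=1
    # Multiply by 4 once per base position.
    while [ "$k" -gt 0 ]; do
        n=$((n * 4))
        k=$((k - 1))
    done
    echo "$n"
}
```

For example, max_kmers 10 gives the size of the 10-mer space used in the script above, which can be compared with the 10000000 passed to -s.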
All that's left is to submit the job.
razor-l3:jokinsey:~/JELLYFISH-JOBS$ qsub jellyfish.pbs
Once we have the output hash, we can run other jellyfish operations on it, such as the histo operation shown below, which is one way to check that the output looks reasonable.
razor-l3:jokinsey:~/JELLYFISH-JOBS$ jellyfish histo output
1 742
2 3399
3 10929
4 26162
5 49704
6 78834
7 107855
8 128207
9 136220
10 129521
11 113132
12 89579
13 65480
14 44554
15 28390
16 16847
17 9360
18 4951
19 2493
20 1173
21 529
22 264
23 104
24 47
25 15
26 7
28 1
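Each row of the histogram pairs an occurrence count with the number of distinct k-mers seen that many times. One simple sanity check is to total the two columns. The sketch below is an illustration rather than part of jellyfish: histo_summary is a hypothetical helper that reads histogram rows on standard input and reports the number of distinct k-mers, the total number of k-mer occurrences, and the occurrence count with the most k-mers.

```shell
# Hypothetical helper: summarize "count number_of_kmers" rows read from stdin.
histo_summary() {
    awk '
        {
            distinct += $2           # k-mers that occur exactly $1 times
            total    += $1 * $2      # occurrences contributed by this row
            if ($2 > max) {          # remember the tallest histogram bar
                max  = $2
                peak = $1
            }
        }
        END { printf "distinct=%d total=%d peak=%d\n", distinct, total, peak }
    '
}
```

It could be used as jellyfish histo output | histo_summary. For this test job, the distinct figure should not exceed the 4^10 possible 10-mers, and the total should be close to the number of bases in seq10m.fa.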