JELLYFISH is a tool for fast, memory-efficient counting of k-mers in DNA. It was developed by Guillaume Marçais and Carl Kingsford. You can find more information on JellyFish here.
To work with jellyfish, we first need to load the module. The easiest way to do this is to add the following line to the .bashrc file in your $HOME directory:
module load jellyfish
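If you would rather script the change than edit the file by hand, the following is a minimal sketch of appending the line only when it is not already there, so that running it twice does not duplicate it. The helper name add_line_once is hypothetical, and the demonstration writes to a temporary file rather than to your real .bashrc.

```shell
#!/bin/bash
# Append a line to a startup file only if it is not already present.
# add_line_once is a hypothetical helper name, not part of the module system.
add_line_once() {
    local line="$1" file="$2"
    touch "$file"
    # -q: quiet, -x: match the whole line, -F: treat the pattern as a fixed string
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Demonstrate on a scratch file; pass "$HOME/.bashrc" instead for real use.
demo_rc="$(mktemp)"
add_line_once "module load jellyfish" "$demo_rc"
add_line_once "module load jellyfish" "$demo_rc"   # second call changes nothing
cat "$demo_rc"
```

The change takes effect in new login shells; in the current shell you can run the module load command directly.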
Then in your $HOME directory create another directory to submit your jellyfish jobs, and copy the fasta data file we will be using for the test job into the new directory.
razor-l3:jokinsey:~$ mkdir JELLYFISH-JOBS
razor-l3:jokinsey:~$ cp /share/apps/jellyfish/jellyfish-1.1.10/tests/seq10m.fa JELLYFISH-JOBS/
To run the example job, create a PBS script (named jellyfish.pbs here) in the directory from which you will submit the job. This directory must also contain the necessary fasta file. The script will look like the one below.
#!/bin/bash
#PBS -N jellyfish
#PBS -q tiny12core
#PBS -j oe
#PBS -o jellyfish.$PBS_JOBID
#PBS -l nodes=1:ppn=12
#PBS -l walltime=1:00:00

cd $PBS_O_WORKDIR
cp seq10m.fa /scratch/$PBS_JOBID
cd /scratch/$PBS_JOBID

jellyfish count -m 10 -o output -c 3 -s 10000000 -t 12 seq10m.fa

cp output $PBS_O_WORKDIR
This will give us an output hash file that we can process with jellyfish. For more information on the flags we used see the manual here.
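To make concrete what the count step computes, here is a toy k-mer counter in bash and awk. It is only a sketch of the idea (slide a window of width k along the sequence and tally each substring), not how jellyfish works internally, and it handles a single short sequence passed as an argument. The function name count_kmers is hypothetical.

```shell
#!/bin/bash
# Toy k-mer counter: prints "<k-mer> <count>" for every k-mer in one sequence.
# count_kmers is a hypothetical helper for illustration, not a jellyfish command.
count_kmers() {
    local k="$1" seq="$2"
    awk -v k="$k" -v s="$seq" 'BEGIN {
        # Slide a window of width k across the sequence, one base at a time.
        for (i = 1; i + k - 1 <= length(s); i++) n[substr(s, i, k)]++
        for (m in n) print m, n[m]
    }' | sort
}

# The 3-mers of ACGTACGT are ACG, CGT, GTA, TAC, ACG, CGT.
count_kmers 3 ACGTACGT
# prints:
# ACG 2
# CGT 2
# GTA 1
# TAC 1
```

With -m 10, as in the job script, the window is 10 bases wide, so there are at most 4^10 = 1,048,576 distinct k-mers to count.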
All that's left is to submit the job.
razor-l3:jokinsey:~/JELLYFISH-JOBS$ qsub jellyfish.pbs
Once we have the output hash we can run other jellyfish operations with it, like the histo operation shown below, which might be a way to verify your output.
razor-l3:jokinsey:~/JELLYFISH-JOBS$ jellyfish histo output
1 742
2 3399
3 10929
4 26162
5 49704
6 78834
7 107855
8 128207
9 136220
10 129521
11 113132
12 89579
13 65480
14 44554
15 28390
16 16847
17 9360
18 4951
19 2493
20 1173
21 529
22 264
23 104
24 47
25 15
26 7
28 1
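Each row of the histogram pairs an occurrence count with the number of distinct k-mers that occur that many times. One way to sanity-check a run is to reduce the histogram to a few summary numbers. The sketch below assumes that two-column layout on standard input; the function name summarize_histo is hypothetical, and the demonstration uses toy numbers rather than real jellyfish output.

```shell
#!/bin/bash
# Summarize a two-column histogram ("<count> <number of k-mers>") from stdin.
# summarize_histo is a hypothetical helper name, not a jellyfish subcommand.
summarize_histo() {
    awk 'NF == 2 {
        distinct += $2          # distinct k-mers that occur $1 times
        total    += $1 * $2     # k-mer occurrences contributed by this row
        if ($2 > best) { best = $2; mode = $1 }
    }
    END { printf "distinct=%d total=%d mode=%d\n", distinct, total, mode }'
}

# Toy input for illustration only (not real jellyfish output).
printf '1 5\n2 10\n3 4\n' | summarize_histo
# prints: distinct=19 total=37 mode=2
```

On the cluster, the same function could be fed directly with jellyfish histo output | summarize_histo. For a 10-mer run, the distinct value cannot exceed 4^10 = 1,048,576, which gives a quick upper bound to compare against.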