JellyFish

Jellyfish is a tool for fast, memory-efficient counting of k-mers in DNA. It was developed by Guillaume Marçais and Carl Kingsford. You can find more information on Jellyfish here.

Environment Setup

To work with Jellyfish, we first need to load its module. The easiest way to have it loaded at every login is to add the following line to the .bashrc file in your $HOME directory.

module load jellyfish
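
If you would rather not edit .bashrc by hand, the change can be scripted. The sketch below is only one way to do it: add_module_line is a hypothetical helper name (it is not part of the modules system), and it assumes your login shell reads $HOME/.bashrc. It appends the line only when it is not already present, so running it twice is harmless.

```shell
# add_module_line is a hypothetical helper, not a command provided by the cluster.
# It appends "module load jellyfish" to a startup file only if the exact line is missing.
add_module_line() {
    rc_file="${1:-$HOME/.bashrc}"   # default target is ~/.bashrc
    line="module load jellyfish"
    # -x matches the whole line, -F treats the pattern as a fixed string
    if ! grep -qxF "$line" "$rc_file" 2>/dev/null; then
        printf '%s\n' "$line" >> "$rc_file"
    fi
}
```

Call add_module_line with no argument to update $HOME/.bashrc, then log in again or run "source ~/.bashrc" so the module is loaded in your current session.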

Then, in your $HOME directory, create a new directory from which to submit your Jellyfish jobs, and copy the FASTA data file used for the test job into it.

razor-l3:jokinsey:~$ mkdir JELLYFISH-JOBS
razor-l3:jokinsey:~$ cp /share/apps/jellyfish/jellyfish-1.1.10/tests/seq10m.fa JELLYFISH-JOBS/

Example Job

To run the example job, create a PBS script (jellyfish.pbs in this example) in the directory from which you will submit the job. This directory must also contain the FASTA file. The script should look like the one below.

#!/bin/bash
#PBS -N jellyfish
#PBS -q tiny12core
#PBS -j oe
#PBS -o jellyfish.$PBS_JOBID
#PBS -l nodes=1:ppn=12
#PBS -l walltime=1:00:00

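# Copy the input from the submission directory to the job's scratch space
cd "$PBS_O_WORKDIR"
cp seq10m.fa "/scratch/$PBS_JOBID"
cd "/scratch/$PBS_JOBID"

# Count 10-mers using 12 threads, then copy the result back
jellyfish count -m 10 -o output -c 3 -s 10000000 -t 12 seq10m.fa
cp output "$PBS_O_WORKDIR"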

This produces an output hash file that Jellyfish can process further. For more information on the flags used, see the manual here.
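
In the count command, -m sets the k-mer length, -s the hash size, -c the counter length in bits, -t the number of threads, and -o the output name. As a rough sanity check on the -m and -s values (this check is our own illustration, not something the manual prescribes), you can compare the number of distinct k-mers that are possible over the four-letter DNA alphabet, 4^k, with the hash size. The helper name kmer_space below is hypothetical.

```shell
# kmer_space is a hypothetical helper: it prints 4^k, the number of distinct
# k-mers possible over the alphabet A, C, G, T.
kmer_space() {
    k="$1"
    n=1
    i=0
    # Multiply by 4 once per base position
    while [ "$i" -lt "$k" ]; do
        n=$(( n * 4 ))
        i=$(( i + 1 ))
    done
    echo "$n"
}
```

For the example job, "kmer_space 10" prints 1048576, which is well below the hash size of 10000000 passed with -s.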

All that's left is to submit the job.

razor-l3:jokinsey:~/JELLYFISH-JOBS$ qsub jellyfish.pbs 

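Once we have the output hash, we can run other Jellyfish operations on it, such as the histo operation shown below, which is one way to verify your output. Each line of the result gives an occurrence count followed by the number of distinct k-mers that occur that many times.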

razor-l3:jokinsey:~/JELLYFISH-JOBS$ jellyfish histo output
1 742
2 3399
3 10929
4 26162
5 49704
6 78834
7 107855
8 128207
9 136220
10 129521
11 113132
12 89579
13 65480
14 44554
15 28390
16 16847
17 9360
18 4951
19 2493
20 1173
21 529
22 264
23 104
24 47
25 15
26 7
28 1
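
The histogram can be condensed into a few summary numbers with awk. The following is a minimal sketch, assuming the two-column "count number-of-k-mers" layout shown above; summarize_histo is a hypothetical helper name, not a Jellyfish command. It reports the number of distinct k-mers (the sum of the second column), the total number of k-mer occurrences (the sum of count times number), and the count at which the histogram peaks.

```shell
# summarize_histo is a hypothetical helper: it reads "count number_of_kmers"
# pairs on stdin, in the layout printed by "jellyfish histo".
summarize_histo() {
    awk '
        BEGIN { peak_n = -1 }
        {
            distinct += $2          # k-mers seen exactly $1 times
            total    += $1 * $2     # occurrences contributed by this row
        }
        $2 > peak_n { peak_n = $2; peak_c = $1 }   # track the tallest row
        END { printf "distinct=%d total=%d peak_count=%d\n", distinct, total, peak_c }
    '
}
```

A typical call would be "jellyfish histo output | summarize_histo". For the histogram above, the peak is the row for count 9, which has the largest second column (136220).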