Kallisto is a program for quantifying abundances of transcripts from RNA-Seq data, or more generally of target sequences using high-throughput sequencing reads. More information on kallisto can be found here

Environment Setup

Edit your $HOME/.bashrc file to include the load command below for kallisto. You will have to log back in for this to take effect.

module load kallisto
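
If you would rather not edit the file by hand, the line can be appended from the shell. The snippet below is a minimal sketch; add_line_once is a hypothetical helper written for this page (it is not part of kallisto or the module system), and it only appends the line when an identical line is not already present.

```shell
# add_line_once LINE FILE
# Hypothetical helper: append LINE to FILE unless an identical line is already there.
add_line_once() {
    line=$1
    file=$2
    # Make sure the file exists so grep does not fail on a missing file
    touch "$file" || return 1
    # -x matches the whole line, -F treats LINE as a fixed string, -q prints nothing
    grep -qxF "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Example call (left commented out so that pasting the block changes nothing):
# add_line_once 'module load kallisto' "$HOME/.bashrc"
```

Running the call twice leaves a single copy of the line in the file, so it is safe to repeat.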

In your $HOME directory, create a directory named KALLISTO-JOBS, from which we will submit jobs to the queue. Then copy the 'test' directory from the kallisto installation into it.

razor-l1:jokinsey:~$ mkdir KALLISTO-JOBS
razor-l1:jokinsey:~$ cd KALLISTO-JOBS/
razor-l1:jokinsey:~/KALLISTO-JOBS$ cp -r /share/apps/bioinformatics/kallisto/kallisto_linux-v0.43.1/test/ .

Now we can write the script to run our test job.

Example Job

Inside $HOME/KALLISTO-JOBS/test, create a PBS script named kallisto.pbs with the contents below to run our example job.

#PBS -N kallisto
#PBS -q tiny12core
#PBS -j oe
#PBS -o kallisto.$PBS_JOBID
#PBS -l nodes=1:ppn=12
#PBS -l walltime=1:00:00

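# Start in the directory the job was submitted from
cd $PBS_O_WORKDIR

# Copy the input files to the scratch directory and work there
cp {transcripts.fasta.gz,reads_1.fastq.gz,reads_2.fastq.gz} /scratch/$PBS_JOBID
cd /scratch/$PBS_JOBID

# Build the index, then quantify with 100 bootstrap samples
kallisto index -i transcripts.idx transcripts.fasta.gz
kallisto quant -i transcripts.idx -o output -b 100 reads_1.fastq.gz reads_2.fastq.gz

# Copy the results back under a job-specific name
cp -r output $PBS_O_WORKDIR/output.$PBS_JOBID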

The script above first changes to the directory the job was submitted from, copies the needed files to /scratch/$PBS_JOBID, the directory where the job will run, and then moves into that directory. It uses kallisto to build an index from the transcripts.fasta.gz file, then quantifies the transcript abundances from the two read files, with 100 bootstrap samples (-b 100). The results are written to the directory output, which is copied back to the submission directory with $PBS_JOBID appended, so the name stays unique if you run this job more than once.
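
The copy step is the easiest place for a job like this to go wrong: if an input file is missing, kallisto only fails later with a less obvious message. As a sketch, the copy could be wrapped in a small function that stops at the first missing file. stage_inputs is a hypothetical helper written for this page, not something provided by PBS or kallisto.

```shell
# stage_inputs DEST FILE...
# Hypothetical helper: copy each FILE into DEST, stopping at the first missing one.
stage_inputs() {
    dest=$1
    shift
    # Create the destination if it does not exist yet
    mkdir -p "$dest" || return 1
    for f in "$@"; do
        if [ ! -f "$f" ]; then
            echo "missing input: $f" >&2
            return 1
        fi
        cp "$f" "$dest"/ || return 1
    done
}

# Possible use inside the PBS script, in place of the plain cp line:
# stage_inputs /scratch/$PBS_JOBID transcripts.fasta.gz reads_1.fastq.gz reads_2.fastq.gz || exit 1
```

With the "|| exit 1" at the end, the job ends right after the failed copy and the error message appears in the kallisto.$PBS_JOBID output file.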

All that's left is to submit the job.

razor-l1:jokinsey:~/KALLISTO-JOBS/test$ qsub kallisto.pbs

In the $HOME/KALLISTO-JOBS/test/output.$PBS_JOBID directory, the most important output is the abundance.tsv (or abundance.txt) file. More information on the output can be found here
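
For a quick look at the results, the abundance file can be sorted from the shell. The sketch below assumes the tab-separated layout kallisto writes for abundance.tsv, with one header line and the columns target_id, length, eff_length, est_counts and tpm; check the header of your own file before relying on the column number. top_transcripts is a hypothetical helper written for this page, and the -g sort option assumes GNU or BSD sort.

```shell
# top_transcripts FILE [N]
# Hypothetical helper: print target_id and tpm for the N transcripts with the highest tpm.
top_transcripts() {
    file=$1
    n=${2:-10}
    tab=$(printf '\t')
    # Skip the header line, sort on column 5 (tpm) in descending order,
    # using -g so values in scientific notation sort correctly,
    # then keep the first N rows and only columns 1 and 5.
    tail -n +2 "$file" | sort -t "$tab" -k5,5gr | head -n "$n" | cut -f1,5
}

# Example call, run from $HOME/KALLISTO-JOBS/test (replace <jobid> with your job's ID):
# top_transcripts output.<jobid>/abundance.tsv 5
```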

