Kallisto is a program for quantifying abundances of transcripts from RNA-Seq data, or more generally of target sequences using high-throughput sequencing reads. More information on kallisto can be found here.
Edit your $HOME/.bashrc file to include the load command below for kallisto. You will have to log back in for this to take effect.
module load kallisto
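If you would rather not edit the file by hand, the shell sketch below appends the line only when it is not already present, so running it twice does no harm. The helper name `ensure_line` is hypothetical and not part of any cluster tooling.

```bash
#!/bin/bash
# Append a line to a startup file only if it is not already there,
# so repeated runs do not create duplicate "module load" lines.
ensure_line() {
  local file="$1" line="$2"
  touch "$file" || return 1
  if ! grep -qxF -- "$line" "$file"; then
    # If the file does not end in a newline, add one so the new line stays separate
    if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
      printf '\n' >> "$file"
    fi
    printf '%s\n' "$line" >> "$file"
  fi
}

ensure_line "$HOME/.bashrc" 'module load kallisto'
```

As an alternative to logging back in, running `source ~/.bashrc` applies the change to your current shell.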
In your $HOME directory, create the directory KALLISTO-JOBS, where we will submit jobs to the queue. Inside that directory, copy the 'test' directory from where kallisto is installed.
razor-l1:jokinsey:~$ mkdir KALLISTO-JOBS
razor-l1:jokinsey:~$ cd KALLISTO-JOBS/
razor-l1:jokinsey:~/KALLISTO-JOBS$ cp -r /share/apps/bioinformatics/kallisto/kallisto_linux-v0.43.1/test/ .
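Before writing the job script, it can help to confirm that the copy worked. The following sketch checks that the three input files the PBS script uses are present and non-empty in the copied directory. The function name `check_inputs` is hypothetical.

```bash
#!/bin/bash
# Verify that a directory holds the three non-empty inputs the kallisto job uses.
check_inputs() {
  local dir="$1" missing=0 f
  for f in transcripts.fasta.gz reads_1.fastq.gz reads_2.fastq.gz; do
    if [ ! -s "$dir/$f" ]; then
      echo "missing or empty: $dir/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}

if check_inputs "$HOME/KALLISTO-JOBS/test"; then
  echo "test inputs present"
else
  echo "copy the kallisto test directory first" >&2
fi
```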
Now we can write the script to run our test job.
Inside the $HOME/KALLISTO-JOBS/test directory, create a PBS script named kallisto.pbs with the contents below to run our example job. Replace the address on the #PBS -M line with your own.
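#!/bin/bash
#PBS -N kallisto
#PBS -q tiny12core
#PBS -j oe
#PBS -o kallisto.$PBS_JOBID
#PBS -l nodes=1:ppn=12
#PBS -l walltime=1:00:00
#PBS -M jokinsey@email.uark.edu

# Start in the directory the job was submitted from
cd "$PBS_O_WORKDIR"

# Stage the inputs into a per-job scratch directory (mkdir -p is harmless if it already exists)
mkdir -p "/scratch/$PBS_JOBID"
cp {transcripts.fasta.gz,reads_1.fastq.gz,reads_2.fastq.gz} "/scratch/$PBS_JOBID"
cd "/scratch/$PBS_JOBID"

# Build the transcript index, then quantify the paired-end reads with 100 bootstrap samples
kallisto index -i transcripts.idx transcripts.fasta.gz
kallisto quant -i transcripts.idx -o output -b 100 reads_1.fastq.gz reads_2.fastq.gz

# Copy the results back to the submit directory, tagged with the job ID
cp -r output "$PBS_O_WORKDIR/output.$PBS_JOBID"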
The script above first changes to the directory we submitted the job from ($PBS_O_WORKDIR). It then copies the needed files to /scratch/$PBS_JOBID, which is where the job will run, and moves into that directory. Because the directory name includes the job ID, it is unique, so you can run this job more than once without conflicts. Using kallisto, it builds an index from the transcripts.fasta.gz file. Next, it quantifies the transcript abundances using the two paired-end read files, with 100 bootstrap samples (-b 100). The results are written to the directory output, which is copied back to the submission directory with $PBS_JOBID appended to its name.
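The stage-in and stage-out steps described above can be sketched as two reusable shell functions. This is an illustration, not part of the cluster setup. The function names, the `SCRATCH_ROOT` variable and the `local-$$` fallback job ID are hypothetical. They only make the sketch usable outside a PBS job, where `PBS_JOBID` is not set.

```bash
#!/bin/bash
# Minimal sketch of the stage-in / run / stage-out pattern in kallisto.pbs.
# SCRATCH_ROOT and the "local-$$" fallback ID are hypothetical, for trying this
# outside PBS; inside a real job PBS sets PBS_JOBID and PBS_O_WORKDIR.

# Copy the named input files from the submit directory into a per-job scratch directory
stage_in() {
  local src="$1" dest="$2" f
  shift 2
  mkdir -p "$dest" || return 1
  for f in "$@"; do
    cp "$src/$f" "$dest/" || return 1
  done
}

# Copy the results back, suffixed with the job ID so repeated runs never collide
stage_out() {
  local scratch="$1" submit="$2" jobid="$3"
  cp -r "$scratch/output" "$submit/output.$jobid"
}

jobid="${PBS_JOBID:-local-$$}"
scratch_dir="${SCRATCH_ROOT:-/scratch}/$jobid"
echo "This run would use $scratch_dir"
```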
All that's left is to submit the job from the test directory.
razor-l1:jokinsey:~/KALLISTO-JOBS/test$ qsub kallisto.pbs
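On Torque-style PBS systems, qsub prints the new job's ID. That ID should match the $PBS_JOBID the script uses to name its output directory, so you can capture it when you submit. The sketch below does this. The `QSUB` override and the `submit_kallisto` name are hypothetical conveniences (for example, for testing with a stand-in command) and are not part of PBS.

```bash
#!/bin/bash
# Submit kallisto.pbs and capture the job ID that qsub prints on stdout.
# QSUB is a hypothetical override hook; by default the real qsub is used.
submit_kallisto() {
  local qsub_cmd="${QSUB:-qsub}" jobid
  jobid=$("$qsub_cmd" kallisto.pbs) || return 1
  printf '%s\n' "$jobid"
}

# Only submit when qsub is actually available (i.e. on the cluster)
if command -v qsub >/dev/null 2>&1; then
  if jobid=$(submit_kallisto); then
    echo "Submitted $jobid; results will appear in output.$jobid"
    qstat "$jobid"
  fi
fi
```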
In the $HOME/KALLISTO-JOBS/test/output.$PBS_JOBID directory, the most important output should be the abundance.tsv or abundance.txt file. More information on the output can be found here.
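For a quick look at the results, the sketch below lists the transcripts with the highest TPM. It assumes abundance.tsv is tab-separated with a header row and kallisto's usual column order (target_id, length, eff_length, est_counts, tpm). The helper name `top_tpm` is hypothetical.

```bash
#!/bin/bash
# Print the N transcripts with the highest TPM (column 5) from a kallisto abundance.tsv.
# Assumed columns: target_id  length  eff_length  est_counts  tpm
top_tpm() {
  local file="$1" n="${2:-10}"
  # Skip the header, sort numerically (handles values like 1.2e+02) in descending order,
  # then keep the target ID and TPM columns
  tail -n +2 "$file" | LC_ALL=C sort -t $'\t' -k5,5gr | head -n "$n" | cut -f1,5
}
```

For example, `top_tpm output.<job ID>/abundance.tsv 5` run from the test directory prints the five most abundant transcripts, where `<job ID>` is the ID of your job.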