Parabricks

Parabricks is a GPU-accelerated software suite for secondary analysis of next-generation sequencing (NGS) DNA data. Its main benefits are speed and low cost: Parabricks can analyze 30x whole-genome sequencing (WGS) data for a human genome in about 45 minutes, compared to about 30 hours with conventional CPU-based tools. Its output exactly matches that of the commonly used software, so verifying the accuracy of the results is fairly simple.

Example Job

An example input for a Parabricks run is available at /share/apps/singularity/images/parabricks/parabricks_sample.tar.gz (also available for download: wget https://s3.amazonaws.com/parabricks.sample/parabricks_sample.tar.gz). To uncompress the archive in your home directory, run:

pinnacle-l1:pwolinsk:~$ tar -xzvf /share/apps/singularity/images/parabricks/parabricks_sample.tar.gz
parabricks_sample/
parabricks_sample/Data/
parabricks_sample/Data/sample_2.fq.gz
parabricks_sample/Data/sample_1.fq.gz
parabricks_sample/Ref/
parabricks_sample/Ref/Homo_sapiens_assembly38.fasta
parabricks_sample/Ref/Homo_sapiens_assembly38.fasta.pac
parabricks_sample/Ref/Homo_sapiens_assembly38.fasta.ann
parabricks_sample/Ref/Homo_sapiens_assembly38.known_indels.vcf.gz.tbi
parabricks_sample/Ref/Homo_sapiens_assembly38.fasta.amb
parabricks_sample/Ref/Homo_sapiens_assembly38.dict
parabricks_sample/Ref/Homo_sapiens_assembly38.fasta.fai
parabricks_sample/Ref/Homo_sapiens_assembly38.known_indels.vcf.gz
parabricks_sample/Ref/Homo_sapiens_assembly38.fasta.bwt
parabricks_sample/Ref/Homo_sapiens_assembly38.fasta.sa
c1612:pwolinsk:~$
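
Before submitting a job, it can be worth confirming that the archive unpacked completely. The following is a minimal sketch, not part of Parabricks: check_sample_files is a hypothetical helper name, and the file list is an assumption based on the tar output above (the FASTQ pair, the reference FASTA, and the five BWA index files).

```shell
#!/bin/bash
# check_sample_files: hypothetical helper, not part of Parabricks.
# Checks that the FASTQ pair, the reference FASTA, and the BWA index files
# (.amb, .ann, .bwt, .pac, .sa) exist under the given directory.
check_sample_files() {
    local dir="${1:-parabricks_sample}"
    local ref="Ref/Homo_sapiens_assembly38.fasta"
    local missing=0
    local f
    for f in Data/sample_1.fq.gz Data/sample_2.fq.gz \
             "$ref" "$ref.amb" "$ref.ann" "$ref.bwt" "$ref.pac" "$ref.sa"; do
        if [ ! -e "$dir/$f" ]; then
            # Report each missing path on standard error.
            echo "missing: $dir/$f" >&2
            missing=1
        fi
    done
    # 0 if everything was found, 1 otherwise.
    return "$missing"
}
```

Running check_sample_files parabricks_sample from the directory where the archive was extracted prints any missing path and returns a non-zero status if something is absent.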

Create a Slurm script to submit the job to the GPU queue:

pinnacle-l9:pwolinsk:~$ cat parabricks.slurm 
#!/bin/bash
#SBATCH -p gpu06
#SBATCH -N1 
#SBATCH -n32
#SBATCH -t 1:00:00

module load singularity parabricks

pbrun fq2bam --ref parabricks_sample/Ref/Homo_sapiens_assembly38.fasta --in-fq parabricks_sample/Data/sample_1.fq.gz parabricks_sample/Data/sample_2.fq.gz --out-bam output.bam

The script above requests 1 node with all 32 cores in the gpu06 partition for 1 hour.
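
For readers less familiar with sbatch's short options, the same resource request can be written with the equivalent long-form options. This is only a sketch of an equivalent header, not a change to the script above. Note that -n32 asks for 32 tasks, which corresponds to 32 cores under Slurm's default of one CPU per task.

```shell
#!/bin/bash
# Long-form equivalent of the #SBATCH header above (same resource request).
# -p gpu06: the GPU partition used in this example
#SBATCH --partition=gpu06
# -N1: one node
#SBATCH --nodes=1
# -n32: 32 tasks, one CPU each by default
#SBATCH --ntasks=32
# -t 1:00:00: one-hour wall-clock limit
#SBATCH --time=1:00:00
```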

Submit the job:

pinnacle-l9:pwolinsk:~$ sbatch parabricks.slurm 
Submitted batch job 56682
pinnacle-l9:pwolinsk:~$ squeue -u pwolinsk
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
             56682     gpu06 parabric pwolinsk  R       0:05      1 c1612
pinnacle-l9:pwolinsk:~$ 

While the job is running you can verify that it is using the GPU by running the nvidia-smi command remotely over ssh on the compute node running your job. In this case it is c1612 as listed in the 'NODELIST' column above:

pinnacle-l9:pwolinsk:~$ ssh c1612 "nvidia-smi"
Mon Mar 23 11:28:08 2020      
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.64.00    Driver Version: 440.64.00    CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-PCIE...  Off  | 00000000:3B:00.0 Off |                    0 |
| N/A   35C    P0    39W / 250W |  11941MiB / 32510MiB |     34%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      7209      C   PARABRICKS                                 11921MiB |
+-----------------------------------------------------------------------------+
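
The node lookup described above can also be scripted. The following is a minimal sketch that assumes the default squeue column layout shown earlier and a single-node job; node_for_job is a hypothetical helper name, not a Slurm command.

```shell
#!/bin/bash
# node_for_job: hypothetical helper, not a Slurm command.
# Reads default-format `squeue` output on stdin and prints the last column
# (NODELIST) for the given job ID, but only if the job is running (ST is "R").
node_for_job() {
    awk -v id="$1" '$1 == id && $5 == "R" { print $NF }'
}

# Example use on the cluster (not executed here):
#   node=$(squeue -u "$USER" | node_for_job 56682)
#   ssh "$node" nvidia-smi
```

For a pending job the helper prints nothing, since the last column then holds the reason the job is waiting rather than a node name.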

You can also follow progress by running tail -f on the job's standard output file (press Ctrl-C to stop following):

pinnacle-l9:pwolinsk:~$ tail -f slurm-56682.out 
||                 Parabricks accelerated Genomics Pipeline                 ||
||                              Version v2.5.0                              ||
||              GPU-BWA mem, Sorting, Marking Duplicates, BQSR              ||
||                  Contact: Parabricks-Support@nvidia.com                  ||
------------------------------------------------------------------------------
[M::bwa_idx_load_from_disk] read 0 ALT contigs

GPU-BWA mem
ProgressMeter	Reads		Base Pairs Aligned
[16:23:11]	5043564		580000000
[16:23:38]	10087128		1160000000
[16:24:04]	15130692		1740000000
[16:24:32]	20174256		2320000000
[16:24:59]	25217820		2900000000
[16:25:25]	30261384		3480000000
[16:25:53]	35304948		4060000000
[16:26:20]	40348512		4640000000
[16:26:47]	45392076		5220000000
[16:27:14]	50435640		5800000000

GPU-BWA Mem time: 297.342929 seconds
GPU-BWA Mem is finished.

GPU Sorting, Marking Dups, BQSR
ProgressMeter	SAM Entries Completed
[16:27:46]	5000000
[16:27:52]	10000000
[16:27:58]	15000000
[16:28:05]	20000000
[16:28:12]	25000000
[16:28:19]	30000000
[16:28:26]	35000000
[16:28:32]	40000000
[16:28:38]	45000000
[16:28:43]	50000000

Total GPU-BWA Mem + Sorting + MarkingDups + BQSR Generation + BAM writing
Processing time: 387.945936 seconds

[main] CMD: PARABRICKS mem -Z ./pbOpts.txt /scrfs/storage/pwolinsk/home/parabricks_sample/Ref/Homo_sapiens_assembly38.fasta /scrfs/storage/pwolinsk/home/parabricks_sample/Data/sample_1.fq.gz /scrfs/storage/pwolinsk/home/parabricks_sample/Data/sample_2.fq.gz @RG\tID:HK3TJBCX2.1\tLB:lib1\tPL:bar\tSM:sample\tPU:HK3TJBCX2.1
[main] Real time: 392.828 sec; CPU: 3587.779 sec
------------------------------------------------------------------------------
||        Program:    GPU-BWA mem, Sorting, Marking Duplicates, BQSR        ||
||        Version:                                            v2.5.0        ||
||        Start Time:                       Mon Mar 23 16:22:34 2020        ||
||        End Time:                         Mon Mar 23 16:29:07 2020        ||
||        Total Time:                           6 minutes 33 seconds        ||
------------------------------------------------------------------------------
^C
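
The log contains enough information to estimate alignment throughput: the last GPU-BWA progress line reports 50435640 reads, and the alignment step took about 297.34 seconds, which works out to roughly 170,000 reads per second. The sketch below extracts that figure from a job output file. It assumes the log layout shown above, and bwa_reads_per_second is a hypothetical helper name.

```shell
#!/bin/bash
# bwa_reads_per_second: hypothetical helper, not part of Parabricks.
# Prints reads aligned per second, using the last progress line of the
# "GPU-BWA mem" section and the "GPU-BWA Mem time" line of the given log.
bwa_reads_per_second() {
    awk '
        /^GPU-BWA mem$/          { in_bwa = 1; next }            # section start
        /^GPU-BWA Mem time:/     { secs = $(NF-1); in_bwa = 0 }  # elapsed seconds
        in_bwa && /^\[[0-9:]+\]/ { reads = $2 }                  # cumulative reads
        END { if (secs > 0) printf "%.0f\n", reads / secs }
    ' "$1"
}

# Example use (not executed here):
#   bwa_reads_per_second slurm-56682.out
```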

Then check for new output files:

pinnacle-l9:pwolinsk:~$ ls -ltr
...
-rw-rw-r--.  1 pwolinsk pwolinsk         288 Mar 23 11:22 parabricks.slurm
-rw-r--r--.  1 pwolinsk pwolinsk     6882792 Mar 23 11:28 output.bam.bai
-rw-r--r--.  1 pwolinsk pwolinsk  4728882999 Mar 23 11:28 output.bam
-rw-rw-r--.  1 pwolinsk pwolinsk        2659 Mar 23 11:29 slurm-56682.out
-rw-r--r--.  1 pwolinsk pwolinsk       87690 Mar 23 11:29 output_chrs.txt
pinnacle-l9:pwolinsk:~$ 
parabricks.txt · Last modified: 2020/03/23 18:58 by pwolinsk