Channel: best-practices — GATK-Forum

How to speed up CombineGVCFs - seems unfeasibly slow?


Dear @Sheila & @Geraldine_VdAuwera,

I hope you had a good weekend! My question to start this week (hopefully the only thread!) is how to use CombineGVCFs. As you may remember, I currently have ~100 WES samples for joint genotyping, and I have been testing GenotypeGVCFs directly, followed by VQSR. However, in the near future I am going to have several hundred more samples in hand, so I understand that making batches with CombineGVCFs is the way forward, as this is the only way to allow new gVCFs to be merged in later once the sample size passes 200.

So I have been trying to run it this afternoon on a node that has 16 CPUs and 128 GB of RAM. My initial attempt was with -nt 14, but this gave the following message:

ERROR MESSAGE: Invalid command line: Argument nt has a bad value: The analysis CombineGVCFs currently does not support parallel execution with nt. Please run your analysis without the nt option.

I therefore restarted it without any threading, but after running for almost 1 hr, its own estimate is that it is going to take ~75 hrs.

Is this really how long it should take? I have been trying to decipher the various possible issues looking at old threads on the forum, but I am not sure if there is any way I should be able to speed this up (short of splitting out the chromosomes and running them all individually)?

Also, I have seen it mentioned somewhere that the slowness may be because the files have been zipped in an incompatible manner. However, the input gVCFs are exactly the same ones that were successfully passed through GenotypeGVCFs, so this seems unlikely. I saw on one old thread that CombineGVCFs was super-slow, but I don't know if this is still the situation. For reference, my initial command (with -nt 14, which I then removed for the second run) was as shown below:

java -Xmx100000m -Djava.io.tmpdir=$TMPDIR -jar /apps/GATK/3.3-0/GenomeAnalysisTK.jar \
    -T CombineGVCFs \
    -R hsapiens.hs37d5.fasta \
    -V /path/File000001.g.vcf.gz \
    ....
    -V /path/File000100.g.vcf.gz \
    -nt 14 \
    -o AllgVCFsCombined.nt14.g.vcf

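For concreteness, the per-chromosome split I mentioned above would look roughly like the sketch below. It only prints one CombineGVCFs command per contig (restricted with the standard -L interval argument) rather than running anything, so the lines can be checked or submitted as separate jobs. Please treat the details as assumptions on my part: the contig names (1-22, X, Y, MT, as I expect them in hs37d5), the -Xmx8g per-job memory, the output file names, and the two-entry GVCFS array, which is only a placeholder for the full list of inputs.

```shell
#!/usr/bin/env bash
# Sketch only: print one CombineGVCFs command per contig (nothing is executed).
# Assumed, not confirmed: contig names, -Xmx8g per job, and output file names.

GATK_JAR=/apps/GATK/3.3-0/GenomeAnalysisTK.jar
REF=hsapiens.hs37d5.fasta
# Placeholder: every input gVCF would need to be listed here.
GVCFS=(/path/File000001.g.vcf.gz /path/File000100.g.vcf.gz)

# Emit the assumed contig names, one per line.
contigs() {
    seq 1 22
    printf '%s\n' X Y MT
}

# Build the command line for a single contig and print it.
combine_cmd() {
    local contig=$1 v_args="" g
    for g in "${GVCFS[@]}"; do
        v_args+=" -V $g"
    done
    echo "java -Xmx8g -jar $GATK_JAR -T CombineGVCFs -R $REF$v_args -L $contig -o AllgVCFsCombined.chr$contig.g.vcf"
}

for c in $(contigs); do
    combine_cmd "$c"
done
```

If this is a sensible approach, I assume the per-contig outputs would then need to be concatenated back into a single gVCF (I believe CatVariants is intended for that), but I would appreciate confirmation before committing to it.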
Thanks in advance, as always.

