Where can I find the old format of best practices from GATK?
There used to be a Best Practices page with links to the whole-genome, whole-exome, SNV, and CNV workflows, which detailed each step of the different workflows.
GATK best practices Short somatic variant calling: ExAC file
Hi, I was wondering if you could give some insight into the creation/preparation of the ExAC VCF file mentioned in the JSON file (mutect2.exome.inputs.json) on GitHub...
Error MarkDuplicates (GATK4, Best Practices)
Hi! I'm trying to set up the GATK4 pipeline, but during the "MarkDuplicates" step I get the error below: java -Dsamjdk.compression_level=${cl} -Xms5000m -jar ${ph3} MarkDuplicates...
BQSR and False Negatives?
I am looking at GATK Best Practices for Data Processing and Germline Variant calling. I see that the workflow calls for base quality score recalibration using BaseRecalibrator...
SortSam before MarkDuplicates?
Hi GATK team, I'm setting up a GATK Best Practices workflow. The workflow described here: https://software.broadinstitute.org/gatk/best-practices/workflow?id=11165 states that after mapping, which I did like this:...
GenotypeGVCFs output is an empty GVCF file
I ran java -Xmx48g -jar gatk/3.5-0/src/GenomeAnalysisTK.jar -T GenotypeGVCFs -R hg19/hg19.fa -V merged.240samples.chr22.gvcf.gz -o merged.jointGT.chr22.gvcf.gz and it completed successfully: INFO...
GATK3.8 vs GATK4 HaplotypeCaller
Hello, maybe I'm asking a naive question and maybe it has been answered somewhere else, but, as the title states, are there differences in the HaplotypeCaller algorithm between the GATK3.8 release and...
Do I use ImportGenomicsDB on all files or 1?
I am analyzing around 50 genomes, and at the GenomicsDBImport step I am unsure whether to combine all the genomes to create the database folder or to create one for each genome.
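For joint genotyping, GenomicsDBImport is normally run once per interval over all of the cohort's GVCFs, producing a single workspace that GenotypeGVCFs then reads, rather than one workspace per genome. With around 50 samples, repeating -V for every file gets unwieldy; the tool also accepts a tab-separated sample map (sample name, then GVCF path, one pair per line) through `--sample-name-map`. Below is a minimal Python sketch that builds such a map, assuming a hypothetical naming scheme in which each file is called `<sample>.g.vcf.gz`; the names in the map should correspond to the samples inside the GVCFs.

```python
from pathlib import Path


def build_sample_map(gvcf_paths, suffix=".g.vcf.gz"):
    """Build the text of a GenomicsDBImport sample map: one 'sample<TAB>path' line per GVCF.

    Assumes (hypothetically) that every file is named <sample><suffix>.
    """
    lines = []
    seen = set()
    for path in sorted(gvcf_paths):
        name = Path(path).name
        if not name.endswith(suffix):
            raise ValueError(f"unexpected file name: {name}")
        sample = name[: -len(suffix)]
        if sample in seen:
            raise ValueError(f"duplicate sample name: {sample}")
        seen.add(sample)
        lines.append(f"{sample}\t{path}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical paths; in practice these would be the ~50 per-sample GVCFs.
    paths = ["/data/gvcfs/S2.g.vcf.gz", "/data/gvcfs/S1.g.vcf.gz"]
    print(build_sample_map(paths), end="")
```

Saved to a file such as `cohort.sample_map` (a hypothetical name), the map would then be passed once per interval, for example `gatk GenomicsDBImport --sample-name-map cohort.sample_map --genomicsdb-workspace-path my_database -L chr20`, so that one workspace holds every sample.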
Implementation of GATK4 for variant calling in WES of human cancer samples...
Dear GATK community, I would like to ask a very specific question concerning the implementation of the GATK toolkit for exome sequencing data. In detail, for 3 patients I have both whole-exome sequencing...
How does the BQSR step not create bias in SNP detection?
Hello, I am using the GATK best practices to call variants in my RNA-seq data. So far, I have completed all of the steps up to the base recalibration (I skipped the optional indel step). I have been...
simple QC task for comparing two BAM files
Hi all, I have received two BAM files (tumor and normal) and I would like to create a report for each file and compare them. Which method is the best and what input and output files do I need to...
Where is "known_indels_sites_VCFs" defined?
Dear GATK team, I have been translating your WDL files into shell scripts to map them better to the scheduler on our Linux cluster (shell scripts are not already available anywhere, are they?). At some...
BQSR in GATK 4.0
Hi, first of all, thanks for such a great tool! I have a question about BQSR in the GATK 4.0 Best Practices. In 3.8, PrintReads supports applying a covariates table file (with --BQSR) output from...
High depth - tumor-only variant calling with mutect2
Hello, I'm trying to call somatic variants (SNVs and indels) on targeted sequencing data (usually from amplicon-based enrichment). Mutect1 seems to work very well, but MuTect2 is proving more...
Why is converting from FASTQ to uBAM necessary before preprocessing?
Hi Everyone, I am brand new to this so please go easy on me. I have just taken over a project where we are going to be doing variant calling on a large number of human samples. I have inherited a...
Can the GATK Best Practices Pipeline on Google Cloud Platform be used on...
I read the documentation on this pipeline (https://cloud.google.com/genomics/docs/tutorials/gatk) and saw that its input is unaligned BAMs. Is there a way to use the pipeline for input FASTQs?
Invalid SAM?
I used BWA MEM to map reads from an interleaved FASTQ. fastq="all.fastq" fasta="/share/PI/apps/bcbio/genomes/Hsapiens/GRCh37/seq/GRCh37.fa" bwa="/share/PI/apps/bcbio/anaconda/bin/bwa" nThreads="12"...
Applying VQSR to the Raw VCF vs Filtered VCF
Hi, I am working on a germline WES dataset with ~450 samples, all the variants are called following an adapted version of GATK Best Practices, using GATK 4.0.3. My question is about at which step we...
GATK 3.5 or 3.8 dropped multiallelic variants containing both SNP and Indel
We noticed that GATK3.5 or 3.8 dropped multiallelic variants containing both SNP and Indel when selecting SNP and INDEL variants separately for filtering. We followed the DNA-seq best-practices. Our...
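The behavior described is consistent with how htsjdk, the library GATK builds on, assigns one type to each VCF record: when a site's alternate alleles are of different kinds (for example one SNP allele and one indel allele), the record is typed MIXED rather than SNP or INDEL, so selecting SNPs and indels separately skips it. Below is a simplified Python re-creation of that idea for illustration only, not GATK's code; it ignores symbolic alleles and allele trimming. SelectVariants' type selection also includes MIXED, which is worth confirming in the documentation for the GATK version in use.

```python
def allele_type(ref, alt):
    """Classify one REF/ALT pair (simplified: symbolic alleles such as <DEL> are not handled)."""
    if len(ref) == len(alt):
        return "SNP" if len(ref) == 1 else "MNP"
    return "INDEL"


def site_type(ref, alts):
    """Return one type for the whole site, or MIXED when the ALT alleles disagree."""
    types = {allele_type(ref, alt) for alt in alts}
    if not types:
        return "NO_VARIATION"
    return types.pop() if len(types) == 1 else "MIXED"


print(site_type("A", ["G"]))        # SNP
print(site_type("A", ["AT"]))       # INDEL
print(site_type("A", ["G", "AT"]))  # MIXED: neither a pure SNP nor a pure INDEL site
```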
Informatica Java Transformation for replacing spaces in a string
I'm trying to implement a Java transformation in Informatica which has to do a simple task of replacing all the Spaces in an array of strings. I've tried the following: OriginalArray[i] =...
Why is there a difference in variants between the after-BQSR BAM and...
Dear GATK team, I have followed the Best Practices to find germline variants (GATK 3.7) in my samples from a case-control study of ~500 samples in total. I have run BQSR, PrintReads, and...
SamToFastqAndBwaMem error when running...
I am trying to run processing-for-variant-discovery-gatk4.wdl on my MacBook Pro. Instead of using the google drives, I have downloaded the relevant files. I have also pared down the list of unmapped...
Getting out-of-memory errors while running the workflow for germline short...
I'm trying to run the WDL posted on the gatk-workflows GitHub page, under the gatk4-germline-snps-indels repository. The WDL is "haplotypecaller-gvcf-gatk4.wdl". I'm attempting to run this WDL locally...
Recursive folders creation when running "Data Pre-Processing" workflow
Hello, I tried to run the Data Pre-Processing workflow from the GATK4 Best Practices locally (both WDL and JSON files were downloaded from GitHub: gatk-workflows/gatk4-data-processing), but it...
Add GATK3 Variant Refinement
I note that the GATK 4 pipelines have a joint_discovery workflow, e.g. the gatk-workflows/gatk4-germline-snps-indels repo. However, this doesn't exist for the GATK 3 pipelines:...
Variant discovery starting from gVCF file
Hello, as the title suggests I'm looking to use the variant discovery tools, specifically SNP discovery. However I am not starting with a FASTA or BAM file, indeed I do not currently have access to...
Running joint-discovery-gatk4-local.wdl on hg19
Quoting from the 'About "Ask the team"' thread, since the "ask a question" button is working again: @oneillkza said: Running joint-discovery-gatk4-local.wdl on hg19 (Posting this here, since per the...
Analysis Pipeline Discrepancy in SNP Calling and Coverage
Hi all, I am new to GATK, so please bear with me... Essentially, I have developed a Unix script to analyze the FASTQ sequencing output for a novel targeting technique. I am only targeting 27 SNPs...
Somatic mutation pipeline error at "WorkflowManagerActor Workflow...
Dear GATK team, I am trying to call somatic mutations using the Best Practices pipeline for somatic mutation calling. Somatic mutations were successfully called into VCF files for almost all of the data. However,...
GATK resource bundles scattered_calling_intervals exclude small contigs
Hi there, I was just going over some Haplotypecaller and VQSR results generated using your best practices Cromwell workflows, and found that the scattered_calling_intervals files you provide (and which...
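One way to see which reference contigs the scattered interval files leave out is to compare the contig names in the reference's FASTA index (.fai, whose first tab-separated column is the contig name) with the contigs that appear in the interval lists. A minimal Python sketch, assuming the interval files are in Picard interval_list format ('@' header lines, then contig, start, end, strand, name) and that the lines of all scattered files are concatenated before the call; the contig names in the example are made up.

```python
def contigs_missing_from_intervals(fai_lines, interval_lines):
    """List reference contigs (from a .fai index) that no interval line covers.

    Assumes Picard interval_list layout: '@' header lines, then tab-separated
    contig, start, end, strand, name.
    """
    reference = [line.split("\t")[0] for line in fai_lines if line.strip()]
    covered = {
        line.split("\t")[0]
        for line in interval_lines
        if line.strip() and not line.startswith("@")
    }
    # Preserve reference order so the report reads like the .fai file.
    return [contig for contig in reference if contig not in covered]


toy_fai = ["chrA\t5000\t6\t60\t61", "chrSmall\t1200\t5096\t60\t61"]
toy_intervals = ["@HD\tVN:1.6", "chrA\t1\t5000\t+\t."]
print(contigs_missing_from_intervals(toy_fai, toy_intervals))  # ['chrSmall']
```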
FilterByOrientationBias in GATK4 hasn't filtered out any artefacts in FFPE...
Hi, I am currently using GATK4 to identify somatic mutations from FFPE WGS data. Everything else works fine apart from the second filtering step through FilterByOrientationBias. Since the given sample...
FilterMutectCalls error: "there is no such column: sample"
Hi, I've been following the best practice for tumor somatic mutation calling. Everything runs like a charm until FilterMutectCalls which keeps throwing a java error:...
The stop position is less than start for Broad.human.exome.b37.scattered.txt
I was running a test with the gatk3 germline workflow (located at `gatk-workflows/gatk3-germline-snps-indels` on GitHub), but since I'm only interested in exome performance I used the...
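To locate the offending records, every interval can be checked for an end coordinate smaller than its start. A minimal Python sketch, assuming the intervals being checked are in Picard interval_list format ('@' header lines, then tab-separated contig, start, end, strand, name, with 1-based closed coordinates); how Broad.human.exome.b37.scattered.txt itself is laid out is not shown in the excerpt, so the file may first need to be resolved to the interval files it points to. The example data are made up.

```python
def find_inverted_intervals(interval_lines):
    """Return (line_number, contig, start, end) for every record whose end < start.

    Assumes Picard interval_list layout: '@' header lines, then tab-separated
    contig, start, end, strand, name (1-based, closed coordinates).
    """
    bad = []
    for lineno, line in enumerate(interval_lines, start=1):
        line = line.rstrip("\n")
        if not line or line.startswith("@"):
            continue  # skip blank lines and the SAM-style header
        fields = line.split("\t")
        contig, start, end = fields[0], int(fields[1]), int(fields[2])
        if end < start:
            bad.append((lineno, contig, start, end))
    return bad


toy = ["@HD\tVN:1.6", "1\t100\t200\t+\t.", "1\t500\t450\t+\t."]
print(find_inverted_intervals(toy))  # [(3, '1', 500, 450)]
```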
How to set GVCF genotypes to ./. based on the GQ score
Hi, I have a reasonably large non-human multi-VCF dataset containing ~280 samples and ~70M variants. I want to filter low quality genotype calls (but not variants as a whole). This does not seem to be...
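The goal here is a genotype-level filter: each sample's GT becomes a no-call when that sample's GQ is below a threshold, while the variant record itself stays in the file. A minimal plain-Python sketch of that operation on one VCF data line (this is not a GATK tool; the threshold of 20 is a hypothetical default, and GQ is assumed to be an integer as the VCF specification defines it). For ~70M variants, a compiled tool or GATK's own genotype-filtering options would be far faster than line-by-line Python.

```python
def mask_low_gq(vcf_line, min_gq=20):
    """Return a VCF data line with GT set to a no-call wherever GQ < min_gq.

    Only genotypes are changed; the record itself is kept. Samples with a
    missing GQ ('.') or a truncated sample field are left untouched.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    keys = fields[8].split(":")  # FORMAT column, e.g. "GT:AD:DP:GQ:PL"
    if "GT" not in keys or "GQ" not in keys:
        return "\t".join(fields)
    gt_i, gq_i = keys.index("GT"), keys.index("GQ")
    for col in range(9, len(fields)):  # one column per sample
        values = fields[col].split(":")
        if max(gt_i, gq_i) >= len(values) or values[gq_i] in (".", ""):
            continue
        if int(values[gq_i]) < min_gq:
            # Keep the ploidy: "0/1" or "0|1" becomes "./.", a haploid "1" becomes ".".
            ploidy = len(values[gt_i].replace("|", "/").split("/"))
            values[gt_i] = "/".join(["."] * ploidy)
            fields[col] = ":".join(values)
    return "\t".join(fields)


line = "chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT:GQ\t0/1:10\t1/1:99\t0|0:5"
print(mask_low_gq(line).split("\t")[9:])  # ['./.:10', '1/1:99', './.:5']
```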
GermlineCNVCaller parameters for targeted sequencing
Hi all, I am testing for the presence of CNVs in targeted sequencing data from a gene panel of ~100 genes. I have seen in the forum and in various posts that some of the parameters change between...
GATK4 best practices error
With respect to the GATK Best Practices manual (https://software.broadinstitute.org/gatk/best-practices/workflow?id=11165), Mark Duplicates section: which reads MarkDuplicates to perform the duplicate...
Error with GenomicsDBImport input file
Hi, I am trying to run GenomicsDBImport to test my pipeline with just 2 samples. I am using the following code: gatk GenomicsDBImport \ -V $SCRATCH/active/memtest2/ SRR112728.raw.snps.indels.g.vcf \ -V...
WDL + Cromwell + AWS Batch
Hi all - I'm trying to figure out the best way to write pipelines in WDL with AWS Batch. As I understand it, each Task in WDL is a separate AWS Batch job. As such, each Batch job can run on any...
Why are the results of the variant callers (GATK3.8 and GATK 4.0) different?
Hello, I am a beginner. I used two different tools to analyze my data, but I got two different results. Why?
BWA and fastqtosam failing with read names do not match errors
I'm using the GATK best practices to call public exome data. Out of over 600 exomes, most of the samples did fine with BWA mem alignment and fastqtosam. About 100 samples failed both steps (the same...
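Both paired-end BWA mem and FastqToSam expect the two FASTQ files to list mates in the same order under the same read name, so one possible cause is that the files for the failing samples have fallen out of step (for example a truncated download or different records dropped from each file). A minimal Python diagnostic sketch that reports the first record where the mate names disagree; it is not the tools' own check, it assumes well-formed 4-line records already read into lists (real files would be streamed, often through gzip), and it assumes names are compared after dropping any comment after whitespace and a trailing /1 or /2.

```python
def read_name(header):
    """Normalize a FASTQ header line: drop '@', any text after whitespace, and a /1 or /2 suffix."""
    name = header[1:] if header.startswith("@") else header
    name = name.split()[0] if name.strip() else ""
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name


def first_mismatch(r1_lines, r2_lines):
    """Compare mates record by record (4 lines per record).

    Returns None if every pair matches, otherwise (record_index, name1, name2);
    name1/name2 are None when one file simply has more records than the other.
    """
    headers1, headers2 = r1_lines[0::4], r2_lines[0::4]
    for i, (h1, h2) in enumerate(zip(headers1, headers2)):
        n1, n2 = read_name(h1), read_name(h2)
        if n1 != n2:
            return i, n1, n2
    if len(headers1) != len(headers2):
        return min(len(headers1), len(headers2)), None, None
    return None


r1 = ["@r1/1", "ACGT", "+", "IIII", "@r2/1", "ACGT", "+", "IIII"]
r2 = ["@r1/2", "TTTT", "+", "IIII", "@r3/2", "TTTT", "+", "IIII"]
print(first_mismatch(r1, r2))  # (1, 'r2', 'r3')
```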
Pipeline Index
This document is under construction. It aims to provide an overview of use cases covered by GATK Best Practices workflows. Variant Discovery Germline Somatic Notes Data pre-processing Single-sample...
Missing variants using the GATK best practices.
Hi, I am working with human whole exome (WES - Illumina, paired end) data and trying to perform variant calling by following the GATK best practices with GATK v4.1.2.0 installation(I know that there...
Different resource for Mutect2/GetPileupSummaries when dealing with genome data
Hi, Firstly, could someone please set the category for this question to the most relevant? For some reason it's only letting me select Zoo & Garden from the menu. I'm currently using GATK 4.1.2.0,...
Error running GenomicsDBImport in parallel (java.lang.UnsatisfiedLinkError:...
Hi, I am using the scatter gather approach to run the Germline short variant discovery workflow. I am using GenomicsDBImport to consolidate GVCFs per scatter interval to allow joint genotyping with...
Alternative resources for Mutect2/GetPileupSummaries when dealing with genome...
Hi I'm currently using GATK 4.1.2.0, following the best practices for somatic variant calling. I already have this set up for exomes, but I'm now attempting to run the same pipeline on genome data. I'm...
MergeBamAlignment – Select primary alignment
Hi, In the current best practices workflow gatk4-data-processing, you recommend using uBAMs instead of FASTQ files. Great idea! However, when it comes to merging with the BWA alignment BAM, there is...
Masking Polymorphic Regions Before Variant Calling
I notice that the best practices workflows treat all regions in the reference genome the same. A region such as the MHC region containing the HLA genes is extremely polymorphic. There are thousands of...
How to identify duplicated genes in VCF file obtained after GATK pipeline?
I am trying to find which gene type is more duplicated. I have mapped and annotated my VCF file using the GATK pipeline. Please advise me on how to proceed.
Got error of java.lang.IllegalArgumentException: Invalid interval. Contig:81...
I have run 96 samples through the somatic short variant calling pipeline with GATK version gatk-4.1.4.0, and only three of them have this problem: java.lang.IllegalArgumentException: Invalid interval....
Large vcf files after running the GATK SNV + indel pipeline
Hi, simple question: why do I get large VCF files after filtering variant calls? I am following your Best Practices pipeline (SNV + indel), with some minor modifications suggested in another thread (with...
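One common reason is that GATK's filtering steps (VariantFiltration for hard filters, ApplyVQSR for VQSR) mark records by writing filter names into the FILTER column instead of removing them, so the "filtered" VCF keeps every record. A minimal Python sketch that tallies records by FILTER value and keeps only PASS records, assuming an uncompressed VCF read as a list of lines; within GATK, SelectVariants' `--exclude-filtered` option serves the same purpose (worth confirming in the documentation for your version).

```python
from collections import Counter


def filter_counts(vcf_lines):
    """Count data records by their FILTER value (7th column), skipping header lines."""
    counts = Counter()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        counts[line.rstrip("\n").split("\t")[6]] += 1
    return counts


def pass_only(vcf_lines):
    """Keep header lines plus records whose FILTER is exactly PASS."""
    kept = []
    for line in vcf_lines:
        if not line.strip():
            continue
        if line.startswith("#") or line.rstrip("\n").split("\t")[6] == "PASS":
            kept.append(line)
    return kept


toy = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t10\t.\tA\tG\t50\tPASS\t.",
    "1\t20\t.\tC\tT\t5\tLowQual\t.",
]
print(dict(filter_counts(toy)))  # {'PASS': 1, 'LowQual': 1}
print(len(pass_only(toy)))       # 3 (two header lines plus one PASS record)
```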