Channel: best-practices — GATK-Forum

Where can I find the old format of best practices from GATK?

There used to be a best-practices page with links to the whole genome and whole exome SNV and CNV workflows, detailing each step in the different workflows.

GATK best practices Short somatic variant calling: ExAC file

Hi, I was wondering if you could give some insight into the creation/preparation of the ExAC VCF file mentioned in the JSON file (mutect2.exome.inputs.json) on GitHub...

Error MarkDuplicates (GATK4, Best Practices)

Hi! I'm trying to set up the GATK4 pipeline, but during the "MarkDuplicates" step I get the error below: java -Dsamjdk.compression_level=${cl} -Xms5000m -jar ${ph3} MarkDuplicates...

BQSR and False Negatives?

I am looking at GATK Best Practices for Data Processing and Germline Variant calling. I see that the workflow calls for base quality score recalibration using BaseRecalibrator...

SortSam before MarkDuplicates?

Hi GATK team, I'm setting up a GATK best practices workflow. It is described here: https://software.broadinstitute.org/gatk/best-practices/workflow?id=11165 that after mapping, which I did like this:...

GenotypeGVCFs output is empty gvcf file

I ran java -Xmx48g -jar gatk/3.5-0/src/GenomeAnalysisTK.jar -T GenotypeGVCFs -R hg19/hg19.fa -V merged.240samples.chr22.gvcf.gz -o merged.jointGT.chr22.gvcf.gz and it completed successfully: INFO...

GATK3.8 vs GATK4 HaplotypeCaller

Hello, maybe I'm asking a naive question and maybe it has been answered somewhere else, but as the title states are there differences in the algorithm of the HaplotypeCaller between GATK3.8 release and...

Do I use GenomicsDBImport on all files or 1?

So I am analyzing around 50 or so genomes, and at the GenomicsDBImport step I am unsure whether to combine all the genomes to create the database folder or to do it for each genome.

Implementation of GATK4 for variant calling in WES of human cancer samples...

Dear GATK community, I would like to ask a very specific question concerning the implementation of the GATK toolkit for exome sequencing data. In detail, I have for 3 patients both whole exome sequencing...

How does the BQSR step not create bias in SNP detection?

Hello, I am using the GATK best practices to call variants in my RNA-seq data. So far, I have completed all of the steps up to the base recalibration (I skipped the optional indel step). I have been...

simple QC task for comparing two BAM files

Hi all, I have received two BAM files (tumor and normal), and I would like to create a report for each file and compare them. Which method is best, and what input and output files do I need to...

Where is "known_indels_sites_VCFs" defined?

Dear GATK team, I have been translating your wdl files into shell scripts to map them better to the scheduler on our Linux cluster (shell scripts are not already available anywhere, are they?). At some...

BQSR in GATK 4.0

Hi, Thanks first for such a great tool! I have a question about BQSR in GATK 4.0 Best Practices. In 3.8, PrintReads supports application of a covariates table file (with --BQSR) outputted from...

High depth - tumor-only variant calling with mutect2

Hello, I'm trying to call somatic variants (SNVs and indels) on targeted sequencing data (usually from amplicon-based enrichment). Using MuTect1 seems to work very well, but MuTect2 is proving more...

Why is converting from fastq to uBAM necessary before preprocessing?

Hi Everyone, I am brand new to this so please go easy on me. I have just taken over a project where we are going to be doing variant calling on a large number of human samples. I have inherited a...

Can the GATK Best Practices Pipeline on Google Cloud Platform be used on...

I read the documentation on this pipeline (https://cloud.google.com/genomics/docs/tutorials/gatk) and saw that its input is unaligned BAMs. Is there a way to use the pipeline for input FASTQs?

Invalid SAM?

I used BWA MEM to map reads from an interleaved FASTQ. fastq="all.fastq" fasta="/share/PI/apps/bcbio/genomes/Hsapiens/GRCh37/seq/GRCh37.fa" bwa="/share/PI/apps/bcbio/anaconda/bin/bwa" nThreads="12"...

Applying VQSR to the Raw VCF vs Filtered VCF

Hi, I am working on a germline WES dataset with ~450 samples, all the variants are called following an adapted version of GATK Best Practices, using GATK 4.0.3. My question is about at which step we...

GATK 3.5 or 3.8 dropped multiallelic variants containing both SNP and Indel

We noticed that GATK3.5 or 3.8 dropped multiallelic variants containing both SNP and Indel when selecting SNP and INDEL variants separately for filtering. We followed the DNA-seq best-practices. Our...

Informatica Java Transformation for replacing spaces in a string

I'm trying to implement a Java transformation in Informatica that does the simple task of replacing all the spaces in an array of strings. I've tried the following: OriginalArray[i] =...

Why is there a difference in variants between the after-BQSR BAM and...

Dear GATK team, I have followed the Best Practices (GATK-3.7) to find germline variants in my samples, which come from a case-control study of ~500 samples in total. I have run BQSR, PrintReads, and...

SamToFastqAndBwaMem error when running...

I am trying to run processing-for-variant-discovery-gatk4.wdl on my MacBook Pro. Instead of using the google drives, I have downloaded the relevant files. I have also pared down the list of unmapped...

Getting out-of-memory errors while running the workflow for germline short...

I'm trying to run the WDL posted on the gatk-workflows GitHub page, under the gatk4-germline-snps-indels repository. The WDL is "haplotypecaller-gvcf-gatk4.wdl". I'm attempting to run this WDL locally...

Recursive folders creation when running "Data Pre-Processing" workflow

Hello, I tried to run the Data Pre-Processing workflow from the GATK4 Best Practices locally (both WDL and JSON files were downloaded from GitHub/gatk-workflows/gatk4-data-processing) but it...

Add GATK3 Variant Refinement

I note that the GATK 4 pipelines have a joint_discovery workflow, e.g. the gatk-workflows/gatk4-germline-snps-indels repo. However, this doesn't exist for the GATK 3 pipelines:...

Variant discovery starting from gVCF file

Hello, as the title suggests I'm looking to use the variant discovery tools, specifically SNP discovery. However I am not starting with a FASTA or BAM file, indeed I do not currently have access to...

Running joint-discovery-gatk4-local.wdl on hg19

Quoting from the 'About "Ask the team"' thread, since the "ask a question" button is working again: @oneillkza said: Running joint-discovery-gatk4-local.wdl on hg19 (Posting this here, since per the...

Analysis Pipeline Discrepancy in SNP Calling and Coverage

Hi, All, So I am new to GATK so please bear with me... Essentially, I have developed a unix script to analyze the fastq sequencing output for a novel targeting technique. I am only targeting 27 SNPs...

Somatic mutation pipeline error at "WorkflowManagerActor Workflow...

Dear GATK team, I am trying to call somatic mutations using the best-practice pipeline for somatic mutation calling. Somatic mutations were called successfully into VCF files for almost all of the data. However,...

GATK resource bundles scattered_calling_intervals exclude small contigs

Hi there, I was just going over some HaplotypeCaller and VQSR results generated using your best practices Cromwell workflows, and found that the scattered_calling_intervals files you provide (and which...

FilterByOrientationBias in GATK4 hasn't filtered out any artefacts in FFPE...

Hi, I am currently using GATK4 to identify somatic mutations from FFPE WGS data. Everything else works fine apart from the second filtering step through FilterByOrientationBias. Since the given sample...

FilterMutectCalls error: "there is no such column: sample"

Hi, I've been following the best practice for tumor somatic mutation calling. Everything runs like a charm until FilterMutectCalls which keeps throwing a java error:...

The stop position is less than start for Broad.human.exome.b37.scattered.txt

I was running a test with the gatk3 germline workflow (located at `gatk-workflows/gatk3-germline-snps-indels` on GitHub), but since I'm only interested in exome performance I used the...

How to set GVCF genotypes to ./. based on the GQ score

Hi, I have a reasonably large non-human multi-VCF dataset containing ~280 samples and ~70M variants. I want to filter low quality genotype calls (but not variants as a whole). This does not seem to be...

GermlineCNVCaller parameters for targeted sequencing

Hi all, I am testing for the presence of CNVs in targeted sequencing data from a gene panel of ~100 genes. I have seen in the forum and from various posts that some of the parameters change between...

GATK4 best practices error

With respect to the GATK best practice manual: https://software.broadinstitute.org/gatk/best-practices/workflow?id=11165. Mark Duplicates section, which reads MarkDuplicates to perform the duplicate...

Error with GenomicsDBImport input file

Hi, I am trying to run GenomicsDBImport to test my pipeline with just 2 samples. I am using the following code: gatk GenomicsDBImport \ -V $SCRATCH/active/memtest2/ SRR112728.raw.snps.indels.g.vcf \ -V...

WDL + Cromwell + AWS Batch

Hi all - I'm trying to figure out the best way to write pipelines in WDL with AWS Batch. As I understand it, each Task in WDL is a separate AWS Batch job. As such, each Batch job can run on any...

Why are the variant callers' (GATK3.8 and GATK 4.0) results different?

Hello, I am a beginner. I used two different tools to analyze my data, but I got two different results. Why?

BWA and FastqToSam failing with read names do not match errors

I'm using the GATK best practices to call public exome data. Out of over 600 exomes, most of the samples did fine with BWA mem alignment and FastqToSam. About 100 samples failed both steps (the same...

Pipeline Index

This document is under construction. It aims to provide an overview of use cases covered by GATK Best Practices workflows. Variant Discovery Germline Somatic Notes Data pre-processing Single-sample...

Missing variants using the GATK best practices.

Hi, I am working with human whole exome (WES - Illumina, paired end) data and trying to perform variant calling by following the GATK best practices with a GATK v4.1.2.0 installation (I know that there...

Different resource for Mutect2/GetPileupSummaries when dealing with genome data

Hi, Firstly, could someone please set the category for this question to the most relevant? For some reason it's only letting me select Zoo & Garden from the menu. I'm currently using GATK 4.1.2.0,...

Error running GenomicsDBImport in parallel (java.lang.UnsatisfiedLinkError:...

Hi, I am using the scatter gather approach to run the Germline short variant discovery workflow. I am using GenomicsDBImport to consolidate GVCFs per scatter interval to allow joint genotyping with...

Alternative resources for Mutect2/GetPileupSummaries when dealing with genome...

Hi I'm currently using GATK 4.1.2.0, following the best practices for somatic variant calling. I already have this set up for exomes, but I'm now attempting to run the same pipeline on genome data. I'm...

MergeBamAlignment – Select primary alignment

Hi, In the current best practices workflow gatk4-data-processing, you recommend using uBAMs instead of FASTQ files. Great idea! However, when it comes to merging with the BWA alignment BAM, there is...

Masking Polymorphic Regions Before Variant Calling

I notice that the best practices workflows treat all regions in the reference genome the same. A region such as the MHC region containing the HLA genes is extremely polymorphic. There are thousands of...

How to identify duplicated genes in VCF file obtained after GATK pipeline?

I am trying to find which gene type is most often duplicated. I have mapped and annotated my VCF file with the GATK pipeline. Please guide me on how to proceed now.

Got error of java.lang.IllegalArgumentException: Invalid interval. Contig:81...

I have run 96 samples through the somatic short variant calling pipeline with GATK version gatk-4.1.4.0, and only three of them have this problem: java.lang.IllegalArgumentException: Invalid interval....

Large vcf files after running the GATK SNV + indel pipeline

Hi, simple question: why do I get large VCF files after filtering variant calls? I am following your best practice pipeline (SNV + indel), with some minor modifications suggested in another thread (with...
