Adding Annotations to VCFs

The main consideration is to make sure that the reference resource matches the genome build of your VCF: use GRCh37 resources for a GRCh37-aligned VCF, and GRCh38 resources for a GRCh38-aligned VCF.
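If the build of a VCF is not known, the header often gives it away. The following is a minimal sketch, assuming the header contains `##contig` lines with a `length=` entry for chromosome 1 (not every VCF does). The function name `guess_vcf_build` is hypothetical; the two lengths are the chromosome 1 sizes in GRCh37 and GRCh38.

```shell
# Hypothetical helper: guess the genome build from the chromosome 1 length
# recorded in the VCF header. Prints GRCh37, GRCh38 or unknown.
guess_vcf_build() {
    # gzip -dc also reads bgzip-compressed files; only header lines are inspected
    gzip -dc "$1" | awk '
        /^##contig=<ID=(chr)?1,/ {
            if ($0 ~ /length=249250621[,>]/) { print "GRCh37"; found = 1; exit }
            if ($0 ~ /length=248956422[,>]/) { print "GRCh38"; found = 1; exit }
        }
        !/^#/ { exit }   # stop at the first data record
        END { if (!found) print "unknown" }
    '
}
```

For example, `guess_vcf_build myvcf.vcf.gz` prints a single word. An answer of `unknown` only means the header carried no usable contig length, so check the `##reference` line or the aligner logs instead.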

dbSNP

A VCF from dbSNP can be used to annotate rsIDs into the ID column of a VCF. For instance, the dbSNP v151 VCF for GRCh37 can be downloaded from ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606_b151_GRCh37p13/VCF/. Make sure the dbSNP VCF uses the same genome build as the VCF you want to annotate.

Once you have a copy (Merrimanlab can find it in /Volumes/archive/merrimanlab/reference_files/VCF/dbSNP_reference/dbsnp151_GRCh37/dbsnp_151_20180423.vcf.gz), annotate using bcftools. The dbSNP file must be bgzip-compressed and tabix-indexed.

# use on the server
module load bcftools/bcftools-1.11

bcftools annotate -a dbsnp_151_20180423.vcf.gz -c ID -o myvcf.dbsnp_annotated.vcf.gz -O z myvcf.vcf.gz
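After annotating, it can be worth checking how many records still lack an rsID. The following is a rough sketch using only gzip and awk; the function name `count_missing_ids` is hypothetical, and it assumes a standard VCF where the ID is the third tab-separated column and a missing ID is written as `.`.

```shell
# Hypothetical helper: report how many records in a (b)gzipped VCF have no ID.
count_missing_ids() {
    gzip -dc "$1" | awk -F '\t' '
        !/^#/ { total++; if ($3 == ".") missing++ }   # skip header lines
        END { printf "%d of %d records have no ID\n", missing, total }
    '
}
```

Running `count_missing_ids myvcf.dbsnp_annotated.vcf.gz` before and after annotation gives a quick sanity check. If nothing was annotated, a likely cause is a mismatch in build or chromosome naming between the two files.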

dbNSFP

Creation of latest dbNSFP

https://sites.google.com/site/jpopgen/dbNSFP

wget ftp://dbnsfp:dbnsfp@dbnsfp.softgenetics.com/dbNSFP4.0a.zip
unzip dbNSFP4.0a.zip

The file uses GRCh38 coordinates by default. To switch to GRCh37 (hg19) coordinates you need the Perl script dbNSFP_sort.pl. The GRCh37 coordinates are in columns 8 and 9, but the script is zero-indexed, hence 7 and 8 are used as the column numbers.
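Column positions can differ between dbNSFP releases, so it is worth confirming them from the header before running the script. A minimal sketch, with the hypothetical function name `show_columns`, that prints each column name next to its 1-based position:

```shell
# Hypothetical helper: list the header columns of a gzipped tab-delimited
# file with their 1-based positions (subtract 1 for a zero-indexed script).
show_columns() {
    gzip -dc "$1" | head -n 1 | tr '\t' '\n' | awk '{ printf "%d\t%s\n", NR, $0 }'
}
```

For example, `show_columns dbNSFP4.0a_variant.chr1.gz | grep hg19` should list the two hg19 columns; their positions minus one are the arguments to pass to dbNSFP_sort.pl.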

# Set to your downloaded dbNSFP version
version="4.0a"

# Replace coordinates by columns 7 and 8 (zero-indexed hg19 coordinates) and sort by those coordinates
zcat dbNSFP${version}_variant.chr*.gz | dbNSFP_sort.pl 7 8 > dbNSFP${version}_hg19.txt

# Compress and index
bgzip dbNSFP${version}_hg19.txt
tabix -s 1 -b 2 -e 2 dbNSFP${version}_hg19.txt.gz

Once this annotation file is created it can be used to annotate VCF files using SnpSift.

For Merrimanlab, a version is available at /Volumes/archive/merrimanlab/reference_files/dbNSFP/dbNSFP4.0a/dbNSFP4.0a_hg19.txt.gz

Annotate

Make sure to check out the readme that came with dbNSFP for an idea of the annotations that are available (/Volumes/archive/merrimanlab/reference_files/dbNSFP/dbNSFP4.0a/dbNSFP4.0a.readme.txt).

To perform the annotation, we use SnpSift which comes as part of snpEff.

module load snpEff/snpEff_4.3t

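# Path to the dbNSFP file created above
dbnsfp=/Volumes/archive/merrimanlab/reference_files/dbNSFP/dbNSFP4.0a/dbNSFP4.0a_hg19.txt.gz

# Build the comma-separated field list from the header: drop the hg18/hg19 coordinate
# columns and Geuvadis_eQTL_target_gene, then skip the first four columns (coordinates and alleles)
fields=$(zcat "$dbnsfp" | head -1 | tr "\t" "\n" | grep -v "hg18\|hg19\|Geuvadis_eQTL_target_gene" | sed -n '5,$p' | tr "\n" "," | sed 's/,$//')

java -jar $SNPSIFT dbnsfp -db "$dbnsfp" -f "$fields" myvcf.vcf.gz | bgzip -c > myvcf.dbnsfp.vcf.gz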
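The field list passed to `-f` is derived from the dbNSFP header, so it can be built on its own and inspected before a long annotation run. The sketch below wraps the same filtering steps in a function; the name `dbnsfp_fields` is hypothetical, and it assumes the first four header columns are the coordinate and allele columns, as in the file created above.

```shell
# Hypothetical helper: print the comma-separated list of dbNSFP fields to
# annotate, excluding coordinate columns and Geuvadis_eQTL_target_gene.
dbnsfp_fields() {
    gzip -dc "$1" | head -n 1 | tr '\t' '\n' \
        | grep -v -e 'hg18' -e 'hg19' -e 'Geuvadis_eQTL_target_gene' \
        | sed -n '5,$p' \
        | paste -sd, -
}
```

Running `dbnsfp_fields dbNSFP4.0a_hg19.txt.gz | tr ',' '\n' | less` shows one field per line, which makes it easy to trim the list down to the annotations described in the readme.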

Effect predictors

snpEff

module load snpEff/snpEff_4.3t
java -jar $SNPEFF GRCh37.75 -stats myvcf_stats.html -lof -csvStats myvcf_stats.csv myvcf.vcf.gz > myvcf.ann.vcf
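For a quick look at the result without opening the HTML report, the predicted impacts can be tallied straight from the annotated VCF. This is a rough sketch with a hypothetical function name, `count_first_impacts`. It assumes the output carries snpEff's `ANN` INFO field, with the impact as the third `|`-separated subfield, and it counts only the first annotation listed for each record.

```shell
# Hypothetical helper: tally the impact (HIGH, MODERATE, LOW, MODIFIER) of the
# first ANN entry of every record in an uncompressed snpEff-annotated VCF.
count_first_impacts() {
    grep -v '^#' "$1" | awk -F '\t' '
        {
            n = split($8, info, ";")               # INFO is the 8th column
            for (i = 1; i <= n; i++) {
                if (info[i] ~ /^ANN=/) {
                    split(substr(info[i], 5), ann, "|")
                    count[ann[3]]++                # 3rd subfield is the impact
                    break
                }
            }
        }
        END { for (k in count) print k "\t" count[k] }
    ' | sort
}
```

For example, `count_first_impacts myvcf.ann.vcf` prints one line per impact category with its count. Records without an `ANN` entry are ignored.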
