Make Locus Zoom-like plots with your own LD matrix
This script creates an R function to create regional Manhattan plots with points coloured according to LD and genes annotated beneath.
Input file format
You will need at least two files to generate a Locus Zoom (LZ) plot: GWAS summary stats file, and a “gene” file.
The LD file can be generated on the fly, as long as you have access to the 1000GP reference data and the correct tools to calculate the LD (see Before you begin).
Please also note that the column names of the required files must match exactly to those specified below, so please take your time to check your headers and edit it to match with the specification.
GWAS summary statistics
Though you may have more information in your GWAS summary statistics, you only need the following four columns:
Column name | Description |
---|---|
CHR |
Chromosome number of the variant |
SNP |
rsID of the variant |
BP |
Base position of the variant |
P |
P-value of the variant |
Instead of using P
as significance, you may use logBF
(e.g. from a trans-ancestry meta-analysis that used MANTRA).
Gene file
A file of the genes within the region for use in the annotation step.
This file must have four columns:
Column name | Description |
---|---|
Gene |
Gene name |
Chrom |
Chromosome number of the gene |
Start |
Start position of the gene |
End |
End position of the gene |
The UCSC_GRCh37_Genes_UniqueList.txt
file is provided with the code.
LD file
This is a file that contains the LD between the SNP to be labelled (top-hit / SNP of interest) and the SNPs included in the GWAS summary statistics.
This file must contain the following two columns:
Column name | Description |
---|---|
SNP_B |
List of all the SNPs in the region of interest |
R2 |
The r2 LD-value of each SNP with the SNP of interest |
The SNP names of the LD file must match the SNP names in the GWAS summary statistics file.
Therefore, we recommend you to use rsID wherever possible to match with the 1000GP reference data (used to calculate the LD).
To calculate the LD yourself, follow the code below:
|
|
where:
Column name | Description |
---|---|
CHR |
Chromosome number |
START |
Start position of the region of interest |
END |
End position of the region of interest |
rsid-of-interest |
rsID of interest (usually the lead variant) |
For more information on why this could be useful, take a look at Generate multiple LZ plots.
Examples
Simple Locus Zoom
You will be able to run the following code and produce the same Locus Zoom plot using the example data set from the GitHub repo.
|
|
Locus Zoom with coloured genes based on MAGMA output
This is not reproducible from the example data.
|
|