19 Two Sample Mendelian Randomisation

We will use the TwoSampleMR package to assess the causal effect of type 2 diabetes on coronary artery disease.

Briefly, the workflow of a Two-Sample MR is as follows (see also diagram below):

  1. Select instruments for the exposure (perform LD clumping if necessary);
  2. Extract the instruments from the MR Base GWAS database for the outcomes of interest;
  3. Harmonise the effect sizes for the instruments on the exposures and the outcomes to be each for the same reference allele;
  4. Perform MR analysis, sensitivity analyses, create plots, compile reports.

19.1 Loading the package

TwoSampleMR is free use. It used to require server-authentication using a gmail-account, now it doesn’t anymore. Should run into an issue, please refer to this page.

First we load the package.

library("TwoSampleMR")
library("MRInstruments")
library("ieugwasr")

19.2 Available outcome

We are ready to load in the available outcomes.

ao_twosamplemr <- available_outcomes()
Available data.

Figure 19.1: Available data.

19.3 Instruments

The MRInstruments-package, which comes with TwoSampleMR, includes several types of instruments:

  • gwas_catalog, which includes data from GWAS collected by the GWAS Catalog
  • metab_qtls, all known quantitative trait loci (meQTLs) associated to circulating metabolites
  • proteomic_qtls, all known protein QTLs (pQTLs) associated to circulating protein levels
  • gtex_eqtl, all known expression QTLs (eQTLs) associated to gene expression in 53 different tissues from GTEx Portal
  • aries_mqtl, all known methylation QTLs (mQTLs) associated to DNA methylation in blood from ARIES

Let’s select T2D associated instruments.

data("gwas_catalog")
t2d_gwas <- subset(gwas_catalog, grepl("Mahajan", Author) & PubmedID == "24509480" & Phenotype == "Type 2 diabetes")
t2d_exp_dat <- format_data(t2d_gwas)
Available data.

Figure 19.2: Available data.

19.4 Clumping

Like with identifying independent GWAS hits, it is important for TwoSampleMR to ensure that the instruments for the exposure are independent. We can use MR Base to perform clumping. The European samples from the 1000 genomes project are used to estimate LD between SNPs. You can provide a list of SNP IDs, the SNPs will be extracted from 1000 genomes data, LD calculated between them, and amongst those SNPs that have LD R-square above the specified threshold only the SNP with the lowest P-value will be retained. To do this, use the following command.

t2d_exp_dat <- clump_data(t2d_exp_dat)
All available data.

Figure 19.3: All available data.

19.5 Outcomes

Right, so now we are ready to select the proper outcome, CAD. There are multiple datasets available in MRBase, the database used by TwoSampleMR. Let’s review them all and select one with the largest overlap with our exposure SNPs.

ao_twosamplemr[grepl("heart disease", ao_twosamplemr$trait), ]
Available CHD data.

Figure 19.4: Available CHD data.

It turns out there are several options listed below.

Code Year Author Description Ncases Ncontrols
ukb-d-I9_IHD 2018 Neale Labe Ischaemic heart disease, wide definition 20857 340337
ukb-d-I9_CHD_NOREV 2018 Neale Labe Major coronary heart disease event excluding revascularizations 10157 351037
ieu-a-7 2015 Nikpay et al. Coronary artery disease 60801 123504
ieu-a-9 2013 Deloukas et al.  Coronary artery disease 63746 130681
ieu-a-8 2011 Schunkert et al. Coronary artery disease 22233 64762

We will use the ieu-a-7 dataset as this provides the latest and the greatest, but at the same time (when you go into the literature and dig up the publication) you need a good and sharp definition of the outcome you are interested in.

chd_out_dat <- extract_outcome_data(
    snps = t2d_exp_dat$SNP,
    # outcomes = "UKB-a:534" # - this one is deprecated, use the one below
    # Deprecated IDs being used? Detected numeric IDs. Trying to fix, but please note the changes below for future.
    # UKB-a:534  ->  ukb-a-534
    outcomes = "ieu-a-7"
)
Extracting available top SNPs.

Figure 19.5: Extracting available top SNPs.

19.5.1 Harmonisation

It is important to make sure that the effect allele associated to T2D of any given SNP is the same as the effect allele associated CAD. In other words, we need to harmonise the data.

dat <- harmonise_data(
    exposure_dat = t2d_exp_dat,
    outcome_dat = chd_out_dat
)
Harmonising the instruments.

Figure 19.6: Harmonising the instruments.

19.5.2 Mendelian randomisation

Now that the data is harmonised, we are ready to perform a MR and test for a causal relation of T2D with CAD.

res <- mr(dat)

DT::datatable(res, caption = "MR of type 2 diabetes and coronary artery disease", rownames = FALSE)
Primary MR analysis.

Figure 19.7: Primary MR analysis.

Question: What is the causal effect of T2D on CAD? Can you describe this using the knowledge from the lecture and the results in the table.

Several tests are applied. You can change the method(s) applied by using the method_list flag. Below a list of available methods.

(#tab:MR_methods)Available MR methods
obj name PubmedID Description use_by_default heterogeneity_test
mr_wald_ratio Wald ratio TRUE FALSE
mr_two_sample_ml Maximum likelihood FALSE TRUE
mr_egger_regression MR Egger 26050253 TRUE TRUE
mr_egger_regression_bootstrap MR Egger (bootstrap) 26050253 FALSE FALSE
mr_simple_median Simple median FALSE FALSE
mr_weighted_median Weighted median TRUE FALSE
mr_penalised_weighted_median Penalised weighted median FALSE FALSE
mr_ivw Inverse variance weighted TRUE TRUE
mr_ivw_radial IVW radial FALSE TRUE
mr_ivw_mre Inverse variance weighted (multiplicative random effects) FALSE FALSE
mr_ivw_fe Inverse variance weighted (fixed effects) FALSE FALSE
mr_simple_mode Simple mode TRUE FALSE
mr_weighted_mode Weighted mode TRUE FALSE
mr_weighted_mode_nome Weighted mode (NOME) FALSE FALSE
mr_simple_mode_nome Simple mode (NOME) FALSE FALSE
mr_raps Robust adjusted profile score (RAPS) FALSE FALSE
mr_sign Sign concordance test Tests for concordance of signs between exposure and outcome FALSE FALSE
mr_uwr Unweighted regression Doesn’t use any weights FALSE TRUE

19.5.3 Heterogeneity & Pleiotropy

Many genetic variants are associated to multiple traits, thus pleiotropy is common and widespread in the human genome. This poses as an issue to tease apart causal relations in Mendelian randomisation. Vertical pleiotropy arises because a SNP influences one trait, which in turn influences another. We can use MR to estimate the causal influence between the traits. Horizontal pleiotropy arises due to SNPs influencing two traits through independent pathways(Hemani G. 2018).

Further, we should test for study heterogeneity.

mr_heterogeneity(dat)
MR heterogeneity.

Figure 19.8: MR heterogeneity.

mr_pleiotropy_test(dat)
MR pleiotropy.

Figure 19.9: MR pleiotropy.

Question: What do the results of the heterogeneity and pleiotropy tests tell you?

19.5.4 Leave-one-out

It is also key to inspect the causal effects per SNP and perform a leave-one-out analysis.

res_single <- mr_singlesnp(dat)
DT::datatable(res_single, caption = "Single SNP MR analysis", rownames = FALSE)
Single SNP analysis results.

Figure 19.10: Single SNP analysis results.

res_loo <- mr_leaveoneout(dat)
DT::datatable(res_loo, caption = "Leave-one-out MR analysis", rownames = FALSE)
Leave-One-Out analysis results.

Figure 19.11: Leave-One-Out analysis results.

Question: What do the results of the single SNP and leave-one-out analyses tell you?

19.5.5 Visualisation

It is important to present the causal inference graphically.

Scatter plot

res <- mr(dat)
p1 <- mr_scatter_plot(res, dat)
p1
Scatter plot.

Figure 19.12: Scatter plot.

Question: What do you see in the scatter plot? And what can you conclude from this?

Single SNP plot

res_single <- mr_singlesnp(dat)
p2 <- mr_forest_plot(res_single)
p2
Single SNP plot.

Figure 19.13: Single SNP plot.

Question: What do you see in the forest plot? And what can you conclude from this?

Leave-One-Out plot

res_loo <- mr_leaveoneout(dat)
p3 <- mr_leaveoneout_plot(res_loo)
p3
Leave-One-Out plot.

Figure 19.14: Leave-One-Out plot.

Question: what does a leave-one-out ploy tell you? And what can you conclude from this?

Funnel plot

res_single <- mr_singlesnp(dat)
p4 <- mr_funnel_plot(res_single)
p4
Funnel plot.

Figure 19.15: Funnel plot.

Question: Why would yoy use a funnel plot? And what can it tell you?

Question: What are your main conclusions from the MR analysis?

19.6 The end?

You are almost at the end. Time to return to the previous chapter on post-GWAS analyses 15.

References

Hemani G., et al. 2018. “Evaluating the Potential Role of Pleiotropy in Mendelian Randomization Studies.” https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6061876.