12 The Wellcome Trust Case-Control Consortium
Now that you know your way around PLINK, bash, and R, and have done some basic quality control and association testing, you are ready for the real thing. We have prepared a real dataset: the first release of the Wellcome Trust Case-Control Consortium (WTCCC) on coronary artery disease (CAD), together with a control dataset used for that project.
12.1 Genotyping
The WTCCC1 data were genotyped using a chip from Affymetrix, nowadays part of Thermo Fisher Scientific. Affymetrix still exists as a brand, but the chips aren’t made anymore. Unfortunately, most links to the old-generation Affymetrix chips are broken, but you can still find some information about the 500K chip that was used for WTCCC1. It’s good practice to read up a bit on which chip was used and what support materials are available.
12.2 The data
Before quality control, the original data included:
- CAD cohort, n ≈ 2,000
- Healthy controls, from the UK 1958 Birth Cohort (UKB58), n ≈ 1,500 (we won’t use this)
- Healthy controls, from the UK National Blood Service (NBS), n ≈ 1,500.
12.3 Assignment
Your assignment in the next chapter is to do the following:
- Explore the individual datasets by calculating some statistics and visualising these.
- Merge the datasets in the folder wtccc1.
- Calculate PCs using --pca in PLINK.
- Perform an association test using available covariates.
- Visualize the results.
- Identify independent SNPs.
- Make regional association plots.
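To give a feel for the core of this assignment, here is a minimal sketch of the merge, PCA, and association steps as PLINK 1.9 commands. It rests on several assumptions that are not part of the course material: the file prefixes cad and nbs are hypothetical placeholders, the data are assumed to be PLINK binary filesets (.bed/.bim/.fam) with case/control status already coded in the .fam files, and the choice of 10 PCs is arbitrary. The helper function run is also just for illustration: by default it only prints each command, and it executes them only when you set DRY_RUN=0. The actual steps and file names are covered in the next chapter.

```shell
#!/usr/bin/env bash
set -euo pipefail

# All names below are hypothetical placeholders, not the real file names.
DATA_DIR="wtccc1"
CASES="${DATA_DIR}/cad"       # assumed binary fileset: cad.bed/.bim/.fam
CONTROLS="${DATA_DIR}/nbs"    # assumed binary fileset: nbs.bed/.bim/.fam
MERGED="wtccc1_cad_nbs"       # prefix for the merged dataset
N_PCS=10                      # arbitrary number of PCs, for illustration only

# Print a command; run it for real only when DRY_RUN=0.
run() {
  echo "$*"
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  fi
}

# Merge cases and controls into one binary fileset.
merge_datasets() {
  run plink --bfile "$CASES" --bmerge "$CONTROLS" --make-bed --out "$MERGED"
}

# Compute principal components; 'header' adds FID/IID/PC1.. column names.
compute_pcs() {
  run plink --bfile "$MERGED" --pca "$N_PCS" header --out "$MERGED"
}

# Logistic regression with the PCs as covariates; 'hide-covar' keeps
# only the SNP rows in the output.
run_association() {
  run plink --bfile "$MERGED" --logistic hide-covar \
    --covar "${MERGED}.eigenvec" --covar-name "PC1-PC${N_PCS}" \
    --out "${MERGED}_assoc"
}

merge_datasets
compute_pcs
run_association
```

Running the script as is prints the three commands so you can inspect them first; running it with DRY_RUN=0 requires plink on your PATH and the input files in place. In practice you would also do quality control and usually prune SNPs for linkage disequilibrium before calculating PCs, which this sketch leaves out.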
12.4 There you go
As I wrote, you are ready for the real stuff in Chapter 13.