4 Getting started - programs
We’ll use a few programs throughout this practical. You’ll probably need these for your (future) genetic epidemiology work too (Table 4.1).
Program | Link | Description |
---|---|---|
PLINK | https://www.cog-genomics.org/plink2/ | PLINK is a free, open-source genetic analysis tool set, designed to perform a range of basic data parsing and quality control, as well as basic and large-scale analyses in a computationally efficient manner. |
R | https://cran.r-project.org/ | A program to perform statistical analysis and visualizations. |
RStudio | https://www.rstudio.com | A user-friendly interface around R for code editing, debugging, analysis, and visualization. |
Homebrew | https://brew.sh | A great extension for Mac-users to install really useful programs that Apple didn't. |
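
Before starting, it can help to check which of these programs are already installed. The snippet below is a minimal sketch, not an official installation check: the helper `check_program` is a hypothetical name introduced here, and the executable names `plink`, `R`, and `Rscript` are assumptions that may differ on your system (for instance, the PLINK binary may have been saved under another name).

```shell
#!/bin/sh
# Report whether a program is available on the PATH.
# Returns 0 if the program is found, 1 otherwise.
check_program() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found at $(command -v "$1")"
    else
        echo "$1: not found on PATH"
        return 1
    fi
}

# Assumed executable names; adjust these to your own installation.
for prog in plink R Rscript; do
    check_program "$prog" || true
done
```

If PLINK is found, running `plink --version` prints the installed version, which should report v1.9 for this practical.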
4.1 RStudio
RStudio is a very user-friendly interface around R that makes your R-scripting life a lot easier, so it is worth getting used to. Note that RStudio Desktop does not include R itself: install R first (Table 4.1), and RStudio will pick it up automatically.
4.2 PLINK
Right, onto PLINK.
All genetic analyses can be done in PLINK, even on your laptop. With large datasets, for example of UK Biobank size, it is better to switch to a high-performance computing (HPC) cluster like the one we have available at the Utrecht Science Park. The original PLINK v1.07 is still available, but nowadays we use a newer, faster version: PLINK v1.9 (Table 4.1). Its website still says ‘PLINK 1.90 beta’ (Figure 4.1), but you can consider this version stable and safe to work with. As the website shows, though, some functions are no longer supported.

Figure 4.1: The PLINK v1.9 website.
4.2.1 Alternatives to PLINK
Nowadays, a lot of people also use programs like SNPTEST, BOLT-LMM, GCTA, or regenie as alternatives to PLINK for running a GWAS. These programs were designed with specific use cases in mind, for instance very large biobank data including hundreds of thousands of individuals, better control for population stratification, or the ability to estimate trait heritability or Fst.
4.3 Other programs
Mendelian randomization can be done either with the SMR or GSMR function from GCTA, or with R packages like TwoSampleMR.
As you are following the Genetic Epidemiology course, the next thing we’ll cover is the CoCalc instructions in Chapter 5.