4 Getting started - programs

We’ll use a few programs throughout this practical. You’ll probably need these for your (future) genetic epidemiology work too (Table 4.1).

Table 4.1: Programs needed for genetic epidemiology.
Program	Link	Description
PLINK	https://www.cog-genomics.org/plink2/	PLINK is a free, open-source genetic analysis tool set, designed to perform a range of basic data parsing and quality control, as well as basic and large-scale analyses in a computationally efficient manner.
R	https://cran.r-project.org/	A program to perform statistical analysis and visualizations.
RStudio	https://www.rstudio.com	A user-friendly interface around R for code editing, debugging, analyses, and visualization.
Homebrew	https://brew.sh	A package manager for macOS that lets Mac users install really useful programs that Apple didn't.
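
Once you have installed the programs in Table 4.1, a quick check from the terminal can tell you which ones your shell can actually find. The snippet below is a minimal sketch, not an official installation test. It assumes the executables are called `plink`, `R`, and `brew`, which may differ on your system (the helper name `check_prog` is made up for this example, and note that PuTTY also ships an unrelated tool called `plink`).

```shell
#!/bin/sh
# Report whether a program can be found on the PATH.
# check_prog is a hypothetical helper name used only in this sketch.
check_prog() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: found"
  else
    echo "$1: not found"
  fi
}

# Assumed executable names for the programs in Table 4.1;
# adjust them if your installation uses different names.
for prog in plink R brew; do
  check_prog "$prog"
done
```

RStudio is left out because it is usually started as a desktop application rather than from the command line. If a program is reported as not found, it is either not installed or its folder is not on your `PATH`.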

4.1 RStudio

RStudio is a very user-friendly interface around R that makes your R scripting life a lot easier, so it is worth getting used to. Note that RStudio does not come with R itself: you need to install R separately (Table 4.1).

4.2 PLINK

Right, onto PLINK.

All genetic analyses can be done in PLINK, even on your laptop. With large datasets, for example at the scale of UK Biobank, it is better to switch to a high-performance computing (HPC) cluster like the one we have available at the Utrecht Science Park. The original PLINK v1.07 is still available online, but nowadays we use a newer, faster version, PLINK v1.9 (link in Table 4.1). The website still says ‘PLINK 1.90 beta’ (Figure 4.1), but you can consider this version stable and safe to work with. As the website also shows, some functions from v1.07 are no longer supported.

Figure 4.1: The PLINK v1.9 website.

4.2.1 Alternatives to PLINK

Nowadays, a lot of people also use programs like SNPTEST, BOLT-LMM, GCTA, or regenie as alternatives to run a GWAS. These programs were designed with specific use cases in mind, for instance very large biobank data including hundreds of thousands of individuals, better control for population stratification, or the ability to estimate trait heritability or Fst.

4.3 Other programs

Mendelian randomization can be done with the SMR program, with the GSMR function in GCTA, or with R packages like TwoSampleMR.

As you are following the Genetic Epidemiology course, the next thing we’ll cover is the CoCalc instructions in Chapter 5.
