This is an R tutorial for the epigenetic age prediction tool MethylAger, integrated into the RnBeads software package (http://rnbeads.mpi-inf.mpg.de). It should guide you how to use the epigenetic age prediction for different data sets. In this specific case, we focus on an Infinium 450K BeadChip data set as a representative, but MethylAger also supports sequencing-based data such as RRBS or WGBS. In this tutorial, we show two common use cases of age prediction that can be performed by MethylAger. First, we present the integration into the RnBeads module Covariate Inference and describe the HTML report that is created by the RnBeads pipeline. Then we go into more detail about customizing MethylAger and introduce the integration into a complete DNA methylation analysis pipeline.
RnBeads is a software package for the integrative, start-to-finish analysis of DNA methylation data. It is part of R/Bioconductor (http://bioconductor.org/packages/release/bioc/html/RnBeads.html) and installation, including dowloading of dependencies,can be achieved via:
Now you can load RnBeads into your current R session by:
For the analysis conducted in this tutorial, we collected a 450K BeadChip data set derived from two independent studies, one focused on different tissues (H. Y. Lee et al. 2015, GSE59505) and the other on head and neck cancer patients (GSE40005, publication not available). This data set was already imported and preprocessed with RnBeads and can be found here. Download the zip-file into your working directory and load it.
#Load the RnBSet rnbSet <- load.rnb.set("analysis_set.zip")
MethylAger is situated in the Covariate Inference module of RnBeads. This module deals with the analysis of potentially confounding factors, for instance by performing Surrogate Variable Analysis. One of these unkown factors is the individual’s epigenetic age, which could largely influence the overall epigenetic pattern. MethylAger infers this epigenetic age information from DNA methylation data and provides the information for follow-up analysis. To run the Covariate Inference module, a single command has to be executed:
The parameters of this method are a
RnBSet object on which the analysis should be conducted and
dir.reports, a path to which a HTML report should be written out. This report contains the results of the analysis, for instance plots and tables.
#A new folder 'inference_report' is created in your working directory dir.reports <- file.path(getwd(),"inference_report") #Execute the Inference module result <- rnb.run.inference(rnb.set=rnbSet, dir.reports=dir.reports)
## 2017-02-03 10:27:21 0.8 STATUS STARTED Covariate Inference ## 2017-02-03 10:27:21 0.8 INFO Number of cores: 1 ## 2017-02-03 10:27:21 0.8 STATUS STARTED Age Prediction using predefined predictor ## 2017-02-03 10:27:30 1.5 STATUS COMPLETED Age Prediction using predefined predictor ## 2017-02-03 10:27:30 1.5 STATUS STARTED Adding Age Prediction Section to Report ## 2017-02-03 10:29:58 1.4 STATUS Added Comparison Plot< ## 2017-02-03 10:29:59 1.4 STATUS Added Error Plot ## 2017-02-03 10:29:59 1.4 STATUS COMPLETED Adding Age Prediction Section to Report ## 2017-02-03 10:29:59 1.4 STATUS COMPLETED Covariate Inference
The corresponding report consists of plots that describe the relationship between predicted and annotated age for all samples. For instance, MethylAger compares the annotated ages in the data set to the predicted epigenetic ages with the following scatterplot:
In the interactive HTML report you are able to select different sample characteristics to annotate the points with different shapes or colors.
Furthermore, MethylAger can modify the input
RnBSet object by adding additional columns to the phenotypic table. This is achieved by running MethylAger separately. Without customizing any further options, MethylAger will check the assay type of the input data set and select the corresponding pre-defined predictor. Epigenetic age prediction is then performed by selecting the methylation values at the CpG sites of interest.
#Refresh the data set rnbSet <- load.rnb.set("analysis_set.zip")
#Execute MethylAger ?rnb.execute.age.prediction rnbSet <- rnb.execute.age.prediction(rnbSet)
## 2017-02-03 10:30:01 1.4 STATUS STARTED Performing Age Prediction ## 2017-02-03 10:30:04 1.9 STATUS COMPLETED Performing Age Prediction
This command starts MethylAger to predict epigenetic age from the provided data set’s methylation data. MethylAger returns a modified
RnBSet object, where the predicted epigenetic age resides in the column
predicted_ages of the phenotypic table. The epigenetic ages could be linked to the donor’s health state or lifestyle.
#Show the phenotypic information about the data set head(pheno(rnbSet)[,c("geo_accession","tissue","age","age_combined","predicted_ages","age_increase")])
## geo_accession tissue age age_combined predicted_ages age_increase ## 1 GSM1438496 saliva <NA> 20 33.74164 NA ## 2 GSM1438497 saliva <NA> 24 32.28709 NA ## 3 GSM1438498 saliva <NA> 27 22.48387 NA ## 4 GSM1438499 saliva <NA> 50 49.94989 NA ## 5 GSM1438500 saliva <NA> 59 57.01872 NA ## 6 GSM1438501 saliva <NA> 55 55.95207 NA
age_increase contains the difference between predicted, epigenetic and annotated, chronological age. For healthy samples, this difference should be close to zero, otherwise the individual’s epigenetic aging has been influenced by some, probably unkown, factor. Since the age is annotated in the column
age_combined (and not in the default column
age), MethylAger has to be told where to find the age information by:
#Set the option to the correct column rnb.options(inference.age.column="age_combined") #Refresh the data set rnbSet <- load.rnb.set("analysis_set.zip")
rnbSet <- rnb.execute.age.prediction(rnbSet)
## 2017-02-03 10:30:05 1.9 STATUS STARTED Performing Age Prediction ## 2017-02-03 10:30:08 1.7 STATUS COMPLETED Performing Age Prediction
#Only show the relevant columns of the phenotypic table head(pheno(rnbSet)[,c("age_combined","predicted_ages","age_increase")])
## age_combined predicted_ages age_increase ## 1 20 33.74164 13.74164055 ## 2 24 32.28709 8.28709313 ## 3 27 22.48387 -4.51613149 ## 4 50 49.94989 -0.05010714 ## 5 59 57.01872 -1.98127925 ## 6 55 55.95207 0.95207303
A high value of
age_increase indicates issues with the corresponding sample For instance, the sample’s donor could have suffered from a disease, which changed the epigenetic pattern in a way that the sample became epigenetically older.
In addition to using the pre-defined predictors of MethylAger, a custom predictor can be trained from the provided data set. For this, some parameters have to be set accordingly:
rnb.options(inference.age.column="age_combined", inference.age.prediction.training=TRUE, inference.age.prediction.cv=TRUE)
This activates both the training of a new predictor and the evaluation of its performance by applying 10-fold cross validation. Now start the Covariate Inference module to perform training:
#Refresh the data set rnbSet <- load.rnb.set("analysis_set.zip")
#A new folder 'training_report' is created in your working directory dir.reports <- file.path(getwd(),"training_report") result <- rnb.run.inference(rnb.set=rnbSet,dir.reports=dir.reports)
## 2017-02-03 10:30:10 1.7 STATUS STARTED Covariate Inference ## 2017-02-03 10:30:10 1.7 INFO Number of cores: 1 ## 2017-02-03 10:30:10 1.7 STATUS STARTED Training new age predictor ## 2017-02-03 10:31:18 3.2 STATUS COMPLETED Training new age predictor ## 2017-02-03 10:31:18 3.2 STATUS STARTED Age Prediction using specified predictor ## 2017-02-03 10:31:19 3.5 STATUS COMPLETED Age Prediction using specified predictor ## 2017-02-03 10:31:19 3.5 STATUS STARTED Adding Age Prediction Section to Report ## 2017-02-03 10:33:20 1.4 STATUS Added Comparison Plot ## 2017-02-03 10:33:22 1.4 STATUS Added Error Plot ## 2017-02-03 10:33:22 1.4 STATUS COMPLETED Adding Age Prediction Section to Report ## 2017-02-03 10:33:22 1.4 STATUS STARTED 10-fold Cross Validation ## 2017-02-03 10:33:44 4.1 INFO 1 ## 2017-02-03 10:34:22 3.6 INFO 2 ## 2017-02-03 10:34:59 3.4 INFO 3 ## 2017-02-03 10:35:37 4.0 INFO 4 ## 2017-02-03 10:36:13 3.6 INFO 5 ## 2017-02-03 10:36:49 3.8 INFO 6 ## 2017-02-03 10:37:24 4.0 INFO 7 ## 2017-02-03 10:38:00 4.0 INFO 8 ## 2017-02-03 10:38:36 4.0 INFO 9 ## 2017-02-03 10:39:15 3.8 INFO 10 ## 2017-02-03 10:39:52 3.5 STATUS COMPLETED 10-fold Cross Validation ## 2017-02-03 10:39:52 3.5 STATUS COMPLETED Covariate Inference
The resulting HTML report additionally contains information about the outcome of the cross validation. In this report, the new predictor that was trained on the input data set was applied to the data set itself. Thus, the high prediction accuracy shown in Figure 2 can be explained.
If there had been a shift away from mean zero, we would have been conforted with a systematic error in our data set. On top, the trained predictor is available as a CSV file in the data folder of the report directory. Similar to perfoming prediction, training can be invoked by
rnb.execute.training without report generation.
If you trained a predictor in a previous run of MethylAger, you could use this predictor for a follow-up analysis by setting the appropriate option:
#Set predictor to the CSV file created with the training function rnb.options(inference.age.prediction.predictor=file.path(getwd(),"training_report","covariate_inference_data","trained_predictor.csv"))
After performing training, the option is automatically set to the new predictor for the current R session. MethylAger can also be integrated into a start-to-finish analysis of RnBeads. For instance, the command below executes parts of RnBeads’ pipeline and executes age prediction with the previously trained predictor. The arguments for
rnb.run.analysis are a path to which the report should be written out (
data.type, which tells the method that the import into RnBeads was already done. We will deactivate all RnBeads modules except Covariate Inference and Exploratory Analysis to save time.
#Refresh the data set rnbSet <- load.rnb.set("analysis_set.zip")
#Set some options for the RnBeads pipeline, so the analysis finishes in a reasonable time rnb.options(inference=TRUE, qc=FALSE, preprocessing=FALSE, exploratory.principal.components=4, exploratory.columns=c("tissue","gender","disease state","age_increase","predicted_ages","age_combined"), exploratory.beta.distribution=FALSE, exploratory.correlation.qc=FALSE, exploratory.intersample=FALSE, exploratory.region.profiles=vector("character"), differential=FALSE, export.to.bed=FALSE, export.to.trackhub=NULL, inference.age.prediction.training=FALSE, inference.age.prediction.cv=FALSE) #A new folder 'analysis_report' is created in your working directory dir.reports <- file.path(getwd(),"analysis_report") #running time ~50min result <- rnb.run.analysis(data.source=rnbSet,dir.reports=dir.reports,data.type="rnb.set")
MethylAger was executed within the RnBeads pipeline and in this way, the epigenetic age is accounted for in the follow-up module Exploratory Analysis. The resulting HTML report consists of a separate report for each of the executed modules. For instance, RnBeads tests for associations between epigenetic age and other sample characteristics and creates the following figure in the report:
If we find a significant assocation to epigenetic aging, the p-value resulting from the performed test will be shown in the corresponding cell of Figure 3. We then could analyze the effect of the correspoding sample’s characteristic to increased epigenetic aging of the individual. In this case, the epigenetic age is significantly linked to the sample’s tissue indicating different epigenetic ages for the analyzed tissues
MethylAger can be deactivated within the Covariate Inference module by:
Furthermore, some functions of MethylAger can run in parallel, so please consider the parallel processing options of RnBeads when analyzing a large data set, but keep in mind that MethylAger uses nested cross validation to assess model performance. Therefore, for instance, setting the number of workers to 3 results in 9 threads.
We recommend to execute cross validation only on data sets with at least 30 samples, since the folds become too small otherwise.
In this tutorial we presented MethylAger, an easy-to-use epigenetic age prediction tool that was integrated into the RnBeads software package. It is able to derive epigenetic age of samples analyzed by different methylation assays, such as the Illumina BeadChips or sequencing-based approaches. Increased epigenetic aging could be linked to a sample’s disease history by additional modules of RnBeads. Due to the integration of MethylAger into the RnBeads package, MethylAger can easily be included into a methylation data analysis workflow.
Lee, Hwan Young, Ja Hyun An, Sang Eun Jung, Yu Na Oh, Eun Young Lee, Ajin Choi, Woo Ick Yang, and Kyoung Jin Shin. 2015. “Genome-wide methylation profiling and a multiplex construction for the identification of body fluids using epigenetic markers.” Forensic Science International: Genetics 17: 17–24. doi:10.1016/j.fsigen.2015.03.002.