RnBeads

This page shows the answers to those user questions that we believe are of general interest. If your specific problem is not covered, please do not hesitate to contact us by sending an e-mail to rnbeads@mpi-inf.mpg.de.

General Questions

I get an error 'cannot locate Ghostscript / gswin32c'. How can I fix this?

The R platform uses Ghostscript to convert PDF files to PNG images. The error indicates that R cannot locate the Ghostscript executable on your machine. First, make sure Ghostscript is installed; you can download it from Ghostscript's official web site if necessary. After that, one solution is to add Ghostscript's installation directory to the system path. On Windows operating systems, this can be achieved with the following steps:


1. Open the Advanced system settings. In Windows 7, for example, it can be reached through: Control Panel > System and Security > System > Advanced System Settings.
2. In the "System Properties" dialog, open the "Advanced" tab. Click the button "Environment Variables..." to update the search path.
3. Locate the environment variable "Path" (it does not matter whether it is among the user or the system variables, as long as you are the user who starts R). Select it, click on Edit, and prepend the location of the Ghostscript executable, followed by a semicolon. The text you need to add is usually similar to C:\Program Files\gs\gs9.15\bin;
4. After starting a new R session, Ghostscript should be accessible from R. If it still cannot be located, you need to check the corresponding environment variable. In an R session, the command Sys.getenv("R_GSCMD") shows the contents of the dedicated Ghostscript variable. If the variable does not exist or points to the wrong executable file, you can set it to the full path of the Ghostscript executable. This is achieved by editing or creating the file Renviron.site in the etc subdirectory of your R installation. Make sure the file contains the line R_GSCMD=C:\Program Files\gs\gs9.15\bin\gswin64c.exe (assuming Ghostscript is located in C:\Program Files\gs\gs9.15 and you are using the 64-bit version of R). For more information, please check the R documentation on getting and setting environment variables.
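
The check in step 4 can also be scripted. The following is a minimal sketch in base R; the helper name find.ghostscript is hypothetical, the list of executable names is an assumption about common installations, and the path in the final comment is the example path from above.

```r
# Hypothetical helper: report where R would look for Ghostscript.
find.ghostscript <- function() {
  # Value of the dedicated environment variable ("" if it is not set)
  from.env <- Sys.getenv("R_GSCMD", unset = "")
  # Executables found on the search path (assumed names; they differ by platform)
  candidates <- c("gswin64c", "gswin32c", "gs")
  on.path <- Sys.which(candidates)
  on.path <- unname(on.path[nzchar(on.path)])
  list(R_GSCMD = from.env, on.path = on.path)
}

gs <- find.ghostscript()
print(gs)

# If nothing was found, the variable can be set for the current session only,
# for example (example path, adapt it to your installation):
# Sys.setenv(R_GSCMD = "C:/Program Files/gs/gs9.15/bin/gswin64c.exe")
```

Setting the variable with Sys.setenv() affects only the running session; the Renviron.site approach described above makes the setting permanent.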

I receive a warning that I need to install zip on my Windows machine. What can I do?

To save disk-backed RnBSet objects on Windows, a Zip archive utility must be installed and properly configured. There are multiple ways to install a Zip utility on your Windows system. For instance, Zip is available as a part of the Rtools distribution, which is a collection of packages for R development on Windows. For a minimal install, choose "Custom installation" at the "Select Components" stage of the Rtools installation and check only the "R toolset" item below. In the "Additional Tasks" dialogue, which appears a couple of steps later, make sure that both available items for "Edit the system PATH" are checked ("Current value" and "Save version number XXX in registry"). To test the installation, start the Windows terminal ("Start" > "Run" > "CMD") and execute the command "zip" in the command line. If the installation and configuration were successful, you should see the Zip version and brief usage instructions.
In some cases, the environment variables also need to be set in order for R to locate and use the installed zip utility. One way to do this is to create or edit the file Renviron.site in the etc subdirectory of your R installation. Make sure the file contains the lines:
R_ZIPCMD=zip
R_UNZIPCMD=unzip
For more information, please check the R documentation on getting and setting environment variables.
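
Whether R can see the utilities may also be checked from within an R session. This is a minimal sketch in base R; it assumes that the commands are either given in R_ZIPCMD and R_UNZIPCMD or reachable as zip and unzip on the search path.

```r
# Commands R would use for zipping and unzipping (fall back to the plain names)
zip.cmd <- Sys.getenv("R_ZIPCMD", unset = "zip")
unzip.cmd <- Sys.getenv("R_UNZIPCMD", unset = "unzip")

# Sys.which returns "" for every command that cannot be located
found <- nzchar(Sys.which(c(zip.cmd, unzip.cmd)))
names(found) <- c("zip", "unzip")
print(found)

if (!all(found)) {
  message("zip or unzip was not found; check the system PATH or Renviron.site")
}
```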

I am running RnBeads on a computer with limited resources. What can I do to reduce the memory consumption by RnBeads?

First, try to run RnBeads on a single core and NOT in parallel. Additionally, there are several option settings that can be used to reduce the resource requirements. These option settings apply to different (sub)modules:

# Disable greedycut (filtering)
rnb.options("filtering.greedycut"=FALSE)
# Disable intersample variation plots (exploratory analysis)
rnb.options("exploratory.intersample"=FALSE)
# Reduce the subsampling number for estimating density plots
rnb.options("distribution.subsample"=100000)
# Disable regional methylation profiling (exploratory analysis)
rnb.options("exploratory.region.profiles"=NULL)
# Disable chromosome coverage plots (QC, sequencing data only)
rnb.options("qc.coverage.plots"=FALSE)

Can I start RnBeads analysis with Galaxy on the Cloud?

Yes, you can. RnBeads has a wrapper script for integration with Galaxy, accessible from the main Galaxy Tool Shed (http://toolshed.g2.bx.psu.edu/view/pavlo-lutsik/rnbeads). To run it on a custom Galaxy instance on the Amazon cloud, follow the steps provided below. Warning: Following the steps results in additional costs. The exact amount depends on the selected cloud configuration and Amazon Web Services pricing at the time of usage.

  1. Sign up for Amazon Web Services and establish a custom Galaxy instance on the cloud using the Cloud Launch interface as described here: wiki.galaxyproject.org/CloudMan
  2. Log in to CloudMan (it becomes accessible upon successful completion of Step 1). Give it some time to start the Galaxy instance and click the button Access Galaxy.
  3. Go to the menu item User and register within your Galaxy instance.
  4. Go back to the CloudMan interface, access the Admin interface (a link on the top-right), and add yourself as an administrator user.
  5. Go to the Galaxy instance and refresh the page until you get a new Admin menu item.
  6. Proceed to the Galaxy administrator interface and select the task Search and browse tool sheds.
  7. Click on Galaxy main toolshed, select Browse for valid repositories and find the wrapper rnbeads using the search field located at the top of the newly opened page.
  8. Click on the button with the wrapper name and select Preview and install.
  9. Click Install to Galaxy on the top-right.
  10. Check Handle tool dependencies, select the Statistics tool panel section and click Install.
  11. Give Galaxy some time to install the tool, then go back to the Analyze Data interface. RnBeads should become available in the Tool panel, just type RnBeads in the search field. Click on the tool to open its Galaxy interface.
  12. You can test the functionality of the tool by starting a small analysis of a public data set. Select Gene Expression Omnibus series in the data type field and specify GSE38268 as the GEO series. The series contains only 6 Infinium 450k samples, so the analysis should not take more than an hour. After completion, you will get a link to the analysis report, displayed directly in the data display window.

Which genome assemblies does RnBeads support? Can I include a new one?

RnBeads supports the human (hg19, hg38), mouse (mm9 and mm10) and rat (rn5) genomes. If you would like to analyze a different genome, you need to have experience with R and Bioconductor. Please follow these steps:

  1. Locate the Bioconductor package that defines the genome sequence. For example, the package BSgenome.Hsapiens.UCSC.hg19 contains the Homo sapiens genome.
  2. Download the template R script siteAnnotation.R and take a look at the defined variables and functions. Of particular interest are the list CHROMOSOMES and the function get.genome.data.
  3. Execute the function rnb.update.sites and save the result to an object named sites.
  4. Create an R annotation package. You can use the structure of RnBeads.mm9 as a template.

Which libraries and tools does RnBeads rely on?

RnBeads utilizes many established R packages for data loading and manipulation. Examples include the Bioconductor packages methylumi, minfi, RPMM, ggbio, GEOquery, GOstats and others. RnBeads also includes code from the Google Code project Beta Mixture Quantile Model. Parallelization is implemented using the packages foreach and doParallel. Most of the report figures are created using the ggplot2 package. GO enrichment analysis results are visualized with the help of the wordcloud package. See the output of the following command for a full list of the libraries required by RnBeads:

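# List all packages that the installed RnBeads depends on, directly or indirectly
tools::package_dependencies("RnBeads", db = installed.packages(), recursive = TRUE)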

The web service submission form uses the cross-browser tooltips library by Walter Zorn. The library is distributed under the GNU Lesser General Public License.

Compatibility

Why are some analysis options not recognized in RnBeads 0.99.15 and later?

In RnBeads 0.99.15, we reorganized the analysis pipeline. We introduced the new modules Preprocessing and Exploratory Analysis, and renamed the modules Loading to Import, Data Export to Tracks and Tables, and Annotation Inference to Covariate Inference. As a result, we renamed some of the analysis options to match the new modules. The table below lists the renamed options.

Old Option                          New Option
loading                             import
loading.default.data.type           import.default.data.type
loading.table.separator             import.table.separator
loading.bed.style                   import.bed.style
loading.bed.columns                 import.bed.columns
loading.bed.frame.shift             import.bed.frame.shift
loading.bed.test                    import.bed.test
loading.bed.test.only               import.bed.test.only
batch                               exploratory
batch.dreduction.columns            exploratory.columns
batch.top.dimensions                exploratory.top.dimensions
batch.principal.components          exploratory.principal.components
batch.correlation.columns           exploratory.columns
batch.correlation.pvalue.threshold  exploratory.correlation.pvalue.threshold
batch.correlation.permutations      exploratory.correlation.permutations
batch.correlation.qc                exploratory.correlation.qc
profiles                            exploratory
profiles.beta.distribution          exploratory.beta.distribution
profiles.intersample                exploratory.intersample
profiles.deviation.plots            exploratory.deviation.plots
profiles.columns                    exploratory.columns
profiles.clustering                 exploratory.clustering
profiles.clustering.top.sites       exploratory.clustering.top.sites
region.profiles.types               exploratory.region.profiles
export.to.ucsc                      export.to.trackhub
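
If you keep option settings from an older analysis in a named list, the renaming can be applied programmatically. The following is a minimal sketch in base R that covers only a few rows of the table; the helper name translate.option.names and the example settings are hypothetical.

```r
# Old-to-new option names, copied from a few rows of the table above
renamed.options <- c(
  "loading.bed.style"     = "import.bed.style",
  "batch.top.dimensions"  = "exploratory.top.dimensions",
  "profiles.intersample"  = "exploratory.intersample",
  "region.profiles.types" = "exploratory.region.profiles",
  "export.to.ucsc"        = "export.to.trackhub"
)

# Hypothetical helper: rename the elements of a named list of option values
translate.option.names <- function(opts) {
  old <- names(opts) %in% names(renamed.options)
  names(opts)[old] <- renamed.options[names(opts)[old]]
  opts
}

# Example settings; names that were not renamed are left untouched
old.settings <- list("loading.bed.style" = "EPP",
                     "profiles.intersample" = FALSE,
                     "filtering.greedycut" = TRUE)
new.settings <- translate.option.names(old.settings)
print(names(new.settings))
```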

Why are some function names not recognized in RnBeads 0.99.15 and later?

In RnBeads 0.99.15, we reorganized the analysis pipeline, as described in the answer to the previous question. As a result, we renamed some of the exported functions to match the new modules. The table below lists the renamed functions.

Old Function         New Function
rnb.execute.loading  rnb.execute.import
rnb.execute.export   rnb.execute.tnt
rnb.export.to.ucsc   rnb.export.to.trackhub
rnb.run.loading      rnb.run.import
rnb.run.batch        rnb.run.exploratory
rnb.run.profiles     rnb.run.exploratory
rnb.run.export       rnb.run.tnt
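
Older analysis scripts can be updated with a simple text replacement. This is a rough sketch in base R, not a complete migration tool: the helper name rename.calls is hypothetical, the example lines are placeholders, and only calls written as name( are replaced.

```r
# Old-to-new function names from the table above
renamed.functions <- c(
  "rnb.execute.loading" = "rnb.execute.import",
  "rnb.execute.export"  = "rnb.execute.tnt",
  "rnb.export.to.ucsc"  = "rnb.export.to.trackhub",
  "rnb.run.loading"     = "rnb.run.import",
  "rnb.run.batch"       = "rnb.run.exploratory",
  "rnb.run.profiles"    = "rnb.run.exploratory",
  "rnb.run.export"      = "rnb.run.tnt"
)

# Hypothetical helper: replace old function names in lines of R code
rename.calls <- function(code.lines) {
  for (old in names(renamed.functions)) {
    # fixed = TRUE treats the dots in the names literally
    code.lines <- gsub(paste0(old, "("), paste0(renamed.functions[[old]], "("),
                       code.lines, fixed = TRUE)
  }
  code.lines
}

# Placeholder script lines; the arguments are elided on purpose
script <- c("rnb.set <- rnb.execute.loading(...)", "rnb.run.export(...)")
print(rename.calls(script))
```

To process a file, the lines could be read with readLines() and written back with writeLines() after reviewing the result.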

Analysis Pipeline

I want to load my data using bed files. What formats does RnBeads support?

In principle, RnBeads can process any tabular file format that has exactly one row for each CpG, includes genomic coordinates (chromosome, start and end), and provides additional information from which methylation levels can be deduced. For more details, see the package's vignette. However, there are many uncertainties and parameters that have to be taken into account when specifying the exact format of the methylation data files. We thus recommend using one of the package's presets, which can be selected using the import.bed.style option (named loading.bed.style before RnBeads 0.99.15). Here is an overview of the currently implemented presets:

EPP
bed files in the format of the output files from the Epigenome Processing Pipeline developed by Fabian Müller and Christoph Bock. A tab-separated file contains: the chromosome name, start coordinate, end coordinate, methylation value and coverage as a string ('#methylated_read/#total_reads'), some score, the strand, and additional information not taken into account by RnBeads. The file should not contain a header line. Coordinates are 0-based, spanning the first coordinate in a site and the first coordinate outside the site (i.e. end-start = 2 for a CpG). Here are some example lines (genome assembly mm9):

chr1 3010957 3010959 '27/27' 1000 +
chr1 3010971 3010973 '10/20' 500 +
chr1 3011025 3011027 '57/70' 814 -
...
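
To illustrate how methylation levels follow from the EPP columns, here is a minimal sketch in base R that parses the three example lines above outside of RnBeads. The column names are chosen for illustration only, and the code assumes that the single quotes around the read counts are literally present in the file.

```r
# The three example lines from above, written as tab-separated text
epp.lines <- c(
  "chr1\t3010957\t3010959\t'27/27'\t1000\t+",
  "chr1\t3010971\t3010973\t'10/20'\t500\t+",
  "chr1\t3011025\t3011027\t'57/70'\t814\t-"
)
epp <- read.table(text = epp.lines, sep = "\t", header = FALSE, quote = "",
                  stringsAsFactors = FALSE,
                  col.names = c("chrom", "start", "end", "reads", "score", "strand"))

# Strip the single quotes and split '#methylated/#total' into two numbers
counts <- strsplit(gsub("'", "", epp$reads, fixed = TRUE), "/", fixed = TRUE)
epp$methylated <- as.integer(sapply(counts, `[`, 1))
epp$total <- as.integer(sapply(counts, `[`, 2))
epp$beta <- epp$methylated / epp$total

# 0-based, end-exclusive coordinates: every CpG spans two bases
stopifnot(all(epp$end - epp$start == 2))
print(epp[, c("chrom", "start", "strand", "total", "beta")])
```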

BisSNP
bed files are assumed to have been generated by the methylation calling tool BisSNP. A tab-separated file contains the chromosome name, start coordinate, end coordinate, methylation value in percent, the coverage, the strand, and additional information not taken into account by RnBeads. The file should contain a header line. Coordinates are 0-based, spanning the first and the last coordinate in a site (i.e. end-start = 1 for a CpG). Sites on the - strand are shifted by +1. Here are some example lines (genome assembly hg19):

track name=file_sorted.realign.recal.cpg.filtered.sort.CG.bed type=bedDetail description="CG methylation
chr1 10496 10497 79.69 64 + 10496 10497 180,60,0 0 0
chr1 10524 10525 90.62 64 + 10524 10525 210,0,0 0 0
chr1 864802 864803 58.70 46 + 864802 864803 120,120,0 0 5
chr1 864803 864804 50.00 4 - 864803 864804 90,150,0 1 45
...

bismarkCov
cov files are assumed to have the format as defined by Bismark's coverage file output converted from its bedGraph output (Bismark's bismark2bedGraph module; see the section "Optional bedGraph output" in the Bismark User Guide). A tab-separated file contains: the chromosome name, cytosine coordinate, cytosine coordinate (again), methylation value in percent, number of methylated reads and the number of unmethylated reads. The file should not contain a header line. Coordinates are 1-based. Strand information does not need to be provided, but is inferred from the coordinates: Coordinates on the - strand specify the C on the - strand (G on the + strand). Coordinates referring to cytosines not in CpG context are automatically discarded. Here are some example lines (genome assembly hg19):

...
chr9 73252 73252 100 1 0
chr9 73253 73253 0 0 1
chr9 73256 73256 100 1 0
chr9 73260 73260 0 0 1
chr9 73262 73262 100 1 0
chr9 73269 73269 100 1 0
...
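
The relation between the columns of a coverage file can be checked with a few lines of base R. This is a minimal sketch that uses the first three example lines above; the column names are chosen for illustration only.

```r
# First three example lines from above, written as tab-separated text
cov.lines <- c(
  "chr9\t73252\t73252\t100\t1\t0",
  "chr9\t73253\t73253\t0\t0\t1",
  "chr9\t73256\t73256\t100\t1\t0"
)
bismark <- read.table(text = cov.lines, sep = "\t", header = FALSE,
                      stringsAsFactors = FALSE,
                      col.names = c("chrom", "start", "end", "percent",
                                    "methylated", "unmethylated"))

# Coverage is the sum of methylated and unmethylated reads
bismark$coverage <- bismark$methylated + bismark$unmethylated
bismark$beta <- bismark$methylated / bismark$coverage

# The percentage column should agree with the read counts
stopifnot(isTRUE(all.equal(bismark$beta * 100, as.numeric(bismark$percent))))
print(bismark)
```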

bismarkCytosine
bed files are assumed to have the format as defined by Bismark's cytosine report output (Bismark's coverage2cytosine module; see the section "Optional genome-wide cytosine report output" in the Bismark User Guide). A tab-separated file contains: the chromosome name, cytosine coordinate, the strand, number of methylated reads, number of unmethylated reads, and additional information not taken into account by RnBeads. The file should not contain a header line. Coordinates are 1-based. Coordinates on the - strand specify the C on the - strand (G on the + strand). CpGs without coverage are allowed, but not required. Here are some example lines (genome assembly hg19):

...
chr22 16050097 + 0 0 CG CGG
chr22 16050098 - 0 0 CG CGA
chr22 16050114 + 0 0 CG CGG
chr22 16050115 - 0 0 CG CGT
...
chr22 16115591 + 1 1 CG CGC
chr22 16117938 - 0 2 CG CGT
chr22 16122790 + 0 1 CG CGC
...

Encode
bed files are assumed to have the same format as the ones that can be downloaded from UCSC's ENCODE data portal. A tab-separated file contains: the chromosome name, start coordinate, end coordinate, some identifier, read coverage, the strand, start and end coordinates again (not sure why; we discard this information), some color value, read coverage and the methylation percentage. The file should contain a header line. Coordinates are 0-based. Note that this file format is very similar but not identical to the 'BisSNP' one. Here are some example lines (genome assembly hg19):

track name="SL1815 MspIRRBS" description="HepG2_B1__GC_" visibility=2 itemRgb="On"
chr1 1000170 1000171 HepG2_B1__GC_ 62 + 1000170 1000171 55,255,0 62 6
chr1 1000190 1000191 HepG2_B1__GC_ 62 + 1000190 1000191 0,255,0 62 3
chr1 1000191 1000192 HepG2_B1__GC_ 31 - 1000191 1000192 0,255,0 31 0
chr1 1000198 1000199 HepG2_B1__GC_ 62 + 1000198 1000199 55,255,0 62 10
chr1 1000199 1000200 HepG2_B1__GC_ 31 - 1000199 1000200 0,255,0 31 0
chr1 1000206 1000207 HepG2_B1__GC_ 31 - 1000206 1000207 55,255,0 31 10
...

Can I combine the methylome resources on this site with the data of my samples?

Yes, and it is very easy. You need to copy all data files to a single directory and to merge the sample annotation tables. Just follow the steps below.

  1. Create a new directory to host all data files and the generated reports. In the following steps, we use the directory project.
  2. Download the files data.zip, samples.csv and analysis.xml from a dataset we provide on the methylome resources page.
  3. Unzip the contents of data.zip to project/data. Copy samples.csv to the data directory as well. Keep the file analysis.xml in the parent directory project.
  4. Copy the data files of your samples also to project/data.
  5. Open and modify the file samples.csv by adding the information for your samples to the annotation table. This file is in comma-separated format and can be edited by any spreadsheet software, such as Microsoft Excel or LibreOffice. If you still have little experience with RnBeads, avoid renaming columns because this might affect the subsequent analysis steps.
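
Instead of editing samples.csv in a spreadsheet, the two annotation tables can be combined in R. The following is a minimal sketch in base R; the helper name add.own.samples, the toy tables and their column names are hypothetical and only stand in for the downloaded samples.csv and your own annotation.

```r
# Hypothetical helper: append own samples to a downloaded annotation table
add.own.samples <- function(downloaded, own) {
  # Columns missing from the own table are filled with NA, so that the column
  # names of the downloaded table stay unchanged
  for (cn in setdiff(colnames(downloaded), colnames(own))) own[[cn]] <- NA
  extra.cols <- setdiff(colnames(own), colnames(downloaded))
  if (length(extra.cols) > 0) {
    warning("Dropping columns not present in the downloaded table: ",
            paste(extra.cols, collapse = ", "))
  }
  rbind(downloaded, own[, colnames(downloaded), drop = FALSE])
}

# Toy tables (hypothetical columns); in practice use read.csv on the real files
downloaded <- data.frame(sample = c("ref_1", "ref_2"),
                         tissue = c("blood", "liver"),
                         filename = c("ref_1.bed", "ref_2.bed"),
                         stringsAsFactors = FALSE)
own <- data.frame(sample = "my_1", filename = "my_1.bed", stringsAsFactors = FALSE)

combined <- add.own.samples(downloaded, own)
print(combined)
# write.csv(combined, "project/data/samples.csv", row.names = FALSE)
```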

Once you have added your dataset to the downloaded one, you can start the analysis pipeline using commands similar to the ones provided below:

# Set the working directory
setwd("project")

# Start the analysis pipeline
library(RnBeads)
rnb.run.xml("analysis.xml")

Feel free to experiment with different analysis options by editing the file analysis.xml or setting them in the R session using the function rnb.options().

Why are the option values reset to default when I load a saved session?

The option values are saved and handled internally by the RnBeads package. Therefore, if you save your R session using the function save.image(), the analysis options are not stored. You can copy them to a list, and reset them upon loading, as shown in the example below:

# Saving the current session
RnBeadsOptions <- rnb.options()
save.image(file = "my.analysis.RData")

# Loading a session
library(RnBeads)
load("my.analysis.RData")
do.call(rnb.options, RnBeadsOptions)

How do I tell RnBeads to perform a paired test in the differential analysis module?

Suppose you have a sample annotation table like this one:

sample individual diseaseState
sample_1 John normal
sample_2 John tumor
sample_3 Jane normal
sample_4 Jane tumor
sample_5 George normal
sample_6 George tumor

Further suppose that you want to compare tumor vs. normal samples while taking into account the pairing by patient/individual. In this case, apply the following option setting:

rnb.options("differential.comparison.columns"=c("diseaseState"),"columns.pairing"=c("diseaseState"="individual"))
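
Before running a paired analysis, it can be useful to verify that the annotation table really describes a paired design. This is a minimal sketch in base R that rebuilds the example table above; it assumes that a paired comparison needs exactly one sample per individual in each group.

```r
# The example annotation table from above
annotation <- data.frame(
  sample = paste0("sample_", 1:6),
  individual = rep(c("John", "Jane", "George"), each = 2),
  diseaseState = rep(c("normal", "tumor"), times = 3),
  stringsAsFactors = FALSE
)

# Cross-tabulate individuals against groups
design <- table(annotation$individual, annotation$diseaseState)
print(design)

# Assumed requirement: every individual contributes one sample to each group
is.paired <- all(design == 1)
print(is.paired)
```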

Can I introduce additional sample grouping information for analysis?

After loading, you can add sample annotation information (traits) to an RnBSet object. Use the function addPheno() for this purpose. You can introduce a text string for each sample with the same designation for each group that you want to specify. The newly added column in the annotation table can then be used for grouping. You can either let RnBeads figure out the categories by itself, or explicitly set the corresponding group options (see rnb.options() for details). You can set values to NA for samples that you don't want to include in either of the groups. If you want to specify explicit pairwise comparisons, just use the differential.comparison.columns.all.pairwise option.
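
As an illustration, the grouping vector can be prepared in base R and then attached to the dataset. This is a minimal sketch; the sample identifiers, the trait name treatment and the object name rnb.set are hypothetical, and the commented call assumes that addPheno() takes the object, the trait vector and the column name, in this order.

```r
# Toy sample identifiers; in practice, take them from your annotation table
sample.ids <- paste0("sample_", 1:6)

# One label per sample; NA excludes a sample from the comparison
treatment <- c("treated", "control", "treated", "control", NA, NA)
names(treatment) <- sample.ids
print(table(treatment, useNA = "ifany"))

# With RnBeads loaded and an RnBSet object named rnb.set (assumed to exist):
# rnb.set <- addPheno(rnb.set, treatment, "treatment")
# rnb.options("differential.comparison.columns" = "treatment")
```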

Reports & Figures

Can I rescale the images in the figures?

RnBeads typically generates thousands of images in one run of the pipeline, and their resolutions are tailored to the limited space in the HTML reports. In some cases, a high-resolution image can be viewed by clicking on the corresponding image in the report. Examples include the heatmaps in the report on methylation profiles, as well as the plots in the report on differential methylation. In other cases, you can use the generated PDF file underlying the plot of interest. There are links to the corresponding PDF images in the figure captions in the reports. PDF files store graphics in vector format, which allows rescaling to any size without loss of quality.

Can I change the background color or other properties of the generated plots?

Some of the visual properties of the images can be specified using RnBeads options such as colors.category and colors.gradient. See the section Analysis Parameter Overview of the vignette for more information.

RnBeads utilizes the package ggplot2 for generating most of the figures. Therefore, many aspects of the plots can be modified by adjusting the corresponding parameters in the default visual theme. As a simple example, executing the following command before starting the analysis pipeline sets the black-and-white theme:

library(ggplot2)
theme_set(theme_bw())

Please check the documentation of ggplot2 for a detailed description of themes. We can also recommend an online quick reference on the subject, put together by members of the Sape research group at the University of Lugano.