vignettes_old/s8.oncoprint.Rmd
451e6d73
 
 <!--
 %\VignetteEngine{knitr}
12e85497
 %\VignetteIndexEntry{8. OncoPrint}
451e6d73
 -->
 
 OncoPrint
 ========================================
 
 **Author**: Zuguang Gu ( z.gu@dkfz.de )
 
 **Date**: `r Sys.Date()`
 
 -------------------------------------------------------------
 
 ```{r global_settings, echo = FALSE, message = FALSE}
 library(markdown)
 options(markdown.HTML.options = c(options('markdown.HTML.options')[[1]], "toc"))
 
 library(knitr)
 knitr::opts_chunk$set(
     error = FALSE,
     tidy  = FALSE,
     message = FALSE,
     fig.align = "center",
     fig.width = 5,
     fig.height = 5)
 options(markdown.HTML.stylesheet = "custom.css")
 
 options(width = 100)
 ```
 
 <a href="http://www.cbioportal.org/faq.jsp#what-are-oncoprints">OncoPrint</a> is a way to visualize 
 multiple genomic alteration events by heatmap. Here the **ComplexHeatmap** package provides a `oncoPrint()` function.
 Besides the default style which is provided by <a href="http://www.cbioportal.org/index.do">cBioPortal</a>, there are
 additional barplots at both sides of the heatmap which show numbers of different alterations for
 each sample and for each gene. Also with the functionality of **ComplexHeatmap**, you can control oncoPrint with
 more flexibilities.
 
f6325167
 ## General settings
 
451e6d73
 There are two different forms of input data. The first is represented as a matrix in which 
 element would include multiple alterations in a form of a complex string. In follow example,
 'g1' in 's1' has two types of alterations which are 'snv' and 'indel'.
 
 ```{r}
 mat = read.table(textConnection(
0286e386
 ",s1,s2,s3
 g1,snv;indel,snv,indel
 g2,,snv;indel,snv
 g3,snv,,indel;snv"), row.names = 1, header = TRUE, sep = ",", stringsAsFactors = FALSE)
451e6d73
 mat = as.matrix(mat)
 mat
 ```
 
ac4a2fca
 In this case, we need to define a function to extract different alteration types and pass the function
451e6d73
 to `get_type` argument. The function should return a vector of alteration types.
 
ac4a2fca
 For one gene in one sample, since different alteration types may be drawn into one same grid in the heatmap, 
 we need to define how to add the graphics by self-defined functions.
 Here if the graphics have no transparency, orders of how to add
451e6d73
 graphics matters. In following example, snv are first drawn and then the indel. You can see rectangles
 for indels are actually smaller than that for snvs so that you can visualiza both snvs and indels if they
ac4a2fca
 are in a same grid. Names in the list of functions should correspond to the alteration types (here, `snv` and `indel`).
451e6d73
 
 For the self-defined graphic function, there should be four arguments which are positions of the grids 
 on the heatmap (`x` and `y`), and widths and heights of the grids (`w` and `h`).
 
 Colors for different alterations are defined in `col`. It should be a named vector for which names correspond
ac4a2fca
 to alteration types. It is used to generate the barplots and the legends.
451e6d73
 
 
 ```{r}
e0ad503d
 library(ComplexHeatmap)
7d5a7fc0
 col = c(snv = "red", indel = "blue")
451e6d73
 oncoPrint(mat, get_type = function(x) strsplit(x, ";")[[1]],
7d5a7fc0
 	alter_fun = list(
 		snv = function(x, y, w, h) grid.rect(x, y, w*0.9, h*0.9, gp = gpar(fill = col["snv"], col = NA)),
 		indel = function(x, y, w, h) grid.rect(x, y, w*0.9, h*0.4, gp = gpar(fill = col["indel"], col = NA))
 	), col = col)
451e6d73
 ```
 
f6325167
 In above example, form of the string (e.g. `snv;indel`) correlates to how you define `get_type`. If the string
 has the form of `snv|indel|amp`, `get_type` should be defined as `function(x) strsplit(x, "|")[[1]]`.
 
 If you are pulzzed by how to generated the matrix, there is a second way. The second type of input data is a list of matrix for which each matrix contains binary value representing
ac4a2fca
 whether the alteration is absent or present. The list should have names which correspond to the alteration
451e6d73
 types.
 
 ```{r}
 mat_list = list(snv = matrix(c(1, 0, 1, 1, 1, 0, 0, 1, 1), nrow = 3),
6fbbae6f
 	            indel = matrix(c(1, 0, 0, 0, 1, 0, 1, 0, 0), nrow = 3))
451e6d73
 rownames(mat_list$snv) = rownames(mat_list$indel) = c("g1", "g2", "g3")
 colnames(mat_list$snv) = colnames(mat_list$indel) = c("s1", "s2", "s3")
 mat_list
 ```
 
6fbbae6f
 `oncoPrint()` expects all matrix in `mat_list` having same row names and column names. Users can use `unify_mat_list()`
 to adjust the matrix list.
 
 ```{r}
0286e386
 mat_list2 = mat_list
 mat_list2$indel = mat_list2$indel[1:2, 1:2]
 mat_list2
 mat_list2 = unify_mat_list(mat_list2)
 mat_list2
6fbbae6f
 ```
 
f6325167
 Pass `mat_list` to `oncoPrint()`:
451e6d73
 
 ```{r}
 oncoPrint(mat_list,
7d5a7fc0
 	alter_fun = list(
 		snv = function(x, y, w, h) grid.rect(x, y, w*0.9, h*0.9, gp = gpar(fill = col["snv"], col = NA)),
 		indel = function(x, y, w, h) grid.rect(x, y, w*0.9, h*0.4, gp = gpar(fill = col["indel"], col = NA))
 	), col = col)
451e6d73
 ```
 
7d5a7fc0
 In above examples, `alter_fun` is a list of functons which add graphics layer by layer. Graphics
 can also be added in a grid-by-grid style by specifying `alter_fun` as a single function. The difference
 from the function list is now `alter_fun` should accept a fifth argument which is a logical vector. 
 This logical vector shows whether different alterations exist for current gene in current sample.
 
f6325167
 Let's assume in a grid there is only snv event, `v` for this grid is:
 
 ```{r, echo = FALSE}
 print(c("snv" = TRUE, "indel" = FALSE))
 ```
 
7d5a7fc0
 ```{r}
 oncoPrint(mat_list,
 	alter_fun = function(x, y, w, h, v) {
 		if(v["snv"]) grid.rect(x, y, w*0.9, h*0.9, gp = gpar(fill = col["snv"], col = NA))
 		if(v["indel"]) grid.rect(x, y, w*0.9, h*0.4, gp = gpar(fill = col["indel"], col = NA))
 	}, col = col)
 ```
 
 If `alter_fun` is set as a single function, customization can be more flexible. In following example,
21ca48b4
 rectangles always fill the whole grid evenly.
7d5a7fc0
 
 ```{r}
 oncoPrint(mat_list,
     alter_fun = function(x, y, w, h, v) {
 		n = sum(v)
 		h = h*0.9
6d8c4ef4
 		# use `names(which(v))` to correctly map between `v` and `col`
 		if(n) grid.rect(x, y - h*0.5 + 1:n/n*h, w*0.9, 1/n*h, 
 			gp = gpar(fill = col[names(which(v))], col = NA), just = "top")
7d5a7fc0
     }, col = col)
 ```
 
fcaccd8a
 With a single function for `alter_fun`, you can define different graphics for different alterations.
 In following plot, you need to adjust the height of the whole plot to make sure the height for each cell
 is more than double of the width.
 
 ```{r, width = 6, height = 7}
 snv_fun = function(x, y, w, h) {
 	grid.rect(x, y, w, h, gp = gpar(fill = col["snv"], col = NA))
 }
 
 indel_fun = function(x, y, r) {
 	grid.circle(x, y, r, gp = gpar(fill = col["indel"], col = NA))
 }
 
 oncoPrint(mat, get_type = function(x) strsplit(x, ";")[[1]],
     alter_fun = function(x, y, w, h, v) {
         n = sum(v)
         w = convertWidth(w, "cm")*0.9
         h = convertHeight(h, "cm")*0.9
         l = min(unit.c(w, h))
 
         grid.rect(x, y, w, h, gp = gpar(fill = "grey", col = NA))
 
         if(n == 0) return(NULL)
         if(n == 1) {
         	if(names(which(v)) == "snv") snv_fun(x, y, l, l)
         	if(names(which(v)) == "indel") indel_fun(x, y, l*0.5)
         } else if(n == 2) {
         	snv_fun(x, y-0.25*h, l, l)
         	indel_fun(x, y+0.25*h, l*0.5)
         }
     }, col = col)
 ```
 
f6325167
 If `alter_fun` is specified as a list, the order of the elements controls the order of adding graphics. 
 There is a special element named `background` which defines how to draw background and it should be always put
 as the first element in the `alter_fun` list. In following example, backgrond color is changed to light green with borders.
 
 ```{r}
 oncoPrint(mat_list,
 	alter_fun = list(
 		background = function(x, y, w, h) grid.rect(x, y, w, h, gp = gpar(fill = "#00FF0020")),
 		snv = function(x, y, w, h) grid.rect(x, y, w*0.9, h*0.9, gp = gpar(fill = col["snv"], col = NA)),
 		indel = function(x, y, w, h) grid.rect(x, y, w*0.9, h*0.4, gp = gpar(fill = col["indel"], col = NA))
 	), col = col)
 ```
 
 Or just remove the background:
 
 ```{r}
 oncoPrint(mat_list,
 	alter_fun = list(
 		background = NULL,
 		snv = function(x, y, w, h) grid.rect(x, y, w*0.9, h*0.9, gp = gpar(fill = col["snv"], col = NA)),
 		indel = function(x, y, w, h) grid.rect(x, y, w*0.9, h*0.4, gp = gpar(fill = col["indel"], col = NA))
 	), col = col)
 ```
 
fcaccd8a
 You can customize the oncoprot by self-defining `alter_fun`. But be careful you must
 convert `w` and `h` to absolute units.
 
 ```{r}
 oncoPrint(mat, get_type = function(x) strsplit(x, ";")[[1]],
     alter_fun = list(
         snv = function(x, y, w, h) {
         	w = convertWidth(w, "cm")
         	h = convertHeight(h, "cm")
         	l = min(unit.c(w, h))
         	grid.rect(x, y, l*0.9, l*0.9, gp = gpar(fill = col["snv"], col = NA))
         },
         indel = function(x, y, w, h) {
         	w = convertWidth(w, "cm")
         	h = convertHeight(h, "cm")
         	r = min(unit.c(w, h))*0.5
         	grid.circle(x, y, r*0.9, gp = gpar(fill = col["indel"], col = NA))
         }
     ), col = col)
 ```
 
f6325167
 ## Apply to cBioPortal dataset
 
451e6d73
 Now we make an oncoPrint with a real-world data. The data is retrieved from [cBioPortal](http://www.cbioportal.org/). 
 Steps for getting the data are as follows:
 
 1. go to http://www.cbioportal.org
 2. search Cancer Study: "Lung Adenocarcinoma Carcinoma" and select: "Lung Adenocarcinoma Carcinoma (TCGA, Provisinal)"
 3. In "Enter Gene Set" field, select: "General: Ras-Raf-MEK-Erk/JNK signaling (26 genes)"
 4. submit the form
 
 In the results page,
 
 5. go to "Download" tab, download text in "Type of Genetic alterations across all cases"
 
 The order of samples can also be downloaded from the results page,
 
 6. go to "OncoPrint" tab, move the mouse above the plot, click "download" icon and click "Sample order"
 
 First we read the data and do some pre-processing.
 
 ```{r}
 mat = read.table(paste0(system.file("extdata", package = "ComplexHeatmap"), 
 	"/tcga_lung_adenocarcinoma_provisional_ras_raf_mek_jnk_signalling.txt"), 
 	header = TRUE,stringsAsFactors=FALSE, sep = "\t")
 mat[is.na(mat)] = ""
 rownames(mat) = mat[, 1]
 mat = mat[, -1]
 mat=  mat[, -ncol(mat)]
 mat = t(as.matrix(mat))
98fea502
 mat[1:3, 1:3]
451e6d73
 ```
 
 There are three different alterations in `mat`: `HOMDEL`, `AMP` and `MUT`. We first 
ac4a2fca
 define how to add graphics which correspond to different alterations. 
451e6d73
 
 ```{r}
7d5a7fc0
 alter_fun = list(
451e6d73
 	background = function(x, y, w, h) {
 		grid.rect(x, y, w-unit(0.5, "mm"), h-unit(0.5, "mm"), gp = gpar(fill = "#CCCCCC", col = NA))
 	},
 	HOMDEL = function(x, y, w, h) {
 		grid.rect(x, y, w-unit(0.5, "mm"), h-unit(0.5, "mm"), gp = gpar(fill = "blue", col = NA))
 	},
 	AMP = function(x, y, w, h) {
 		grid.rect(x, y, w-unit(0.5, "mm"), h-unit(0.5, "mm"), gp = gpar(fill = "red", col = NA))
 	},
 	MUT = function(x, y, w, h) {
 		grid.rect(x, y, w-unit(0.5, "mm"), h*0.33, gp = gpar(fill = "#008000", col = NA))
 	}
 )
 ```
 
 Also colors for different alterations which will be used for barplots.
 
 ```{r}
 col = c("MUT" = "#008000", "AMP" = "red", "HOMDEL" = "blue")
 ```
 
 Make the oncoPrint and adjust heatmap components such as the title and the legend.
 
 ```{r, fig.width = 12, fig.height = 8}
 oncoPrint(mat, get_type = function(x) strsplit(x, ";")[[1]],
7d5a7fc0
 	alter_fun = alter_fun, col = col, 
451e6d73
 	column_title = "OncoPrint for TCGA Lung Adenocarcinoma, genes in Ras Raf MEK JNK signalling",
12e85497
 	heatmap_legend_param = list(title = "Alternations", at = c("AMP", "HOMDEL", "MUT"), 
 		labels = c("Amplification", "Deep deletion", "Mutation")))
 ```
 
98fea502
 As you see, the genes and samples are sorted automatically. Rows are sorted based on the frequency
 of the alterations in all samples and columns are sorted to visualize the mutual exclusivity across genes
 based on the "memo sort" method which is
 kindly provided by [B. Arman Aksoy](https://gist.github.com/armish/564a65ab874a770e2c26). If you want
 to turn off the default sorting, set `row_order` or `column_order` to `NULL`.
 
6fbbae6f
 
 By default, if one sample has no alteration, it will still remain in the heatmap, but you can set
 `remove_empty_columns` to `TRUE` to remove it:
98fea502
 
 ```{r, fig.width = 12, fig.height = 8}
 oncoPrint(mat, get_type = function(x) strsplit(x, ";")[[1]],
7d5a7fc0
 	alter_fun = alter_fun, col = col, 
6fbbae6f
 	remove_empty_columns = TRUE,
98fea502
 	column_title = "OncoPrint for TCGA Lung Adenocarcinoma, genes in Ras Raf MEK JNK signalling",
 	heatmap_legend_param = list(title = "Alternations", at = c("AMP", "HOMDEL", "MUT"), 
 		labels = c("Amplification", "Deep deletion", "Mutation")))
 ```
 
6fbbae6f
 As the normal `Heatmap()` function, `row_order` or `column_order` can be assigned with a vector of 
 orders (either numeric or character). Following the order of samples are gathered from cBio as well.
 You can see the difference for the sample order between 'memo sort' and the method used by cBio.
12e85497
 
c9d5db63
 Also notice how we move the legend to the bottom of the oncoPrint.
 
12e85497
 ```{r, fig.width = 12, fig.height = 8}
6fbbae6f
 sample_order = scan(paste0(system.file("extdata", package = "ComplexHeatmap"), 
     "/sample_order.txt"), what = "character")
c9d5db63
 ht = oncoPrint(mat, get_type = function(x) strsplit(x, ";")[[1]],
7d5a7fc0
 	alter_fun = alter_fun, col = col, 
6fbbae6f
 	row_order = NULL, column_order = sample_order,
 	remove_empty_columns = TRUE,
12e85497
 	column_title = "OncoPrint for TCGA Lung Adenocarcinoma, genes in Ras Raf MEK JNK signalling",
451e6d73
 	heatmap_legend_param = list(title = "Alternations", at = c("AMP", "HOMDEL", "MUT"), 
c9d5db63
 		labels = c("Amplification", "Deep deletion", "Mutation"),
 		nrow = 1, title_position = "leftcenter"))
 draw(ht, heatmap_legend_side = "bottom")
451e6d73
 ```
 
 `oncoPrint()` actually returns a `HeatmapList` object, so you can add more Heatmaps or row annotations
 to it to visualize more complicated information.
 
 Following example splits the heatmap into two halves and add a new heatmap to the right.
 
 ```{r, fig.width = 12, fig.height = 8}
 ht_list = oncoPrint(mat, get_type = function(x) strsplit(x, ";")[[1]],
7d5a7fc0
 	alter_fun = alter_fun, col = col, 
6fbbae6f
 	remove_empty_columns = TRUE,
451e6d73
 	column_title = "OncoPrint for TCGA Lung Adenocarcinoma, genes in Ras Raf MEK JNK signalling",
 	heatmap_legend_param = list(title = "Alternations", at = c("AMP", "HOMDEL", "MUT"), 
 		labels = c("Amplification", "Deep deletion", "Mutation")),
 	split = sample(letters[1:2], nrow(mat), replace = TRUE)) +
f6325167
 Heatmap(matrix(rnorm(nrow(mat)*10), ncol = 10), name = "expr", show_column_dend = FALSE, width = unit(4, "cm"))
451e6d73
 draw(ht_list, row_sub_title_side = "left")
 ```
ac4a2fca
 
ea0fe7de
 In some scenarios, you don't want to show some of the alterations on the barplots, you can set it by `barplot_ignore` argument.
 
 `````{r, fig.width = 12, fig.height = 8}
 oncoPrint(mat, get_type = function(x) strsplit(x, ";")[[1]],
7d5a7fc0
 	alter_fun = alter_fun, col = col, 
ea0fe7de
 	remove_empty_columns = TRUE,
 	column_title = "OncoPrint for TCGA Lung Adenocarcinoma, genes in Ras Raf MEK JNK signalling",
 	heatmap_legend_param = list(title = "Alternations", at = c("AMP", "HOMDEL", "MUT"), 
 		labels = c("Amplification", "Deep deletion", "Mutation")),
 	barplot_ignore = "AMP")
 ```
 
21ca48b4
 If you make the plot in an interactive device (e.g. `X11` or Rstudio IDE), the generation of the plot
 may be very slow. In this case, we suggest users to directly save the plot to a separate file 
 (e.g. using `png()` or `pdf()`) and then visualize the plot afterwards.
 
ea0fe7de
 
ac4a2fca
 ## Session info
 
 ```{r}
 sessionInfo()
 ```