In the plot generated by simplifyGO(), a word cloud annotation is attached to the heatmap to show the general biological functions of the GO terms in each cluster. In this vignette, I will demonstrate a general function, anno_word_cloud(), that generates word cloud annotations to work with the ComplexHeatmap package.
anno_word_cloud() is basically a wrapper around two components: 1. constructing the word cloud (with word_cloud_grob()) and 2. constructing an annotation (with ComplexHeatmap::anno_link()) that can be attached to a heatmap as a row annotation.
anno_word_cloud() has two main arguments, align_to and term.
align_to defines how to align the annotation to the heatmap. As in ComplexHeatmap::anno_link(), the value of align_to can be a list of row indices where each index vector in the list corresponds to one word cloud. The value of align_to can also be a categorical vector where rows with the same level correspond to the same word cloud. If align_to is a categorical vector and term is a list, the names of term should overlap with the levels of align_to. When align_to is set as a categorical vector, the same value is normally also passed to row_split in the main heatmap so that each row slice corresponds to a word cloud.
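The two accepted forms of align_to can be sketched in base R as follows. The variable name grp and the toy grouping are assumptions made for illustration only; nothing here calls the package itself.

```r
# Toy grouping of 8 heatmap rows into 3 clusters (hypothetical data).
grp <- c("a", "b", "a", "c", "b", "a", "c", "c")

# Form 1: a categorical vector with one value per heatmap row.
# The same vector would normally also be passed to row_split.
align_to_vec <- grp

# Form 2: a list of row indices, one index vector per word cloud.
align_to_list <- split(seq_along(grp), grp)

# Sanity check: every row belongs to exactly one word cloud.
stopifnot(identical(sort(unlist(align_to_list, use.names = FALSE)),
                    seq_along(grp)))

str(align_to_list)
```

Both objects describe the same grouping of rows; the list form simply makes the row indices explicit.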
term defines the description texts used for constructing the word clouds. The value should have the same format as align_to. If align_to is a list, term should also be a list. In this case, the vectors in term do not need to have the same lengths as those in align_to, i.e. length(term[[i]]) is not necessarily equal to length(align_to[[i]]). In other words, term[[i]] can contain arbitrary text as long as length(term) == length(align_to). If align_to is a categorical vector, term can be a character vector with the same length as align_to.
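As a minimal base R sketch of the list case, the following builds a hypothetical align_to and term and checks the one property the text requires, namely that the two lists have the same length. The cluster names and texts are invented for illustration.

```r
# Hypothetical alignment: 3 clusters covering 8 heatmap rows.
align_to <- list(a = c(1L, 3L, 6L), b = c(2L, 5L), c = c(4L, 7L, 8L))

# One character vector of description text per cluster. The number of
# texts per cluster does not have to match the number of rows.
term <- list(
  a = c("crude oil price cut", "contract price lowered"),
  b = "OPEC meeting scheduled for June",
  c = c("oil minister remarks", "output quota",
        "riyal deposits", "export revenue")
)

# Required: one entry of term per word cloud.
stopifnot(length(term) == length(align_to))

# Not required: equal lengths within each cluster.
data.frame(
  cluster = names(align_to),
  n_rows  = lengths(align_to),
  n_texts = lengths(term)
)
```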
The other arguments of anno_word_cloud() are straightforward:
exclude_words: Words to exclude from the word cloud.
max_words: Maximal number of words in each word cloud.
word_cloud_grob_param: Graphic parameters sent to word_cloud_grob(). The value should be a named list.
fontsize_range: Range of the font size. The value is a vector of length two.
bg_gp: Graphic parameters for controlling the background.
side: Side of the annotation relative to the heatmap. The value should be either "right" or "left".
Specifically for GO terms, users do not need to provide the full GO descriptions. Instead, they can provide only the GO IDs, and the descriptions will be extracted automatically. In this case, users can use the helper function anno_word_cloud_from_GO() and set the GO IDs via the go_id argument. The format of go_id is similar to that of term in anno_word_cloud(): either a list of GO IDs or a vector. Note again that if go_id is a list, length(go_id[[i]]) is not necessarily equal to length(align_to[[i]]).
In the first example, I generate 10 word clouds and attach them to a heatmap whose rows are split into 10 groups.
    ## List of 10
    ##  $ a: chr "Diamond Shamrock Corp said that\neffective today it had cut its"| __truncated__
    ##  $ b: chr "OPEC may be forced to meet before a\nscheduled June session to "| __truncated__
    ##  $ c: chr "Texaco Canada said it lowered the\ncontract price it will pay f"| __truncated__
    ##  $ d: chr "Marathon Petroleum Co said it reduced\nthe contract price it wi"| __truncated__
    ##  $ e: chr "Houston Oil Trust said that independent\npetroleum engineers co"| __truncated__
    ##  $ f: chr "Kuwait\"s Oil Minister, in remarks\npublished today, said there"| __truncated__
    ##  $ g: chr "Indonesia appears to be nearing a\npolitical crossroads over me"| __truncated__
    ##  $ h: chr "Saudi riyal interbank deposits were\nsteady at yesterday's high"| __truncated__
    ##  $ i: chr "The Gulf oil state of Qatar, recovering\nslightly from last yea"| __truncated__
    ##  $ j: chr "Saudi Arabian Oil Minister Hisham Nazer\nreiterated the kingdom"| __truncated__
The value for the first argument of anno_word_cloud() can also be explicitly converted into a list:
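A minimal sketch of this conversion follows. The matrix, the grouping vector grp and the texts are toy stand-ins rather than the data used in this vignette, and the heatmap is only drawn when ComplexHeatmap and simplifyEnrichment are both installed.

```r
set.seed(123)

# Toy inputs (hypothetical): a 30 x 6 matrix whose rows fall into 3 groups,
# and a few short texts per group.
mat <- matrix(rnorm(30 * 6), nrow = 30)
grp <- rep(c("a", "b", "c"), each = 10)
texts <- list(
  a = c("oil price cut", "crude oil contract price", "oil price lowered"),
  b = c("OPEC meeting", "OPEC session scheduled", "OPEC output quota"),
  c = c("riyal deposits steady", "interbank deposits", "deposits rate")
)

# Explicit list form of the first argument: row indices for each group.
align_to <- split(seq_along(grp), grp)

# Draw the heatmap only when both packages are available.
if (requireNamespace("ComplexHeatmap", quietly = TRUE) &&
    requireNamespace("simplifyEnrichment", quietly = TRUE)) {
  library(ComplexHeatmap)
  library(simplifyEnrichment)
  ht <- Heatmap(mat, cluster_rows = FALSE, row_split = grp,
    right_annotation = rowAnnotation(
      word_cloud = anno_word_cloud(align_to, texts)
    ))
  draw(ht)
}
```

Here the names of align_to and texts coincide, so each list element of texts is matched to the row slice with the same name.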
Setting side = "left" puts the annotation on the left of the heatmap.
The second example is more specific to GO terms. It visualizes a gene expression matrix whose rows are split into four groups by k-means clustering. GO enrichment analysis was applied to the genes in each group separately. The variable km contains the k-means classification, and go_list contains a list of IDs of significant GO terms for each group.
    ##  3 1 4 4 3 3
    ## List of 4
    ##  $ 1: chr [1:160] "GO:0033993" "GO:0009725" "GO:0014070" "GO:0019725" ...
    ##  $ 2: chr [1:784] "GO:1903047" "GO:0006259" "GO:0044772" "GO:0044770" ...
    ##  $ 3: chr [1:241] "GO:0051050" "GO:0008015" "GO:0003013" "GO:0009725" ...
    ##  $ 4: chr [1:732] "GO:0002274" "GO:0002366" "GO:0002263" "GO:0036230" ...
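Before building the annotation, it can help to compare the names of the GO ID list against the cluster labels. The sketch below uses small stand-ins for km and go_list (the GO IDs are copied from the output printed above; the km values are invented) and plain base R set operations, not a function from the package.

```r
# Toy stand-ins (hypothetical values) for the two objects described above.
km <- c(3, 1, 4, 4, 3, 3, 2, 1)
go_list <- list(
  "1" = c("GO:0033993", "GO:0009725"),
  "2" = c("GO:1903047", "GO:0006259"),
  "3" = c("GO:0051050", "GO:0008015"),
  "4" = c("GO:0002274", "GO:0002366")
)

# Cluster labels as character, so they are comparable with list names.
km_levels <- as.character(sort(unique(km)))

# Clusters without any GO IDs, and list entries without a cluster.
missing_levels <- setdiff(km_levels, names(go_list))
unused_names   <- setdiff(names(go_list), km_levels)

stopifnot(length(missing_levels) == 0)
```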
Make sure that the names of go_list correspond to the levels of km. Adding word cloud annotations for the enriched GO terms is then straightforward:
    library(circlize)
    Heatmap(t(scale(t(sig_mat))), name = "z-score",
        col = colorRamp2(c(-2, 0, 2), c("green", "white", "red")),
        show_row_names = FALSE, show_column_names = FALSE,
        row_title = NULL, column_title = NULL,
        show_row_dend = FALSE, show_column_dend = FALSE,
        row_split = km) +
    rowAnnotation(go = anno_word_cloud_from_GO(km, go_list, max_words = 30))
The major keywords appear to be very similar across the four groups, so these words can be excluded:
    library(circlize)
    Heatmap(t(scale(t(sig_mat))), name = "z-score",
        col = colorRamp2(c(-2, 0, 2), c("green", "white", "red")),
        show_row_names = FALSE, show_column_names = FALSE,
        row_title = NULL, column_title = NULL,
        show_row_dend = FALSE, show_column_dend = FALSE,
        row_split = km) +
    rowAnnotation(go = anno_word_cloud_from_GO(km, go_list, max_words = 20,
        exclude_words = c("regulation", "process", "response", "positive", "cell")))