May 25, 2021

GO bubble plot with David and ggplot2

R

GO

ggplot2

So for a long time, I want to find a way to address my GO problem in my life forever. However, eventually I did not found one, some R packages seems not working, and some other online tools are too hard to understand for me. So eventaully I found David online tool is a good choice. It's easy to use, easy to implement, and with ggplot2 I can draw nice plot quickly.

1. Prepare gene list

Firstly, we need to prepare our gene list to David, no P value is need, in my case, I directly used clipr to copy the object into system clipboard, which is so easy and nice. clipr could directly past across server, like my previous post said.

df <- TCLT.volcano$df
tmp <- df[df$FDR <= 0.05 & !is.na(df$symbol), ]
clipr::write_clip(tmp$feature)

The gene feature is Entrz_ID here, you may use other gene name like Esemble name, Symbol .etc:

> head(tmp$feature)
[1] "105374119" "109729182" "727897"    "441478"    "406961"    "100423037"

2. Paste into David

Then I paste it into David online tool.

Then select Functional Annotation Chart, slightly modify the parameters, like ask to return Fold Change for visualisation. etc. Then select Retrun Using Options. Finally on the top-right there is button showing Download File, which is a URL we can directly load via R.

3. Calcualte Z-Score with GOplot R pacakge

Finally we can directly load the URL content for visualisation. Firstly I want to calculate the z-score for each enriched pathway. Here I use the circle_dat() function from GoPlot package.

library("GOplot")

David <- read.csv("https://david.ncifcrf.gov/data/download/chart_AA6917612AEE1620101903486.txt", sep="\t")

goDavid <- David[,c("Category", "Term", "Genes", "FDR")]
colnames(goDavid) <- c("category", "term", "genes", "adj_pval")
geneList <- tmp[,c("feature", "Fold", "p.value", "FDR")]
colnames(geneList) <- c("ID", "logFC", "P.Value", "adj.P.Val")
circ <- circle_dat(goDavid, geneList)

After calculation, the circ object looks like below:

> head(circ)
     category                 term count  genes      logFC     adj_pval
1 UP_KEYWORDS Alternative splicing   600 730112 -0.9857185 2.719938e-06
2 UP_KEYWORDS Alternative splicing   600  23604  1.2144389 2.719938e-06
3 UP_KEYWORDS Alternative splicing   600 340371 -1.6496193 2.719938e-06
4 UP_KEYWORDS Alternative splicing   600   9180 -3.9085765 2.719938e-06
5 UP_KEYWORDS Alternative splicing   600 348093 -1.4192143 2.719938e-06
6 UP_KEYWORDS Alternative splicing   600   9187 -3.7186476 2.719938e-06
     zscore
1 -21.47386
2 -21.47386
3 -21.47386
4 -21.47386
5 -21.47386
6 -21.47386
>

Note that we should only keep unique rows. Like running below two code. Also, I annotated BP/CC/MF.

df <- circ[,c("category", "term", "count", "adj_pval", "zscore")]
df <- unique(df)
df$GO <- "Other"
df$GO[df$category == "GOTERM_BP_DIRECT"] <- "BP"
df$GO[df$category == "GOTERM_CC_DIRECT"] <- "CC"
df$GO[df$category == "GOTERM_MF_DIRECT"] <- "MF"

4. Draw bubble plot with ggplot2

Finally, I use ggplot to draw simple bubble plot:

library("ggplot2")
library("ggrepel")

ggplot(df, aes(x = zscore, y = -log10(adj_pval))) +
  geom_point(aes(color = GO, size = count), alpha = 0.5) +
  scale_color_manual(values = c("#00AFBB", "#E7B800", "#FC4E07", "#999999")) +
  scale_size(range = c(1, 15)) +  # Adjust the range of points size
  theme_minimal(base_size = 14) +
  geom_text_repel(data = df[df$adj_pval <= 0.05, ], aes(label = term),
                  size = 3.5, box.padding = unit(0.35, "lines"), point.padding = unit(0.3,"lines"))