ontologySimilarity comes with data objects encapsulating the GO (Gene Ontology) annotation of genes [1]:
gene_GO_terms, a list of character vectors of GO term IDs annotating each gene, named by gene,
GO_IC, a numeric vector containing the information content of Gene Ontology terms based on frequencies of annotation in gene_GO_terms.
These data objects can be loaded in an R session using data(gene_GO_terms) and data(GO_IC) respectively. To process these objects, one can load the ontologyIndex package and a data object encapsulating the Gene Ontology.
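The loading steps described above can be sketched as follows. This assumes both packages are installed, and that ontologyIndex provides the Gene Ontology as a data object named go, which is the name the rest of this document uses.

```r
# Load ontologyIndex and the Gene Ontology as an ontology_index object.
# Assumption: the data object is named "go", as used later in this document.
library(ontologyIndex)
data(go)

# Load ontologySimilarity and its bundled annotation objects.
library(ontologySimilarity)
data(gene_GO_terms)
data(GO_IC)

# Quick look at what was loaded.
class(go)              # class of the ontology object
length(gene_GO_terms)  # number of annotated genes
head(GO_IC)            # information content of the first few terms
```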
Users can simply subset the gene_GO_terms object to obtain GO annotation for their genes of interest, using a character vector of gene names. In this example, we’ll use the BEACH domain containing gene family [2].
beach <- gene_GO_terms[c("LRBA", "LYST", "NBEA", "NBEAL1", "NBEAL2", "NSMAF", "WDFY3", "WDFY4", "WDR81")]
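Subsetting a list by a name it does not contain yields a NULL element rather than an error, so it may be worth checking a gene list before subsetting. The following is a minimal sketch of such a check; the name "NOT_A_GENE" is made up purely to illustrate a failed lookup.

```r
library(ontologySimilarity)
data(gene_GO_terms)

# "NOT_A_GENE" is a hypothetical name used only to show a failed lookup.
genes <- c("LRBA", "LYST", "NOT_A_GENE")

# Names with no entry in gene_GO_terms.
missing_genes <- setdiff(genes, names(gene_GO_terms))
missing_genes

# Subset using only the names that are present, so no NULL entries appear.
found <- gene_GO_terms[intersect(genes, names(gene_GO_terms))]
names(found)
```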
To see the names of the terms annotating a particular gene, subset the name slot of the go ontology_index object by that gene's term IDs. For example, for "LRBA":
go$name[beach$LRBA]
## GO:0000423
## "mitophagy"
## GO:0034497
## "protein localization to phagophore assembly site"
## GO:0005765
## "lysosomal membrane"
## GO:0005789
## "endoplasmic reticulum membrane"
## GO:0005794
## "Golgi apparatus"
## GO:0005886
## "plasma membrane"
## GO:0019901
## "protein kinase binding"
## GO:0005829
## "cytosol"
The gene_GO_terms object contains annotation relating to all branches of the Gene Ontology, i.e. "cellular_component", "biological_process" and "molecular_function". If you are only interested in one branch, for example "cellular_component", you can use the ontologyIndex package’s function intersection_with_descendants to subset the annotation.
cc <- go$id[go$name == "cellular_component"]
beach_cc <- lapply(beach, function(x) intersection_with_descendants(go, roots=cc, x))
data.frame(check.names=FALSE, `#terms`=sapply(beach, length), `#CC terms`=sapply(beach_cc, length))
##        #terms #CC terms
## LRBA        8         5
## LYST       15         3
## NBEA        6         4
## NBEAL1      4         2
## NBEAL2      8         5
## NSMAF      10         2
## WDFY3      18        14
## WDFY4       6         2
## WDR81      13         6
A pairwise gene semantic similarity matrix can be computed simply using the function get_sim_grid, passing an ontology_index object, information content and annotation list as parameters (see ?get_sim_grid for more details). Here we plot the resulting similarity matrix using the paintmap package.
sim_matrix <- get_sim_grid(
ontology=go,
information_content=GO_IC,
term_sets=beach)
library(paintmap)
paintmap(colour_matrix(sim_matrix))
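Besides plotting, the similarity matrix can be inspected numerically. The sketch below lists the most similar gene pairs using base R only. It assumes the matrix returned by get_sim_grid carries the gene names as row and column names; the pairs and pair_sims objects are introduced here for illustration.

```r
library(ontologyIndex)
library(ontologySimilarity)
data(go)
data(gene_GO_terms)
data(GO_IC)

beach <- gene_GO_terms[c("LRBA", "LYST", "NBEA", "NBEAL1", "NBEAL2",
                         "NSMAF", "WDFY3", "WDFY4", "WDR81")]
sim_matrix <- get_sim_grid(
  ontology=go,
  information_content=GO_IC,
  term_sets=beach)

# Index every unordered pair of distinct genes (upper triangle only).
pairs <- which(upper.tri(sim_matrix), arr.ind=TRUE)

# Assumption: row and column names of sim_matrix are the gene names.
pair_sims <- data.frame(
  gene1=rownames(sim_matrix)[pairs[, "row"]],
  gene2=colnames(sim_matrix)[pairs[, "col"]],
  similarity=sim_matrix[pairs])

# The three most similar pairs.
head(pair_sims[order(-pair_sims$similarity), ], 3)
```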
One can test whether a subset of genes is significantly similar as a group in the context of a larger collection by using the function get_sim_p_from_ontology to compute a p-value of similarity. Here, we assess the significance of the mean pairwise gene similarity within the BEACH group by comparing it against randomly selected subsets of genes of the same size chosen from the gene_GO_terms set.
get_sim_p_from_ontology(
ontology=go,
information_content=GO_IC,
term_sets=gene_GO_terms,
group=names(beach)
)
## [1] 0.0008799912
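To make the idea behind this p-value concrete, the comparison against random gene subsets can be sketched by hand. This is an illustration under stated assumptions, not the package's implementation: the background is restricted to 300 randomly chosen genes plus the BEACH genes to keep the matrix small, the helper mean_pairwise is defined here for illustration, and the resulting p-value will therefore differ from the one reported above.

```r
library(ontologyIndex)
library(ontologySimilarity)
data(go)
data(gene_GO_terms)
data(GO_IC)

set.seed(1)  # the procedure is random; fix the seed for repeatability

beach_genes <- c("LRBA", "LYST", "NBEA", "NBEAL1", "NBEAL2",
                 "NSMAF", "WDFY3", "WDFY4", "WDR81")

# Assumption: a reduced background of 300 random genes, chosen for speed.
background <- union(beach_genes, sample(names(gene_GO_terms), 300))
bg_sim <- get_sim_grid(
  ontology=go,
  information_content=GO_IC,
  term_sets=gene_GO_terms[background])

# Mean similarity over all unordered pairs of distinct genes in a group.
# Assumes bg_sim has gene names as row and column names.
mean_pairwise <- function(m, genes) {
  s <- m[genes, genes]
  mean(s[upper.tri(s)])
}

observed <- mean_pairwise(bg_sim, beach_genes)

# Null distribution: random groups of the same size from the background.
null_means <- replicate(
  1000,
  mean_pairwise(bg_sim, sample(background, length(beach_genes))))

# Empirical p-value with the usual +1 correction.
p <- (sum(null_means >= observed) + 1) / (length(null_means) + 1)
p
```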