ontologyIndex
is the
foundation of the ‘ontologyX’ packages:
ontologyIndex
, for representing ontologies as R objects
and enabling simple queries,ontologySimilarity
, for computing semantic similarity
between ontological terms and annotations,ontologyPlot
for visualising sets of ontological terms
with various graphical options.The functionality of the ontologyIndex
package is
centered around ontology_index
objects: simple R
representations of ontologies as list
s and
vector
s of term properties (ID, label, etc.) which are
named by term so that simple look-ups by term can be performed.
Ontologies encoded in OBO format can be read into R as
ontology_index
es using the function
get_ontology
(see the vignette ‘Creating an
ontology_index’). Ontologies in OWL syntax can be converted to OBO
format using freely available software (e.g. the ROBOT command line
tool: https://github.com/ontodev/robot). The package comes
with three such ready-made ontology_index
objects:
hpo
, mpo
and go
, encapsulating
the Human Phenotype Ontology (HPO), Mammalian Phenotype Ontology (MPO)
and Gene Ontology (GO) respectively, each loadable with
data
. Here we’ll demonstrate the package using the HPO.
The ontology_index
object is just a list of ‘vectors and
lists’ of term properties, indexed by the IDs of the terms:
## property class
## 1 id character
## 2 name character
## 3 parents list
## 4 children list
## 5 ancestors list
## 6 obsolete logical
The properties which all ontology_index
objects contain
are id
, name
, parents
,
children
and ancestors
as these are the
properties which the functions in the ontologyIndex
package
operate on. However, additional properties per term - for example custom
annotation, or whatever terms are tagged with in the original OBO file -
can also be read in and queried in the same way (see the vignette
‘Creating an ontology_index’).
The children
and ancestors
properties are
determined by the parent
property, with the
ancestors
of a term derived by propagating the ‘is parent’
relation (i.e. with the terms for which the relation holds given in the
parent
property). When reading an
ontology_index
from an OBO file, the ‘is parent’ relation
defaults to “is_a”. However, this can be set to any relation or
combination of relations (e.g. “part_of” or both “is_a” and “part_of” -
see ‘Creating an ontology_index’ and ?get_ontology
for more
details). Usage of phrases involving ‘ancestors’ and ‘descendants’ of
terms in this document and in the names of functions exported by the
package refer to the hierarchy determined by this parent
property.
You can use the function get_term_property
to query the
ontology_index
object, and retrieve a particular attribute
for a single term. For instance:
## HP:0000001
## "All"
## HP:0000118
## "Phenotypic abnormality"
## HP:0001871
## "Abnormality of blood and blood-forming tissues"
## HP:0001872
## "Abnormality of thrombocytes"
## HP:0011875
## "Abnormal platelet morphology"
## HP:0011873
## "Abnormal platelet count"
## HP:0001873
## "Thrombocytopenia"
However you can also look up properties for a given term using
[
and [[
as appropriate, since an
ontology_index
just a list
. This is the best
way to use the ontology_index
if you are operating on
multiple terms as it’s faster.
## HP:0001873
## "Thrombocytopenia"
## HP:0001873
## "HP:0001873"
## HP:0000001 HP:0000118 HP:0001871 HP:0001872 HP:0011875 HP:0011873
## "HP:0000001" "HP:0000118" "HP:0001871" "HP:0001872" "HP:0011875" "HP:0011873"
## HP:0001873
## "HP:0001873"
## HP:0000001
## "All"
## HP:0000118
## "Phenotypic abnormality"
## HP:0001871
## "Abnormality of blood and blood-forming tissues"
## HP:0001872
## "Abnormality of thrombocytes"
## HP:0011875
## "Abnormal platelet morphology"
## HP:0011873
## "Abnormal platelet count"
## HP:0001873
## "Thrombocytopenia"
A set of terms (i.e. a character
vector of term IDs) may
contain redundant terms. The function minimal_set
removes
such terms leaving a minimal set in the sense of the ontology’s
directed acyclic graph.
## HP:0001871
## "Abnormality of blood and blood-forming tissues"
## HP:0001873
## "Thrombocytopenia"
## HP:0011877
## "Increased mean platelet volume"
## HP:0001873 HP:0011877
## "Thrombocytopenia" "Increased mean platelet volume"
To find all the ancestors of a set of terms, i.e. all the terms which
are an ancestor of any term in the given set, one can use the
get_ancestors
function:
## [1] "HP:0000001" "HP:0000118" "HP:0001871" "HP:0001872" "HP:0011875"
## [6] "HP:0011873" "HP:0001873" "HP:0011876" "HP:0011877"
There are functions which allow set operations with respect to
descendancy: intersection_with_descendants
,
exclude_descendants
and prune_descendants
.
Each function accepts a set of terms terms
and a set of
root terms roots
.
intersection_with_descendants
transforms
terms
by retaining only those which are either in the set
roots
or amongst the descendants of a term in
roots
.exclude_descendants
transforms terms
by
removing terms which are either in the set roots
or amongst
the descendants of a term in roots
.prune_descendants
transforms terms
by
replacing terms which are either in the set roots
or
amongst the descendants of a term in roots
with the
associated set of terms in roots
.For more details see the help page for the individual functions,
e.g. ?exclude_descendants
. Note that to perform analagous
operations with respect to sets of ancestors, one can use the
get_ancestors
function in conjunction with the base R set
functions, e.g. setdiff
and intersect
.
The packages ontologySimilarity
and
ontologyPlot
can be used to calculate semantic similarity
between and visualise terms and sets of terms respectively: see the
corresponding vignettes for more details.