ontology_index
An ontology_index can be obtained by loading a pre-existing one (for example by calling data(hpo)), by reading ontologies encoded in OBO format into R using the function get_ontology, or by calling the function ontology_index explicitly. An ontology_index is a named list of properties for each term, where each property is represented by a list or vector.
Each of these property lists is named by term ID, so that a property of a given term can be looked up directly by its ID. All valid ontology_index objects contain id, name, parents, children and ancestors properties for each term. Additional properties can be added to an ontology_index, although they are not required by functions in the package. For details on how to use an ontology_index, see the ‘Introduction to ontologyX’ vignette.
The function get_ontology can read ontologies encoded in OBO format into R as ontology_index objects. By default, the properties id, name, obsolete, parents, children and ancestors are populated.
To call the function:
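A minimal sketch of such a call, assuming the ontologyIndex package is installed. The two-term OBO file written to a temporary path is a made-up stand-in for a real ontology file, and the two optional arguments are spelled out with the default values described in this section:

```r
library(ontologyIndex)

# Hypothetical two-term OBO file, written to a temporary path for illustration.
obo_file <- tempfile(fileext = ".obo")
writeLines(c(
  "format-version: 1.2",
  "",
  "[Term]",
  "id: EX:1",
  "name: animal",
  "",
  "[Term]",
  "id: EX:2",
  "name: mammal",
  "is_a: EX:1 ! animal"
), obo_file)

# Read the file, stating the default arguments explicitly.
ontology <- get_ontology(
  obo_file,
  propagate_relationships = "is_a",
  extract_tags = "minimal"
)

# Properties are looked up by term ID.
ontology$name[["EX:2"]]
ontology$parents[["EX:2"]]
```

For a real ontology, the temporary file would simply be replaced by the path to the downloaded OBO file.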
The properties parents, children and ancestors are determined by a given set of relations between terms: the propagate_relationships argument (“is_a” by default). Thus the parents of a term are the set of terms to which it is related by any type of relation contained in propagate_relationships; the children are those terms related by the inverse relations; and the ancestors are those obtained by propagating the propagate_relationships relations (note: the resulting set includes the term itself).
The relations given in the propagate_relationships argument should be named as they are labelled in the OBO file. To see a complete list of the relations used in an OBO file, pass the file’s path to the function get_relation_names. E.g. for the gene ontology:
## [1] "is_a" "regulates" "part_of"
## [4] "has_part" "happens_during" "negatively_regulates"
## [7] "positively_regulates" "occurs_in" "ends_during"
Additional information is often present in the original file, for example definitions, labelled by the def tag in OBO format. get_ontology decides which properties to export based on the extract_tags argument. By default extract_tags="minimal", resulting in only the properties id, name, obsolete, parents, children and ancestors being exported. It is possible to include all properties given in the file by setting extract_tags="everything". The names of the properties included in the returned ontology_index are then the same as the names of the tags in OBO format.
All properties are stored in the returned ontology_index as lists, except for the following, which are coerced to character or logical vectors as appropriate: "id", "name", "def", "comment", "obsolete", "created_by", "creation_date".
Further properties can be mapped to vectors if required, by modifying the returned ontology_index as a list, e.g.
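One way to do this, sketched on a small ontology built with ontology_index rather than read from a file. The habitat property and its values are made up for illustration, and the package is assumed to be installed:

```r
library(ontologyIndex)

ontology <- ontology_index(parents = list(
  animal = character(0), mammal = "animal", cat = "mammal", fish = "animal"
))

# Hypothetical property stored as a list holding one string per term.
ontology$habitat <- list(
  animal = "any", mammal = "land", cat = "land", fish = "water"
)

# Map the list to a named character vector; vapply() raises an error
# if any term does not hold exactly one string.
ontology$habitat <- vapply(ontology$habitat, function(x) x, character(1))

ontology$habitat[["fish"]]
```

Using vapply rather than unlist keeps one entry per term and fails early if a term has zero or several values.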
Modifying an existing ontology_index to add term properties is the same as adding to a list or data.frame. In the example below, we add the number of children for each term:
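A short sketch of this step, assuming the ontologyIndex package is installed and using the small animal ontology from this document as the object to modify (the property name number_of_children is chosen here for illustration):

```r
library(ontologyIndex)

ontology <- ontology_index(parents = list(
  animal = character(0), mammal = "animal", cat = "mammal", fish = "animal"
))

# Add a property exactly as one would add an element to a list:
# one integer per term, named by term ID.
ontology$number_of_children <- lengths(ontology$children)

ontology$number_of_children
```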
In the same manner, a valid ontology_index can be built up from scratch as a list, provided that the standard properties are included for use with functions in ontologyIndex.
In order to read in ontologies in OWL syntax, it is recommended to first convert to OBO format, for example using the ROBOT command line tool https://github.com/ontodev/robot.
If the option merge_equivalent_terms in get_ontology/get_OBO is set to TRUE (the default), then terms marked equivalent_to target terms are merged and their properties aggregated (except for the properties listed above that are coerced to vectors, for which the values that would be assigned to the target term are used).
ontology_index explicitly

The function ontology_index can be used to create an object with class ontology_index. This could be useful, for instance, if the user wished to convert a directed acyclic graph (DAG) with edges representing sub/super-class relationships into an ontology_index. It is similar to the function data.frame: it accepts a variable number of arguments corresponding to properties of ontological terms, each of which must be a vector or list of the same length (except the version argument, which can be any object and should contain any information about the version of the ontology). The only mandatory argument is parents, which should be a list of character vectors giving the IDs of the ‘parents’/‘superclasses’ of each term. The term IDs can be supplied either as the names attribute of parents or as a separate id argument of the same length as parents. The human-readable term names can be passed as the name argument (defaults to the same as id).
As usual, the children and ancestors properties are derived from parents. Warnings are generated if any IDs given in the parents argument are not in the id argument.
A simple invocation:
animal_superclasses <- list(animal=character(0), mammal="animal", cat="mammal", fish="animal")
animal_ontology <- ontology_index(parents=animal_superclasses)
unclass(animal_ontology)
## $id
## animal mammal cat fish
## "animal" "mammal" "cat" "fish"
##
## $name
## animal mammal cat fish
## "animal" "mammal" "cat" "fish"
##
## $parents
## $parents$animal
## character(0)
##
## $parents$mammal
## [1] "animal"
##
## $parents$cat
## [1] "mammal"
##
## $parents$fish
## [1] "animal"
##
##
## $children
## $children$animal
## [1] "mammal" "fish"
##
## $children$mammal
## [1] "cat"
##
## $children$cat
## character(0)
##
## $children$fish
## character(0)
##
##
## $ancestors
## $ancestors$animal
## [1] "animal"
##
## $ancestors$mammal
## [1] "animal" "mammal"
##
## $ancestors$cat
## [1] "animal" "mammal" "cat"
##
## $ancestors$fish
## [1] "animal" "fish"
##
##
## $obsolete
## animal mammal cat fish
## FALSE FALSE FALSE FALSE
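Term IDs and human-readable names can also differ. The sketch below takes the IDs from the names of the parents list and passes the term names separately. The EX: identifiers are made up, and the argument is assumed to be called name, matching the name property shown in the output above:

```r
library(ontologyIndex)

# Hypothetical IDs: the names of the list serve as the term IDs.
term_parents <- list(
  "EX:1" = character(0),
  "EX:2" = "EX:1",
  "EX:3" = "EX:2"
)

named_ontology <- ontology_index(
  parents = term_parents,
  name = c("animal", "mammal", "cat")  # one name per term, in ID order
)

# Look up the name and the ancestors of a term by its ID.
named_ontology$name[["EX:3"]]
named_ontology$ancestors[["EX:3"]]
```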
For more details, see the help page for the function, ?ontology_index.