ontology_index

An ontology_index can be obtained in three ways: by loading a
pre-existing one (for example by calling data(hpo)), by reading an
ontology encoded in OBO format into R using the function
get_ontology, or by calling the function
ontology_index explicitly. An ontology_index
is a named list of properties for each term, where each
property is represented by a list or vector.
Each of these property lists is named by term, facilitating simple
lookups of properties by term name. All valid
ontology_index objects contain id,
name, parents, children and
ancestors properties for each term. Additional properties
can be added to an ontology_index, although they are not
required by functions in the package. For details on how to use an
ontology_index, see the ‘Introduction to ontologyX’
vignette.
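As a minimal sketch of such lookups, the following loads the hpo object bundled with ontologyIndex and reads a few properties of one term. The term ID HP:0001873 is chosen purely for illustration and is assumed to be present in the bundled data.

```r
library(ontologyIndex)

# Load the example Human Phenotype Ontology shipped with the package
data(hpo)

# Term ID picked for illustration (assumed present in the bundled hpo)
term <- "HP:0001873"

# Vector-valued properties are indexed by term ID
hpo$name[term]

# List-valued properties hold one character vector per term
hpo$parents[[term]]
hpo$ancestors[[term]]
```

Because every property is named by term ID, the same index works across all properties of the object.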
The function get_ontology can read ontologies encoded in
OBO format into R as ontology_index objects. By default,
the properties id, name,
obsolete, parents, children and
ancestors are populated.
The function is called with the path to an OBO file as its first argument.
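A minimal sketch of such a call is given here. So that it can run without any external download, it first writes a tiny OBO file to a temporary path; the terms and the EX: identifiers are made up for illustration and do not come from any real ontology.

```r
library(ontologyIndex)

# Write a tiny OBO file to a temporary location (illustrative terms only)
obo_path <- tempfile(fileext = ".obo")
writeLines(c(
  "format-version: 1.2",
  "",
  "[Term]",
  "id: EX:1",
  "name: animal",
  "",
  "[Term]",
  "id: EX:2",
  "name: mammal",
  "is_a: EX:1 ! animal",
  "",
  "[Term]",
  "id: EX:3",
  "name: cat",
  "is_a: EX:2 ! mammal",
  "",
  "[Term]",
  "id: EX:4",
  "name: retired term",
  "is_obsolete: true"
), obo_path)

# Read the file, spelling out the default arguments explicitly
ontology <- get_ontology(
  obo_path,
  propagate_relationships = "is_a",
  extract_tags = "minimal"
)

# Inspect some of the populated properties
ontology$name
ontology$ancestors[["EX:3"]]
```

With a real ontology, obo_path would simply be the path to the downloaded OBO file.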
The properties parents, children and
ancestors are determined by a given set of relations
between terms, specified by the propagate_relationships argument
(“is_a” by default). Thus the parents of a term are the set of
terms to which it is related by any type of relation contained in
propagate_relationships; the children are
those terms related by the inverse relations; and the ancestors
are those obtained by propagating the
propagate_relationships relations (note: the resulting set
includes the term itself).
The relations given in the propagate_relationships
argument should be named as they are labelled in the OBO file. To see
a complete list of the relations used in an OBO file, pass the file’s
path to the function get_relation_names. E.g. for the Gene
Ontology:
## [1] "is_a" "regulates" "part_of"
## [4] "has_part" "happens_during" "negatively_regulates"
## [7] "positively_regulates" "occurs_in" "ends_during"
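The same call can be tried on a small file. The sketch below writes an illustrative OBO file (made-up EX: terms, not taken from the Gene Ontology) to a temporary path and lists the relation names found in it. The exact names returned depend on the contents of the file, so no output is shown.

```r
library(ontologyIndex)

# Tiny illustrative OBO file that uses two kinds of relation
obo_path <- tempfile(fileext = ".obo")
writeLines(c(
  "format-version: 1.2",
  "",
  "[Term]",
  "id: EX:1",
  "name: body",
  "",
  "[Term]",
  "id: EX:2",
  "name: limb",
  "relationship: part_of EX:1 ! body",
  "",
  "[Term]",
  "id: EX:3",
  "name: arm",
  "is_a: EX:2 ! limb",
  "",
  "[Typedef]",
  "id: part_of",
  "name: part of"
), obo_path)

# List the relation names used in the file
relation_names <- get_relation_names(obo_path)
relation_names
```

Any of the returned names can then be supplied in the propagate_relationships argument of get_ontology.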
Additional information is often present in the original file, for
example definitions, which are labelled by the def tag in OBO format.
get_ontology decides which properties to export based on
the extract_tags argument. By default
extract_tags="minimal", resulting in only the properties
id, name, obsolete,
parents, children and ancestors
being exported. It is possible to include all properties given in the
file by setting extract_tags="everything". The names of the
properties included in the returned ontology_index are then
the same as the names of the tags in OBO format.
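The difference between the two settings can be sketched as follows, again using a small made-up OBO file written to a temporary path (the EX: terms and their definitions are illustrative assumptions).

```r
library(ontologyIndex)

# Illustrative OBO file in which each term carries a 'def' tag
obo_path <- tempfile(fileext = ".obo")
writeLines(c(
  "format-version: 1.2",
  "",
  "[Term]",
  "id: EX:1",
  "name: animal",
  "def: \"A living organism that feeds on organic matter.\" []",
  "",
  "[Term]",
  "id: EX:2",
  "name: mammal",
  "def: \"A warm-blooded vertebrate animal.\" []",
  "is_a: EX:1 ! animal"
), obo_path)

# Default: only the standard properties are exported
minimal <- get_ontology(obo_path)

# All tags found in the file are exported as properties
everything <- get_ontology(obo_path, extract_tags = "everything")

# Compare the property names of the two objects
names(minimal)
names(everything)

# The definition is available under the name of its OBO tag
everything$def[["EX:2"]]
```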
All properties are stored in the returned ontology_index
as lists, except for the following, which are coerced to
character or logical vectors as appropriate:
"id", "name", "def", "comment", "obsolete", "created_by", "creation_date".
Further properties can be mapped to vectors if required by modifying
the returned ontology_index in the same way as a list.
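One way this might look is sketched below. The ontology is built with ontology_index rather than read from a file so that the example is self-contained, and the synonym property is a hypothetical list-valued property introduced only for illustration.

```r
library(ontologyIndex)

# Small ontology with an extra list-valued property (hypothetical 'synonym')
parents <- list(animal = character(0), mammal = "animal", cat = "mammal")
ontology <- ontology_index(
  parents = parents,
  synonym = list(
    animal = "creature",
    mammal = character(0),
    cat = c("feline", "kitty")
  )
)

# Collapse each term's synonyms into one string, giving a character vector
ontology$synonym <- vapply(
  ontology$synonym,
  function(x) paste(x, collapse = "; "),
  character(1)
)

ontology$synonym
```

Collapsing with paste is only one possible mapping; taking the first element of each entry would be another.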
Modifying an existing ontology_index to add term
properties is the same as adding to a list or
data.frame. For example, the number of children of each
term can be stored as a new property.
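A minimal sketch, assuming a small ontology built with ontology_index and a new property named number_of_children (the property name is an arbitrary choice for this example):

```r
library(ontologyIndex)

# Small example ontology defined by the parents of each term
parents <- list(
  animal = character(0),
  mammal = "animal",
  cat = "mammal",
  fish = "animal"
)
ontology <- ontology_index(parents = parents)

# Add a new property exactly as one would add an element to a list
ontology$number_of_children <- lengths(ontology$children)

ontology$number_of_children
```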
In the same manner, a valid ontology_index can be built
up from scratch as a list, provided that the standard
properties required by functions in ontologyIndex are
included.
To read in ontologies in OWL syntax, it is recommended to first convert them to OBO format, for example using the ROBOT command line tool (https://github.com/ontodev/robot).
If the option merge_equivalent_terms in
get_ontology/get_OBO is set to
TRUE (the default), then terms marked
equivalent_to target terms are merged into those target
terms and their properties are aggregated. The exception is the
properties listed above that are coerced to vectors, for which the
values that would be assigned to the target term are used.
ontology_index explicitly

The function ontology_index can be used to create an
object with class ontology_index. This could be useful, for
instance, if the user wishes to convert a directed acyclic graph (DAG)
with edges representing sub/super-class relationships into an
ontology_index. It is similar to the function
data.frame: it accepts a variable number of arguments
corresponding to properties for ontological terms, which must each be a
vector or list of the same length (except the version
argument, which can be any object and should contain any information
about the version of the ontology). The only mandatory argument is
parents, which should be a list of
character vectors giving the IDs of the
‘parents’/‘superclasses’ of each term. The term IDs can be supplied
either as the names attribute of parents
or as a separate id argument of the same length as
parents. The human-readable term names can be passed as the
name argument (defaults to the same as id).
As usual the children and ancestors properties
are derived from the parents. Warnings are generated if any
IDs given in the parents argument are not in the
id argument.
A simple invocation:
animal_superclasses <- list(animal=character(0), mammal="animal", cat="mammal", fish="animal")
animal_ontology <- ontology_index(parents=animal_superclasses)
unclass(animal_ontology)
## $id
## animal mammal cat fish
## "animal" "mammal" "cat" "fish"
##
## $name
## animal mammal cat fish
## "animal" "mammal" "cat" "fish"
##
## $parents
## $parents$animal
## character(0)
##
## $parents$mammal
## [1] "animal"
##
## $parents$cat
## [1] "mammal"
##
## $parents$fish
## [1] "animal"
##
##
## $children
## $children$animal
## [1] "mammal" "fish"
##
## $children$mammal
## [1] "cat"
##
## $children$cat
## character(0)
##
## $children$fish
## character(0)
##
##
## $ancestors
## $ancestors$animal
## [1] "animal"
##
## $ancestors$mammal
## [1] "animal" "mammal"
##
## $ancestors$cat
## [1] "animal" "mammal" "cat"
##
## $ancestors$fish
## [1] "animal" "fish"
##
##
## $obsolete
## animal mammal cat fish
## FALSE FALSE FALSE FALSE
For more details, see the help page for the function,
?ontology_index.