BeviMed
, which stands
for Bayesian Evaluation of Variant Involvement in Mendelian
Disease [1], is an association test which estimates the probability
of an association between a given set of variants and a case/control
label, the mode of inheritance for the disease, and the probability that
each individual variant is pathogenic. This vignette gives a quick
description of what can be done with the package and demonstrates how to
use it on simulated data. For more detailed explanations, see the
‘BeviMed Guide’ vignette and the individual function help pages.
Inference is performed by the bevimed
function, which
evaluates the data with respect to three models: a model of no
association between case/control label and allele counts, and models of
dominant and recessive association. The function depends on input
parameters:
y
, a length N
(number
of samples) logical
vector,G
, an N
by k
integer matrix of allele counts for N
individuals at
k
rare variant sites,prior_prob_association
- the prior probability of
association between the disease label and the variants in the locus.
Defaults to 0.01
.prior_prob_dominant
- the prior probability of dominant
as opposed to recessive inheritance, given an association with the
locus. Defaults to 0.5
.ploidy
- a vector the same length as the case-control
label y
giving the ploidy of each individual in the locus.
Defaults to 2
for each sample.?bevimed
for more
details).bevimed
returns an object of class BeviMed
which contains the output of the (MCMC-based) inference procedure,
including samples from the posterior distributions of the model
parameters. The object can be evaluated at the command line to print a
summary of inference, telling you summary statistics of interest,
including the probability of association. The object is likely to take
up a lot of memory, so it is useful to store a summary, computed with
summary
, for each result if the procedure is being applied
to multiple loci.
Summary statistics can also be computed directly from the arguments using the functions (see help for individual functions for more information):
prob_association
- returning the probability of
association, optionally broken down by mode of inheritance.conditional_prob_pathogenic
- the probabilities of
pathogenicity for the individual variants conditional on a mode of
inheritance.expected_explained
- the expected number of cases
explained by variants.explaining_variants
- the expected number of variants
involved in explained cases.Here we demonstrate a simple application of BeviMed for some simulated data.
Firstly, we’ll generate a random allele-count matrix G
for 100 samples at 20 variant sites (each with an allele frequency of
0.02) and an independently generated case-control label,
y_random
.
G <- matrix(rbinom(size=2, prob=0.02, n=100*20), nrow=100, ncol=20)
y_random <- runif(n=nrow(G)) < 0.1
prob_association(G=G, y=y_random)
## [1] 0.002810699
The results indicate that there is a low probability of association.
We now generate a new case control label y_dependent
which
depends on G
- specifically, we treat variants 1 to 3 as
‘pathogenic’, and label any samples harbouring alleles for any of these
variants as cases.
y_dependent <- apply(G, 1, function(variants) sum(variants[1:3]) > 0)
prob_association(G=G, y=y_dependent)
## [1] 0.9998203
Notice that there is now a higher estimated probability of association.
By default, prob_association
integrates over mode of
inheritance (e.g. are at least 1 or 2 pathogenic variants required for a
pathogenic configuration?). The probabilities of association with each
mode of inheritance can by shown by passing the option
by_MOI=TRUE
(for more details, including how to set the
ploidy of the samples within the region, see
?prob_pathogenic
).
For a more detailed output, the bevimed
function can be
used, and it’s returned values can be summarised and stored/printed.
## --------------------------------------------------------------------------------
## Posterior probability of association:
## 1 [prior: 0.01]
## --------------------------------------------------------------------------------
## Model MOI Prior Post Cases Variants
## dominant dom 0.5 0.999676 6.93 2.95
## recessive rec 0.5 0.000324 4.59 5.96
##
## MOI: mode of inheritance, dominant (dom) or recessive (rec)
## Prior: prior probability of model given association
## Post: posterior probability of model given association
## Cases: posterior expected number of cases explained
## Variants: posterior expected number of variants involved in explained cases
## --------------------------------------------------------------------------------
## Probabilities of pathogenicity for individual variants given association
##
## Var Probability pathogenic
## 1 [1.00 =================]
## 2 [1.00 =================]
## 3 [0.94 ================ ]
## 4 [0.00 ]
## 5 [0.00 ]
## 6 [0.00 ]
## 7 [0.00 ]
## 8 [0.00 ]
## 9 [0.00 ]
## 10 [0.00 ]
## 11 [0.00 ]
## 12 [0.01 ]
## 13 [0.00 ]
## 14 [0.02 ]
## 15 [0.00 ]
## 16 [0.00 ]
## 17 [0.00 ]
## 18 [0.00 ]
## 19 [0.00 ]
## 20 [0.00 ]
## --------------------------------------------------------------------------------