---
title: "Parameterization of Response Distributions in brms"
author: "Paul Bürkner"
date: "`r Sys.Date()`"
output:
  rmarkdown::html_vignette:
    toc: yes
vignette: >
  %\VignetteIndexEntry{Parameterization of Response Distributions in brms}
  %\VignetteEngine{knitr::rmarkdown}
  \usepackage[utf8]{inputenc}
---

The purpose of this vignette is to discuss the parameterizations of the families
(i.e., response distributions) used in brms. For a more general overview of
the package see `vignette("brms_overview")`.

## Notation

Throughout this vignette, we denote values of the response variable as $y$, a
density function as $f$, and use $\mu$ to refer to the main model parameter,
which is usually the mean of the response distribution or some closely related
quantity. In a regression framework, $\mu$ is not estimated directly but
computed as $\mu = g(\eta)$, where $\eta$ is a predictor term (see
`help(brmsformula)` for details) and $g$ is the response function (i.e.,
inverse of the link function).

## Location shift models

The density of the **gaussian** family is given by
$$
f(y) = \frac{1}{\sqrt{2\pi}\sigma} \exp\left(-\frac{1}{2}\left(\frac{y - \mu}{\sigma}\right)^2\right)
$$

where $\sigma$ is the residual standard deviation. The density of the
**student** family is given by
$$
f(y) = \frac{\Gamma((\nu + 1)/2)}{\Gamma(\nu/2)} \frac{1}{\sqrt{\nu\pi}\sigma}\left(1 + \frac{1}{\nu} \left(\frac{y - \mu}{\sigma}\right)^2\right)^{-(\nu+1)/2}
$$

$\Gamma$ denotes the gamma function and $\nu > 1$ are the degrees of freedom. As
$\nu \rightarrow \infty$, the student distribution becomes the gaussian
distribution. The density of the **skew_normal** family is given by
$$
f(y) = \frac{1}{\sqrt{2\pi}\omega}
 \exp\left(-\frac{1}{2} \left(\frac{y - \xi}{\omega}\right)^2  \right)
\left(1 + \text{erf} \left( \alpha \left(\frac{y - \xi}{\omega \sqrt{2}} \right) \right) \right)
$$

where $\xi$ is the location parameter, $\omega$ is the positive scale parameter,
$\alpha$ the skewness parameter, and $\text{erf}$ denotes the error function of
the gaussian distribution. To parameterize the skew-normal distribution in terms
of the mean $\mu$ and standard deviation $\sigma$, $\omega$ and $\xi$ are
computed as
$$
\omega = \frac{\sigma}{\sqrt{1 - \frac{2}{\pi} \frac{\alpha^2}{1 + \alpha^2}}}
$$

$$
\xi = \mu - \omega \frac{\alpha}{\sqrt{1 + \alpha^2}} \sqrt{\frac{2}{\pi}}
$$

If $\alpha = 0$, the skew-normal distribution becomes the gaussian distribution.
For location shift models, $y$ can be any real value.

## Binary and count data models

The density of the **binomial** family is given by
$$
f(y) = {N \choose y} \mu^{y} (1-\mu)^{N - y}
$$
where $N$ is the number of trials and $y \in \{0, ... , N\}$. When all
$N$ are $1$ (i.e., $y \in \{0,1\}$), the **bernoulli** distribution for binary
data arises.

For $y \in \mathbb{N}_0$, the density of the **poisson** family is given by
$$
f(y) = \frac{\mu^{y}}{y!} \exp(-\mu)
$$
The density of the **negbinomial** (negative binomial) family is
$$
f(y) = {y + \phi - 1 \choose y} \left(\frac{\mu}{\mu + \phi}\right)^{y}
\left(\frac{\phi}{\mu + \phi}\right)^\phi
$$
where $\phi$ is a positive precision parameter. For $\phi \rightarrow \infty$,
the negative binomial distribution becomes the poisson distribution. The density
of the **geometric** family arises if $\phi$ is set to $1$.

<!--
The density of the **discrete_weibull** family is
$$
f(y) = \mu^{y^\alpha} - \mu^{(y + 1)^\alpha}
$$
with location parameter $\mu \in [0, 1]$ and positive shape parameter $\alpha$.
-->

## Time-to-event models

With time-to-event models we mean all models that are defined on the positive
reals only, that is $y \in \mathbb{R}^+$. The density of the **lognormal**
family is given by
$$
f(y) = \frac{1}{\sqrt{2\pi}\sigma y} \exp\left(-\frac{1}{2}\left(\frac{\log(y) - \mu}{\sigma}\right)^2\right)
$$
where $\sigma$ is the residual standard deviation on the log-scale.
The density of the **Gamma** family is given by
$$
f(y) = \frac{(\alpha / \mu)^\alpha}{\Gamma(\alpha)} y^{\alpha-1}
\exp\left(-\frac{\alpha y}{\mu}\right)
$$
where $\alpha$ is a positive shape parameter. The density of the **weibull**
family is given by
$$
f(y) = \frac{\alpha}{s} \left(\frac{y}{s}\right)^{\alpha-1}
\exp\left(-\left(\frac{y}{s}\right)^\alpha\right)
$$
where $\alpha$ is again a positive shape parameter and
$s = \mu / \Gamma(1 + 1 / \alpha)$ is the scale parameter to that $\mu$
is the mean of the distribution. The **exponential** family arises if $\alpha$
is set to $1$ for either the gamma or Weibull distribution. The density of the
**inverse.gaussian** family is given by
$$
f(y) = \left(\frac{\alpha}{2 \pi y^3}\right)^{1/2} \exp \left(\frac{-\alpha (y - \mu)^2}{2 \mu^2 y} \right)
$$
where $\alpha$ is a positive shape parameter. The **cox** family implements Cox
proportional hazards model which assumes a hazard function of the form $h(y) =
h_0(y) \mu$ with baseline hazard $h_0(y)$ expressed via M-splines (which
integrate to I-splines) in order to ensure monotonicity. The density of the cox
model is then given by
$$
f(y) = h(y) S(y)
$$
where $S(y)$ is the survival function implied by $h(y)$.

## Extreme value models

Modeling extremes requires special distributions. One may use the **weibull**
distribution (see above) or the **frechet** distribution with density
$$
f(y) = \frac{\nu}{s} \left(\frac{y}{s}\right)^{-1-\nu} \exp\left(-\left(\frac{y}{s}\right)^{-\nu}\right)
$$
where $s = \mu / \Gamma(1 - 1 / \nu)$ is a positive scale parameter and
$\nu > 1$ is a shape parameter so that $\mu$ predicts the mean of the Frechet
distribution. A generalization of both distributions is the generalized extreme
value distribution (family **gen_extreme_value**) with density
$$
f(y) = \frac{1}{\sigma} t(y)^{\xi + 1} \exp(-t(y))
$$
where
$$
t(y) = \left(1 + \xi \left(\frac{y - \mu}{\sigma} \right)\right)^{-1 / \xi}
$$
with positive scale parameter $\sigma$ and shape parameter $\xi$.

## Response time models

One family that is especially suited to model reaction times is the
**exgaussian** ('exponentially modified Gaussian') family. Its density is given
by

$$
f(y) = \frac{1}{2 \beta} \exp\left(\frac{1}{2 \beta} \left(2\xi + \sigma^2 / \beta - 2 y \right) \right) \text{erfc}\left(\frac{\xi + \sigma^2 / \beta - y}{\sqrt{2} \sigma} \right)
$$
where $\beta$ is the scale (inverse rate) of the exponential component, $\xi$ is
the mean of the Gaussian component, $\sigma$ is the standard deviation of the
Gaussian component, and $\text{erfc}$ is the complementary error function. We
parameterize $\mu = \xi + \beta$ so that the main predictor term equals the
mean of the distribution.

Another family well suited for modeling response times is the
**shifted_lognormal** distribution. It's density equals that of the
**lognormal** distribution except that the whole distribution is shifted to the
right by a positive parameter called *ndt* (for consistency with the **wiener**
diffusion model explained below).

A family concerned with the combined modeling of reaction times and
corresponding binary responses is the **wiener** diffusion model. It has four
model parameters each with a natural interpretation. The parameter $\alpha > 0$
describes the separation between two boundaries of the diffusion process,
$\tau > 0$ describes the non-decision time (e.g., due to image or motor processing),
$\beta \in [0, 1]$ describes the initial bias in favor of the upper alternative,
and $\delta \in \mathbb{R}$ describes the drift rate to the boundaries (a
positive value indicates a drift towards to upper boundary). The density for the
reaction time at the upper boundary is given by

$$
f(y) = \frac{\alpha}{(y-\tau)^3/2}
\exp \! \left(- \delta \alpha \beta - \frac{\delta^2(y-\tau)}{2}\right)
\sum_{k = - \infty}^{\infty} (2k + \beta)
\phi \! \left(\frac{2k + \alpha \beta}{\sqrt{y - \tau}}\right)
$$

where $\phi(x)$ denotes the standard normal density function. The density at the
lower boundary can be obtained by substituting $1 - \beta$ for $\beta$ and
$-\delta$ for $\delta$ in the above equation. In brms the parameters
$\alpha$, $\tau$, and $\beta$ are modeled as auxiliary parameters named *bs*
('boundary separation'), *ndt* ('non-decision time'), and *bias* respectively,
whereas the drift rate $\delta$ is modeled via the ordinary model formula that
is as $\delta = \mu$.

## Quantile regression

Quantile regression is implemented via family **asym_laplace** (asymmetric
Laplace distribution) with density

$$
f(y) = \frac{p (1 - p)}{\sigma} \exp\left(-\rho_p\left(\frac{y - \mu}{\sigma}\right)\right)
$$
where $\rho_p$ is given by $\rho_p(x) = x (p - I_{x < 0})$ and $I_A$ is the
indicator function of set $A$. The parameter $\sigma$ is a positive scale
parameter and $p$ is the *quantile* parameter taking on values in $(0, 1)$. For
this distribution, we have $P(Y < g(\eta)) = p$. Thus, quantile regression can
be performed by fixing $p$ to the quantile to interest.

## Probability models

The density of the **Beta** family for $y \in (0,1)$ is given by
$$
f(y) = \frac{y^{\mu \phi - 1} (1-y)^{(1-\mu) \phi-1}}{B(\mu \phi, (1-\mu) \phi)}
$$
where $B$ is the beta function and $\phi$ is a positive precision parameter.
A multivariate generalization of the **Beta** family is the **dirichlet** family
with density
$$
f(y) = \frac{1}{B((\mu_{1}, \ldots, \mu_{K}) \phi)}
  \prod_{k=1}^K y_{k}^{\mu_{k} \phi - 1}.
$$
The **dirichlet** family is implemented with the multivariate logit
link function so that
$$
\mu_{j} = \frac{\exp(\eta_{j})}{\sum_{k = 1}^{K} \exp(\eta_{k})}
$$
For reasons of identifiability, $\eta_{\rm ref}$ is set to $0$, where ${\rm ref}$
is one of the response categories chosen as reference.

An alternative to the **dirichlet** family is the **logistic_normal** family
with density
$$
f(y) = \frac{1}{\prod_{k=1}^K y_k} \times
  \text{multivariate_normal}(\tilde{y} \, | \, \mu, \sigma, \Omega)
$$
where $\tilde{y}$ is the multivariate logit transformed response
$$
\tilde{y} = (\log(y_1 / y_{\rm ref}), \ldots, \log(y_{\rm ref-1} / y_{\rm ref}),
             \log(y_{\rm ref+1} / y_{\rm ref}), \ldots, \log(y_K / y_{\rm ref}))
$$
of dimension $K-1$ (excluding the reference category), which is modeled as
multivariate normally distributed with latent mean and standard deviation
vectors $\mu$ and $\sigma$, as well as correlation matrix $\Omega$.


## Circular models

The density of the **von_mises** family for $y \in (-\pi,\pi)$ is given by
$$
f(y) = \frac{\exp(\kappa \cos(y - \mu))}{2\pi I_0(\kappa)}
$$
where $I_0$ is the modified Bessel function of order 0 and $\kappa$ is
a positive precision parameter.

## Ordinal and categorical models

For ordinal and categorical models, $y$ is one of the categories $1, ..., K$.
The intercepts of ordinal models are called thresholds and are denoted as
$\tau_k$, with $k \in \{1, ..., K-1\}$, whereas $\eta$ does not contain a fixed
effects intercept. Note that the applied link functions $h$ are technically
distribution functions $\mathbb{R} \rightarrow [0,1]$. The density of the
**cumulative** family (implementing the most basic ordinal model) is given by
$$
f(y) = g(\tau_{y + 1} - \eta) - g(\tau_{y} - \eta)
$$

The densities of the **sratio** (stopping ratio) and **cratio** (continuation
ratio) families are given by
$$
f(y) =  g(\tau_{y + 1} - \eta) \prod_{k = 1}^{y} (1 - g(\tau_{k} - \eta))
$$
and
$$
f(y) = (1 - g(\eta - \tau_{y + 1})) \prod_{k = 1}^{y} g(\eta - \tau_{k})
$$

respectively. Note that both families are equivalent for symmetric link
functions such as logit or probit. The density of the **acat** (adjacent
category) family is given by
$$
f(y) = \frac{\prod_{k=1}^{y} g(\eta - \tau_{k})
  \prod_{k=y+1}^K(1-g(\eta - \tau_{k}))}{\sum_{k=0}^K\prod_{j=1}^k g(\eta-\tau_{j})
  \prod_{j=k+1}^K(1-g(\eta - \tau_{j}))}
$$
For the logit link, this can be simplified to
$$
f(y) = \frac{\exp \left(\sum_{k=1}^{y} (\eta - \tau_{k}) \right)}
  {\sum_{k=0}^K \exp\left(\sum_{j=1}^k (\eta - \tau_{j}) \right)}
$$
The linear predictor $\eta$ can be generalized to also depend on the category
$k$ for a subset of predictors. This leads to category specific
effects (for details on how to specify them see `help(brm)`). Note that
**cumulative** and **sratio** models use $\tau - \eta$, whereas **cratio** and
**acat** use $\eta - \tau$. This is done to ensure that larger values of $\eta$
increase the probability of *higher* response categories.

The **categorical** family is currently only implemented with the multivariate
logit link function and has density
$$
f(y) = \mu_{y} = \frac{\exp(\eta_{y})}{\sum_{k = 1}^{K} \exp(\eta_{k})}
$$
Note that $\eta$ does also depend on the category $k$. For reasons of
identifiability, $\eta_{1}$ is set to $0$. A generalization of the
**categorical** family to more than one trial is the **multinomial** family with
density
$$
f(y) = {N \choose y_{1}, y_{2}, \ldots, y_{K}}
  \prod_{k=1}^K \mu_{k}^{y_{k}}
$$
where, for each category, $\mu_{k}$ is estimated via the multivariate logit link
function shown above.

## Zero-inflated and hurdle models

**Zero-inflated** and **hurdle** families extend existing families by adding
special processes for responses that are zero. The density of a
**zero-inflated** family is given by
$$
f_z(y) = z + (1 - z) f(0) \quad \text{if } y = 0 \\
f_z(y) = (1 - z) f(y) \quad \text{if } y > 0
$$
where $z$ denotes the zero-inflation probability. Currently implemented families
are **zero_inflated_poisson**, **zero_inflated_binomial**,
**zero_inflated_negbinomial**, and **zero_inflated_beta**.

The density of a **hurdle** family is given by
$$
f_z(y) = z \quad \text{if } y = 0 \\
f_z(y) = (1 - z) f(y) / (1 - f(0)) \quad \text{if } y > 0
$$
Currently implemented families are **hurdle_poisson**, **hurdle_negbinomial**,
**hurdle_gamma**, and **hurdle_lognormal**.

The density of a **zero-one-inflated** family is given by
$$
f_{\alpha, \gamma}(y) = \alpha (1 - \gamma) \quad \text{if } y = 0 \\
f_{\alpha, \gamma}(y) = \alpha \gamma \quad \text{if } y = 1 \\
f_{\alpha, \gamma}(y) = (1 - \alpha) f(y) \quad \text{if } y \notin \{0, 1\}
$$
where $\alpha$ is the zero-one-inflation probability (i.e. the probability that
zero or one occurs) and $\gamma$ is the conditional one-inflation probability
(i.e. the probability that one occurs rather than zero). Currently implemented
families are **zero_one_inflated_beta**.