--- title: "Parameterization of Response Distributions in brms" author: "Paul Bürkner" date: "`r Sys.Date()`" output: rmarkdown::html_vignette: toc: yes vignette: > %\VignetteIndexEntry{Parameterization of Response Distributions in brms} %\VignetteEngine{knitr::rmarkdown} \usepackage[utf8]{inputenc} --- The purpose of this vignette is to discuss the parameterizations of the families (i.e., response distributions) used in brms. For a more general overview of the package see `vignette("brms_overview")`. ## Notation Throughout this vignette, we denote values of the response variable as $y$, a density function as $f$, and use $\mu$ to refer to the main model parameter, which is usually the mean of the response distribution or some closely related quantity. In a regression framework, $\mu$ is not estimated directly but computed as $\mu = g(\eta)$, where $\eta$ is a predictor term (see `help(brmsformula)` for details) and $g$ is the response function (i.e., inverse of the link function). ## Location shift models The density of the **gaussian** family is given by $$ f(y) = \frac{1}{\sqrt{2\pi}\sigma} \exp\left(-\frac{1}{2}\left(\frac{y - \mu}{\sigma}\right)^2\right) $$ where $\sigma$ is the residual standard deviation. The density of the **student** family is given by $$ f(y) = \frac{\Gamma((\nu + 1)/2)}{\Gamma(\nu/2)} \frac{1}{\sqrt{\nu\pi}\sigma}\left(1 + \frac{1}{\nu} \left(\frac{y - \mu}{\sigma}\right)^2\right)^{-(\nu+1)/2} $$ $\Gamma$ denotes the gamma function and $\nu > 1$ are the degrees of freedom. As $\nu \rightarrow \infty$, the student distribution becomes the gaussian distribution. The density of the **skew_normal** family is given by $$ f(y) = \frac{1}{\sqrt{2\pi}\omega} \exp\left(-\frac{1}{2} \left(\frac{y - \xi}{\omega}\right)^2 \right) \left(1 + \text{erf} \left( \alpha \left(\frac{y - \xi}{\omega \sqrt{2}} \right) \right) \right) $$ where $\xi$ is the location parameter, $\omega$ is the positive scale parameter, $\alpha$ the skewness parameter, and $\text{erf}$ denotes the error function of the gaussian distribution. To parameterize the skew-normal distribution in terms of the mean $\mu$ and standard deviation $\sigma$, $\omega$ and $\xi$ are computed as $$ \omega = \frac{\sigma}{\sqrt{1 - \frac{2}{\pi} \frac{\alpha^2}{1 + \alpha^2}}} $$ $$ \xi = \mu - \omega \frac{\alpha}{\sqrt{1 + \alpha^2}} \sqrt{\frac{2}{\pi}} $$ If $\alpha = 0$, the skew-normal distribution becomes the gaussian distribution. For location shift models, $y$ can be any real value. ## Binary and count data models The density of the **binomial** family is given by $$ f(y) = {N \choose y} \mu^{y} (1-\mu)^{N - y} $$ where $N$ is the number of trials and $y \in \{0, ... , N\}$. When all $N$ are $1$ (i.e., $y \in \{0,1\}$), the **bernoulli** distribution for binary data arises. For $y \in \mathbb{N}_0$, the density of the **poisson** family is given by $$ f(y) = \frac{\mu^{y}}{y!} \exp(-\mu) $$ The density of the **negbinomial** (negative binomial) family is $$ f(y) = {y + \phi - 1 \choose y} \left(\frac{\mu}{\mu + \phi}\right)^{y} \left(\frac{\phi}{\mu + \phi}\right)^\phi $$ where $\phi$ is a positive precision parameter. For $\phi \rightarrow \infty$, the negative binomial distribution becomes the poisson distribution. The density of the **geometric** family arises if $\phi$ is set to $1$. ## Time-to-event models With time-to-event models we mean all models that are defined on the positive reals only, that is $y \in \mathbb{R}^+$. The density of the **lognormal** family is given by $$ f(y) = \frac{1}{\sqrt{2\pi}\sigma y} \exp\left(-\frac{1}{2}\left(\frac{\log(y) - \mu}{\sigma}\right)^2\right) $$ where $\sigma$ is the residual standard deviation on the log-scale. The density of the **Gamma** family is given by $$ f(y) = \frac{(\alpha / \mu)^\alpha}{\Gamma(\alpha)} y^{\alpha-1} \exp\left(-\frac{\alpha y}{\mu}\right) $$ where $\alpha$ is a positive shape parameter. The density of the **weibull** family is given by $$ f(y) = \frac{\alpha}{s} \left(\frac{y}{s}\right)^{\alpha-1} \exp\left(-\left(\frac{y}{s}\right)^\alpha\right) $$ where $\alpha$ is again a positive shape parameter and $s = \mu / \Gamma(1 + 1 / \alpha)$ is the scale parameter to that $\mu$ is the mean of the distribution. The **exponential** family arises if $\alpha$ is set to $1$ for either the gamma or Weibull distribution. The density of the **inverse.gaussian** family is given by $$ f(y) = \left(\frac{\alpha}{2 \pi y^3}\right)^{1/2} \exp \left(\frac{-\alpha (y - \mu)^2}{2 \mu^2 y} \right) $$ where $\alpha$ is a positive shape parameter. The **cox** family implements Cox proportional hazards model which assumes a hazard function of the form $h(y) = h_0(y) \mu$ with baseline hazard $h_0(y)$ expressed via M-splines (which integrate to I-splines) in order to ensure monotonicity. The density of the cox model is then given by $$ f(y) = h(y) S(y) $$ where $S(y)$ is the survival function implied by $h(y)$. ## Extreme value models Modeling extremes requires special distributions. One may use the **weibull** distribution (see above) or the **frechet** distribution with density $$ f(y) = \frac{\nu}{s} \left(\frac{y}{s}\right)^{-1-\nu} \exp\left(-\left(\frac{y}{s}\right)^{-\nu}\right) $$ where $s = \mu / \Gamma(1 - 1 / \nu)$ is a positive scale parameter and $\nu > 1$ is a shape parameter so that $\mu$ predicts the mean of the Frechet distribution. A generalization of both distributions is the generalized extreme value distribution (family **gen_extreme_value**) with density $$ f(y) = \frac{1}{\sigma} t(y)^{\xi + 1} \exp(-t(y)) $$ where $$ t(y) = \left(1 + \xi \left(\frac{y - \mu}{\sigma} \right)\right)^{-1 / \xi} $$ with positive scale parameter $\sigma$ and shape parameter $\xi$. ## Response time models One family that is especially suited to model reaction times is the **exgaussian** ('exponentially modified Gaussian') family. Its density is given by $$ f(y) = \frac{1}{2 \beta} \exp\left(\frac{1}{2 \beta} \left(2\xi + \sigma^2 / \beta - 2 y \right) \right) \text{erfc}\left(\frac{\xi + \sigma^2 / \beta - y}{\sqrt{2} \sigma} \right) $$ where $\beta$ is the scale (inverse rate) of the exponential component, $\xi$ is the mean of the Gaussian component, $\sigma$ is the standard deviation of the Gaussian component, and $\text{erfc}$ is the complementary error function. We parameterize $\mu = \xi + \beta$ so that the main predictor term equals the mean of the distribution. Another family well suited for modeling response times is the **shifted_lognormal** distribution. It's density equals that of the **lognormal** distribution except that the whole distribution is shifted to the right by a positive parameter called *ndt* (for consistency with the **wiener** diffusion model explained below). A family concerned with the combined modeling of reaction times and corresponding binary responses is the **wiener** diffusion model. It has four model parameters each with a natural interpretation. The parameter $\alpha > 0$ describes the separation between two boundaries of the diffusion process, $\tau > 0$ describes the non-decision time (e.g., due to image or motor processing), $\beta \in [0, 1]$ describes the initial bias in favor of the upper alternative, and $\delta \in \mathbb{R}$ describes the drift rate to the boundaries (a positive value indicates a drift towards to upper boundary). The density for the reaction time at the upper boundary is given by $$ f(y) = \frac{\alpha}{(y-\tau)^3/2} \exp \! \left(- \delta \alpha \beta - \frac{\delta^2(y-\tau)}{2}\right) \sum_{k = - \infty}^{\infty} (2k + \beta) \phi \! \left(\frac{2k + \alpha \beta}{\sqrt{y - \tau}}\right) $$ where $\phi(x)$ denotes the standard normal density function. The density at the lower boundary can be obtained by substituting $1 - \beta$ for $\beta$ and $-\delta$ for $\delta$ in the above equation. In brms the parameters $\alpha$, $\tau$, and $\beta$ are modeled as auxiliary parameters named *bs* ('boundary separation'), *ndt* ('non-decision time'), and *bias* respectively, whereas the drift rate $\delta$ is modeled via the ordinary model formula that is as $\delta = \mu$. ## Quantile regression Quantile regression is implemented via family **asym_laplace** (asymmetric Laplace distribution) with density $$ f(y) = \frac{p (1 - p)}{\sigma} \exp\left(-\rho_p\left(\frac{y - \mu}{\sigma}\right)\right) $$ where $\rho_p$ is given by $\rho_p(x) = x (p - I_{x < 0})$ and $I_A$ is the indicator function of set $A$. The parameter $\sigma$ is a positive scale parameter and $p$ is the *quantile* parameter taking on values in $(0, 1)$. For this distribution, we have $P(Y < g(\eta)) = p$. Thus, quantile regression can be performed by fixing $p$ to the quantile to interest. ## Probability models The density of the **Beta** family for $y \in (0,1)$ is given by $$ f(y) = \frac{y^{\mu \phi - 1} (1-y)^{(1-\mu) \phi-1}}{B(\mu \phi, (1-\mu) \phi)} $$ where $B$ is the beta function and $\phi$ is a positive precision parameter. A multivariate generalization of the **Beta** family is the **dirichlet** family with density $$ f(y) = \frac{1}{B((\mu_{1}, \ldots, \mu_{K}) \phi)} \prod_{k=1}^K y_{k}^{\mu_{k} \phi - 1}. $$ The **dirichlet** family is implemented with the multivariate logit link function so that $$ \mu_{j} = \frac{\exp(\eta_{j})}{\sum_{k = 1}^{K} \exp(\eta_{k})} $$ For reasons of identifiability, $\eta_{\rm ref}$ is set to $0$, where ${\rm ref}$ is one of the response categories chosen as reference. An alternative to the **dirichlet** family is the **logistic_normal** family with density $$ f(y) = \frac{1}{\prod_{k=1}^K y_k} \times \text{multivariate_normal}(\tilde{y} \, | \, \mu, \sigma, \Omega) $$ where $\tilde{y}$ is the multivariate logit transformed response $$ \tilde{y} = (\log(y_1 / y_{\rm ref}), \ldots, \log(y_{\rm ref-1} / y_{\rm ref}), \log(y_{\rm ref+1} / y_{\rm ref}), \ldots, \log(y_K / y_{\rm ref})) $$ of dimension $K-1$ (excluding the reference category), which is modeled as multivariate normally distributed with latent mean and standard deviation vectors $\mu$ and $\sigma$, as well as correlation matrix $\Omega$. ## Circular models The density of the **von_mises** family for $y \in (-\pi,\pi)$ is given by $$ f(y) = \frac{\exp(\kappa \cos(y - \mu))}{2\pi I_0(\kappa)} $$ where $I_0$ is the modified Bessel function of order 0 and $\kappa$ is a positive precision parameter. ## Ordinal and categorical models For ordinal and categorical models, $y$ is one of the categories $1, ..., K$. The intercepts of ordinal models are called thresholds and are denoted as $\tau_k$, with $k \in \{1, ..., K-1\}$, whereas $\eta$ does not contain a fixed effects intercept. Note that the applied link functions $h$ are technically distribution functions $\mathbb{R} \rightarrow [0,1]$. The density of the **cumulative** family (implementing the most basic ordinal model) is given by $$ f(y) = g(\tau_{y + 1} - \eta) - g(\tau_{y} - \eta) $$ The densities of the **sratio** (stopping ratio) and **cratio** (continuation ratio) families are given by $$ f(y) = g(\tau_{y + 1} - \eta) \prod_{k = 1}^{y} (1 - g(\tau_{k} - \eta)) $$ and $$ f(y) = (1 - g(\eta - \tau_{y + 1})) \prod_{k = 1}^{y} g(\eta - \tau_{k}) $$ respectively. Note that both families are equivalent for symmetric link functions such as logit or probit. The density of the **acat** (adjacent category) family is given by $$ f(y) = \frac{\prod_{k=1}^{y} g(\eta - \tau_{k}) \prod_{k=y+1}^K(1-g(\eta - \tau_{k}))}{\sum_{k=0}^K\prod_{j=1}^k g(\eta-\tau_{j}) \prod_{j=k+1}^K(1-g(\eta - \tau_{j}))} $$ For the logit link, this can be simplified to $$ f(y) = \frac{\exp \left(\sum_{k=1}^{y} (\eta - \tau_{k}) \right)} {\sum_{k=0}^K \exp\left(\sum_{j=1}^k (\eta - \tau_{j}) \right)} $$ The linear predictor $\eta$ can be generalized to also depend on the category $k$ for a subset of predictors. This leads to category specific effects (for details on how to specify them see `help(brm)`). Note that **cumulative** and **sratio** models use $\tau - \eta$, whereas **cratio** and **acat** use $\eta - \tau$. This is done to ensure that larger values of $\eta$ increase the probability of *higher* response categories. The **categorical** family is currently only implemented with the multivariate logit link function and has density $$ f(y) = \mu_{y} = \frac{\exp(\eta_{y})}{\sum_{k = 1}^{K} \exp(\eta_{k})} $$ Note that $\eta$ does also depend on the category $k$. For reasons of identifiability, $\eta_{1}$ is set to $0$. A generalization of the **categorical** family to more than one trial is the **multinomial** family with density $$ f(y) = {N \choose y_{1}, y_{2}, \ldots, y_{K}} \prod_{k=1}^K \mu_{k}^{y_{k}} $$ where, for each category, $\mu_{k}$ is estimated via the multivariate logit link function shown above. ## Zero-inflated and hurdle models **Zero-inflated** and **hurdle** families extend existing families by adding special processes for responses that are zero. The density of a **zero-inflated** family is given by $$ f_z(y) = z + (1 - z) f(0) \quad \text{if } y = 0 \\ f_z(y) = (1 - z) f(y) \quad \text{if } y > 0 $$ where $z$ denotes the zero-inflation probability. Currently implemented families are **zero_inflated_poisson**, **zero_inflated_binomial**, **zero_inflated_negbinomial**, and **zero_inflated_beta**. The density of a **hurdle** family is given by $$ f_z(y) = z \quad \text{if } y = 0 \\ f_z(y) = (1 - z) f(y) / (1 - f(0)) \quad \text{if } y > 0 $$ Currently implemented families are **hurdle_poisson**, **hurdle_negbinomial**, **hurdle_gamma**, and **hurdle_lognormal**. The density of a **zero-one-inflated** family is given by $$ f_{\alpha, \gamma}(y) = \alpha (1 - \gamma) \quad \text{if } y = 0 \\ f_{\alpha, \gamma}(y) = \alpha \gamma \quad \text{if } y = 1 \\ f_{\alpha, \gamma}(y) = (1 - \alpha) f(y) \quad \text{if } y \notin \{0, 1\} $$ where $\alpha$ is the zero-one-inflation probability (i.e. the probability that zero or one occurs) and $\gamma$ is the conditional one-inflation probability (i.e. the probability that one occurs rather than zero). Currently implemented families are **zero_one_inflated_beta**.