---
title: "Calculating EQ-5D indices and summarising profiles with eq5d"
author: "Fraser Morton and Jagtar Singh Nijjar"
date: "`r format(Sys.Date(), '%d %B %Y')`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Calculating EQ-5D indices and summarising profiles with eq5d}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

# EQ-5D

EQ-5D is a popular health related quality of life instrument used in the clinical and economic evaluation of health care. Developed by the [EuroQol](https://euroqol.org/) group, the instrument consists of two components: health state description and evaluation. 

For the description component a subject self-rates their health in terms of five dimensions; mobility, self-care, usual activities, pain/discomfort, and anxiety/depression using either a three-level ([EQ-5D-3L](https://euroqol.org/information-and-support/euroqol-instruments/eq-5d-3l/) and [EQ-5D-Y-3L](https://euroqol.org/information-and-support/euroqol-instruments/eq-5d-y-3l/)) or a five-level ([EQ-5D-5L](https://euroqol.org/information-and-support/euroqol-instruments/eq-5d-5l/)) scale. 

The evaluation component requires a patient to record their overall health status using a visual analogue scale (EQ-VAS).

Following assessment the scores from the descriptive component can be reported as a five digit number ranging from 11111 (full health) to 33333/55555 (worst health). A number of methods exist for analysing these five digit profiles. However, frequently they are converted to a single utility index using country specific value sets, which can be used in the clinical and economic evaluation of health care as well as in population health surveys.

The eq5d package provides methods for the cross-sectional and longitudinal analysis of EQ-5D profiles and also the calculation of utility index scores from a subject's dimension scores. Additionally, a [Shiny](https://shiny.posit.co/) app is included to enable the calculation, visualisation and automated statistical analysis of multiple EQ-5D index values via a web browser using EQ-5D dimension scores stored in CSV or Excel files.

Value sets for EQ-5D-3L are available for many countries and have been produced using the time trade-off (TTO) valuation technique or the visual analogue scale (VAS) valuation technique.  Some countries have TTO and VAS value sets for EQ-5D-3L. Additionally, EQ-5D-3L "reverse crosswalk" value sets based on the [van Hout *et al* (2021)](https://pubmed.ncbi.nlm.nih.gov/34452708/) models as well as those published on the [EuroQol](https://euroqol.org/information-and-support/resources/value-sets/) website that enable EQ-5D-3L data to be mapped to EQ-5D-5L value sets are included. 

For EQ-5D-5L, a standardised valuation study protocol (EQ-VT) was developed by the EuroQol group based on the composite time trade-off (cTTO) valuation technique supplemented by a discrete choice experiment (DCE).  The EuroQol group recommends users to use a standard value set where available.

The EQ-5D-5L "crosswalk" value sets published by [van Hout *et al* (2012)](https://pubmed.ncbi.nlm.nih.gov/22867780/) as well as that for Russia are also included. The crosswalk value sets enable index values to be calculated for EQ-5D-5L data where no value set is available by mapping between the EQ-5D-5L and EQ-5D-3L descriptive systems.

The recently published age and sex conditional based mapping data by the [NICE Decision Support Unit](https://www.sheffield.ac.uk/nice-dsu/methods-development/mapping-eq-5d-5l-3l) are also now part of the package. These enable age-sex based EQ-5D-3L to EQ-5D-5L and EQ-5D-5L to EQ-5D-3L mappings from dimensions and exact or approximate utility index scores. 

Additional information on EQ-5D can be found on the [EuroQol](https://euroqol.org/) website as well as in [Szende *et al* (2007)](https://doi.org/10.1007/1-4020-5511-0) and [Szende 
    *et al* (2014)](https://doi.org/10.1007/978-94-007-7596-1). Advice on [choosing a value set](https://euroqol.org/information-and-support/resources/value-sets/
) can also be found on the EuroQol website.

## Installation

You can install the released version of eq5d from [CRAN](https://CRAN.R-project.org) with:

``` r
install.packages("eq5d")
```

And the development version from [GitHub](https://github.com/) with:

``` r
# install.packages("devtools")
devtools::install_github("fragla/eq5d")
```

## Quick Start

```{r quickStart}
library(eq5d)

#single calculation

#named vector MO, SC, UA, PD and AD represent mobility, self-care, usual activites, pain/discomfort and anxiety/depression, respectfully.
scores <- c(MO=1,SC=2,UA=3,PD=2,AD=1)

#EQ-5D-3L using the UK TTO value set
eq5d(scores=scores, country="UK", version="3L", type="TTO")

#Using five digit format
eq5d(scores=12321, country="UK", version="3L", type="TTO")

#EQ-5D-Y-3L using the Slovenian cTTO value set
eq5d(scores=13321, country="Slovenia", version="Y3L", type="cTTO")

#EQ-5D-5L crosswalk
eq5d(scores=55555, country="Spain", version="5L", type="CW")

#EQ-5D-3L reverse crosswalk
eq5d(scores=33333, country="Germany", version="3L", type="RCW")

#EQ-5D-5L to EQ-5D-3L NICE DSU mapping

#Using dimensions
eq5d(c(MO=1,SC=2,UA=3,PD=4,AD=5), version="5L", type="DSU", country="UK", age=23, sex="male")

#Using exact utility score
eq5d(0.922, country="UK", version="5L", type="DSU", age=18, sex="male")

#Using approximate utility score
eq5d(0.435, country="UK", version="5L", type="DSU", age=30, sex="female", bwidth=0.0001)

#multiple calculations using the Canadian VT value set

#data.frame with individual dimensions
scores.df <- data.frame(
  MO=c(1,2,3,4,5), SC=c(1,5,4,3,2), UA=c(1,5,2,3,1), PD=c(1,3,4,3,4), AD=c(1,2,1,2,1)
)

eq5d(scores.df, country="Canada", version="5L", type="VT")

#data.frame using five digit format
scores.df2 <- data.frame(state=c(11111, 25532, 34241, 43332, 52141))

eq5d(scores.df2, country="Canada", version="5L", type="VT", five.digit="state")

#or using a vector

eq5d(scores.df2$state, country="Canada", version="5L", type="VT")

```

## Value sets

The available value sets can be viewed using the ***valuesets*** function. The results can be filtered by EQ-5D version, value set type or by country.

```{r valuesets}
# Return TTO value sets with PubMed IDs and DOIs (top 6 returned for brevity).
head(valuesets(type="TTO", references=c("PubMed","DOI")))

# Return VAS value sets with ISBN and external URL (top 6 returned for brevity).
head(valuesets(type="VAS", references=c("ISBN","ExternalURL")))

# Return EQ-5D-5L value sets (top 6 returned for brevity).
head(valuesets(version="5L"))

# Return all French value sets.
valuesets(country="France")

# Return all EQ-5D-5L to EQ-5D-3L DSU value sets without references.
valuesets(type="DSU", version="5L", references=NULL)

```

## Analysis of EQ-5D Profiles

A number of methods have been published that enable the analysis of EQ-5D profiles, most recently in the open access book Methods for Analysing and Reporting EQ-5D Data by [Devlin, Janssen and Parkin](https://link.springer.com/book/10.1007/978-3-030-47622-9). The eq5d package includes R implentations of some of the methods from this book and from other sources that may be of use in analysing EQ-5D data.

### Cumulative frequency analysis

The ***eq5dcf*** function calculates the frequency, percentage, cumulative frequency and cumulative percentage for each five digit profile in an EQ-5D dataset. Either a vector of five digit profiles or a data.frame of individual dimensions can be passed to this function in order to summarise data in this way.

```{r eq5dcf}
library(readxl)

#load example data
data <- read_excel(system.file("extdata", "eq5d3l_example.xlsx", package="eq5d"))

#run eq5dcf function on a data.frame
res <- eq5dcf(data, "3L")

# Return data.frame of cumulative frequency stats (top 6 returned for brevity).
head(res)

```

### Summarising the Severity of EQ-5D Profiles

The eq5d package includes methods for summarising the severity of EQ-5D health state. The Level Sum Score (LSS) treats each dimension's level as a number rather than a category. Each number is added up to produce a score between 5 and 15 for EQ-5D-3L and 5 and 25 for EQ-5D-5L.

```{r eq5dlss}
lss(c(MO=1,SC=2,UA=3,PD=2,AD=1), version="3L")

lss(55555, version="5L")

lss(c(11111, 12345, 55555), version="5L")

```

The Level Frequency Score (LFS) is an alternative method of summarising profile data developed by [Oppe and de Charro](https://pubmed.ncbi.nlm.nih.gov/11189116). Here the frequency of the levels for each health state are characterised. As described in Devlin, Janssen and Parkin's book, the full health profile 11111 for EQ-5D-5L has 5, 1 s, no level 2, 3, 4 and 5s, so the LFS is 50000; the health profile 55555 is 00005; profiles such as 31524 and 53412 would be 11111.

```{r eq5dlfs}
lfs(c(MO=1,SC=2,UA=3,PD=2,AD=1), version="3L")

lfs(55555, version="5L")

lfs(c(11111, 12345, 55555), version="5L")

```

### Paretian Classification of Health Change

The Paretian Classification of Health Change (PCHC) was developed by [Devlin et al](https://pubmed.ncbi.nlm.nih.gov/20623685) in 2010 and is used to compare changes in individuals over time. PCHC classifies the change in an individual's health state as better (improvement in at least one dimension), worse (a deterioration in at least one dimension), mixed (improvements and deteriorations in dimensions) or there being no change in the health state. Those classified in the No change group with the 11111 health state can be separated into their own "No problems" group. PCHC can be calculated using the ***pchc*** function.

```{r pchc}
library(readxl)

#load example data
data <- read_excel(system.file("extdata", "eq5d3l_example.xlsx", package="eq5d"))

#use first 50 entries of each group as pre/post
pre <- data[data$Group=="Group1",][1:50,]
post <- data[data$Group=="Group2",][1:50,]

#run pchc function on data.frames

#Show no change, improve, worse, mixed without totals
res1 <- pchc(pre, post, version="3L", no.problems=FALSE, totals=FALSE)
res1

#Show totals, but not those with no problems
res2 <- pchc(pre, post, version="3L", no.problems=FALSE, totals=TRUE)
res2

#Show totals and no problems for each dimension
res3 <- pchc(pre, post, version="3L", no.problems=TRUE, totals=TRUE, by.dimension=TRUE)
res3

#Don't summarise. Return all classifications
res4 <- pchc(pre, post, version="3L", no.problems=TRUE, totals=FALSE, summary=FALSE)
head(res4)

```

### Probability of Superiority

The Probability of Superiority (PS) is a non-parametric measure of effect size introduced by [Buchholz et al](https://pubmed.ncbi.nlm.nih.gov/25355653/) in 2015 and enables the assessment of paired samples of EQ-5D profile data in the context of assessing changes in health in terms of improvement or deterioration. For each EQ-5D dimension the number of subjects that have improved over time is divided by the total number of matched pairs. Ties (those with no changes) were accounted for through the addition of half the number of ties to the numerator. The score is less than 0.5 if more patients deteriorate than improve, 0.5 if the same number of patients improve and deteriorate or do not change and greater than 0.5 if more patients improve than deteriorate.

```{r ps}
library(readxl)

#load example data
data <- read_excel(system.file("extdata", "eq5d3l_example.xlsx", package="eq5d"))

#use first 50 entries of each group as pre/post
pre <- data[data$Group=="Group1",][1:50,]
post <- data[data$Group=="Group2",][1:50,]

res <- ps(pre, post, version="3L")
res

```

### Health Profile Grid

The Health Profile Grid (HPG) was also introduced by [Devlin et al](https://pubmed.ncbi.nlm.nih.gov/20623685/) in 2010. The HPG provides a visual way to observe changes in individuals between two time points. The HPG requires profiles for each time point to be ordered from best to worst. The ***hpg*** function uses a specified value set for this with profiles being assigned a ranking between 1 and 243 for EQ-5D-3L and 1 and 3125 for EQ-5D-5L based on severity (1 being the best and 243/3125 the worst). The rankings for the two time points for each individual are plotted with the location of each point showing whether there has been an improvement or deterioration. The further a point is above the 45° line, the great the improvement in an individuals health. Conversely, the further below the line a point is the more health has deteriorated. Those individuals on the line show "no change".

```{r hpg}
library(readxl)

#load example data
data <- read_excel(system.file("extdata", "eq5d3l_example.xlsx", package="eq5d"))

#use first 50 entries of each group as pre/post
pre <- data[data$Group=="Group1",][1:50,]
post <- data[data$Group=="Group2",][1:50,]

#run hpg function on data.frames

#Show pre/post rankings and PCHC classification
res <- hpg(pre, post, country="UK", version="3L", type="TTO")
head(res)

#Plot data using ggplot2
library(ggplot2)

ggplot(res, aes(Post, Pre, color=PCHC)) +
  geom_point(aes(shape=PCHC)) +
  coord_cartesian(xlim=c(1,243), ylim=c(1,243)) +
  scale_x_continuous(breaks=c(1,243)) +
  scale_y_continuous(breaks=c(1,243)) +
  annotate("segment", x=1, y=1, xend=243, yend=243, colour="black") +
  theme(panel.border=element_blank(), panel.grid.minor=element_blank()) +
  xlab("Post-treatment") +
  ylab("Pre-treatment")

```

### Shannon's Indices

Shannon's indices were first used to assess how evenly EQ-5D dimension scores or health states in a dataset are distributed by [Janssen et al](https://pubmed.ncbi.nlm.nih.gov/17294285/) in 2007. Shannon's H' (diversity) index represents the absolute amount of informativity captured with Shannon's J' (evenness) index capturing the evenness of the distribution of data. Shannon's J' is calculated by dividing H' by H' max to give a value between 0 and 1. Lower values indicate more diversity and higher values indicate less.

```{r shannon}
library(readxl)

#load example data
data <- read_excel(system.file("extdata", "eq5d3l_example.xlsx", package="eq5d"))

#Shannon's H', H' max and J' for the whole dataset
shannon(data, version="3L", by.dimension=FALSE)

#Shannon's H', H' max and J' for each dimension
res <- shannon(data, version="3L", by.dimension=TRUE)

#Convert to data.frame for ease of viewing
do.call(rbind, res)

```

### Health State Density Curve and Health State Density Index

The Health State Density Curve (HSDC) was introduced by [Zamora et al](https://www.ohe.org/publications/new-methods-analysing-distribution-eq-5d-observations/) in 2018 and provides a graphical way to depict the distribution of EQ-5D profiles. The cumulative frequency of health profiles ranked from most to least frequent is plotted against the cumulative proportion of the distinct health profiles (red line) and can be compared a uniform distribution representing total equality (black line). The Health State Density Index (HSDI) is based on the area formed by the diagonal line representing total equality and the line of the HSDC. The HSDI has a value between 0 and 1 where a value of 0 represents total inequality and 1 total equality.

```{r hsdi}
#load example data
data <- read_excel(system.file("extdata", "eq5d3l_example.xlsx", package="eq5d"))

#Calculate HSDI
hsdi <- hsdi(data, version="3L")

#Plot HSDC
cf <- eq5dcf(data, version="3L", proportions=T)
cf$CumulativeState <- 1:nrow(cf)/nrow(cf)

#Plot data using ggplot2
library(ggplot2)

ggplot(cf, aes(CumulativeProp, CumulativeState)) + 
  geom_line(color="#FF9999") + 
  annotate("segment", x=0, y=0, xend=1, yend=1, colour="black") +  
  annotate("text", x=0.5, y=0.9, label=paste0("HSDI=", hsdi)) +
  theme(panel.border=element_blank(), panel.grid.minor=element_blank()) +
  coord_cartesian(xlim=c(0,1), ylim=c(0,1)) +
  xlab("Cumulative proportion of observations") +
  ylab("Cumulative proportion of profiles")

```

### EQ-5D-DS

The ***eq5dds*** function is an R approximation of the Stata command written by [Ramos-Goñi & Ramallo-Fariña](https://www.stata-journal.com/article.html?article=st0450). The function analyses and summarises the descriptive components of an EQ-5D dataset. The "by" argument enables a grouping variable to be specified when analysing the data subgroup.

```{r eq5dds}
set.seed(12345)
dat <- data.frame(
         matrix(
           sample(1:3, 5*12, replace=TRUE), 12, 5, 
           dimnames=list(1:12, c("MO","SC","UA","PD","AD"))
         ),
         Sex=rep(c("Male", "Female"))
       )

eq5dds(dat, version="3L")

eq5dds(dat, version="3L", counts=TRUE)

eq5dds(dat, version="3L", by="Sex")

```

## Helper functions

Helper functions are included, which may be useful in the processing of EQ-5D data. ***get_all_health_states*** returns a vector of all possible five digit health states for a specified EQ-5D version. ***get_dimensions_from_health_states*** splits a vector of five digit health states into a data.frame of their individual components and ***get_health_states_from_dimensions*** combines indiviual dimensions in a data.frame into five digit health states.

```{r helper}

# Get all EQ-5D-3L five digit health states (top 6 returned for brevity).
head(get_all_health_states("3L"))

# Split five digit health states into their individual components.
get_dimensions_from_health_states(c("12345", "54321"), version="5L")

```

## Example data

Example data is included with the package and can be accessed using the ***system.file*** function.

```{r example data}
# View example files.
dir(path=system.file("extdata", package="eq5d"))

# Read example EQ-5D-3L data.
library(readxl)
data <- read_excel(system.file("extdata", "eq5d3l_example.xlsx", package="eq5d"))

# Calculate index scores
scores <- eq5d(data, country="UK", version="3L", type="TTO")

# Top 6 scores
head(scores)

```

## Shiny web interface

The calculation (and visualisation) of multiple EQ-5D indices can also be performed by upload of a CSV or Excel file using the packaged [Shiny](https://shiny.posit.co/) app. This requires the [shiny](https://cran.r-project.org/package=shiny),  [DT](https://cran.r-project.org/package=DT), 
[FSA](https://cran.r-project.org/package=FSA), 
[ggplot2](https://cran.r-project.org/package=ggplot2), 
[ggiraph](https://cran.r-project.org/package=ggiraph), 
[ggiraphExtra](https://cran.r-project.org/package=ggiraphExtra), 
[mime](https://cran.r-project.org/package=mime),  [PMCMRplus](https://cran.r-project.org/package=PMCMRplus), [readxl](https://cran.r-project.org/package=readxl), 
[shinycssloaders](https://cran.r-project.org/package=shinycssloaders) and
[shinyWidgets](https://cran.r-project.org/package=shinyWidgets) packages. Ideally the CSV/Excel headers should be the same as the names of the vector passed to the ***eq5d*** function i.e. MO, SC, UA, PD and AD or the column name "State" if using the five digit format. However, a modal dialog will prompt the user to select the appropriate columns if the defaults can not be found.

The app is launched using the ***shiny_eq5d*** function.

```{r shiny, echo=TRUE, eval=FALSE}
shiny_eq5d()
```
Alternatively, it can be accessed without installing R/Shiny/eq5d by visiting [shinyapps.io](https://fragla.shinyapps.io/shiny-eq5d).

## License

This project is licensed under the MIT License - see the [LICENSE.md](https://github.com/fragla/eq5d/blob/master/LICENSE.md) file for details.