The following sections describe the options for model classification results, net benefit, and interventions avoided from a decision curve analysis (DCA); at present, only binary outcome models are supported.
Examine the sensitivity and specificity produced by a threshold on the predicted values.
Set a prediction threshold, examine the resulting sensitivity and specificity, and find the best cutoff level. The AUC is calculated using the trapezoidal rule. The threshold that minimizes misclassification is the one at which the sum of false-positive and false-negative results (the error rate) is lowest.
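As a sketch of this cutoff search (illustrative Python with made-up data; not the package's implementation):

```python
# Illustration (not the package's code): pick the cutoff that minimizes
# the number of misclassifications (false positives + false negatives).

def misclassifications(y_true, y_prob, threshold):
    """Count false positives plus false negatives at a given cutoff."""
    fp = sum(1 for y, p in zip(y_true, y_prob) if y == 0 and p >= threshold)
    fn = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p < threshold)
    return fp + fn

y_true = [0, 0, 1, 1]          # hypothetical observed outcomes
y_prob = [0.2, 0.3, 0.6, 0.8]  # hypothetical predicted probabilities

# Scan candidate cutoffs and keep the one with the lowest error count.
candidates = [round(t * 0.05, 2) for t in range(1, 20)]  # 0.05 .. 0.95
best = min(candidates, key=lambda t: misclassifications(y_true, y_prob, t))
print(best, misclassifications(y_true, y_prob, best))
```

With these toy data the search settles on a cutoff that separates the classes without error; with real data there is usually a residual error rate to weigh against clinical considerations.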
The appropriate threshold depends on the clinical context (e.g., we may consider any risk over 20% too high to go without an intervention).
A common option is to use the prevalence rate (the mean of the outcome) as the threshold: a predicted score above the prevalence indicates high risk, and a value below it indicates low risk.
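A minimal sketch of this prevalence-based rule, with hypothetical outcomes and predicted risks:

```python
# Illustration: use the outcome prevalence (mean of the 0/1 outcome)
# as the risk threshold. Scores at or above prevalence flag high risk.
y_true = [0, 0, 0, 0, 1, 1]              # hypothetical outcomes
y_prob = [0.1, 0.2, 0.3, 0.5, 0.4, 0.8]  # hypothetical predicted risks

prevalence = sum(y_true) / len(y_true)   # 2/6, about 0.33
high_risk = [p >= prevalence for p in y_prob]
print(prevalence, high_risk)
```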
The threshold range can be set using sensible upper and lower bounds on the number of false positives one would tolerate to find one true positive. For example, if detecting one cancer is worth 16 unnecessary surgical interventions, an appropriate risk threshold for surgery would be 1/(1 + 16), about 6%.
For example, a clinician may be unwilling to perform more than 10 biopsies to find one high-grade cancer, assuming patients in similar health who weigh the risks and benefits of biopsy against finding cancer in the same way. Under that rule, a biopsy is done when a patient’s risk is above 10%, and not otherwise. A risk of 10% corresponds to odds of 1:9: missing a high-grade cancer is counted as 9 times worse than performing an unnecessary biopsy.
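The threshold-to-odds arithmetic in these two examples can be sketched as follows (illustrative Python; the function name is ours, not the package's):

```python
# Illustration: convert a tolerated trade-off (unnecessary interventions
# per true positive found) into a risk threshold, and back into odds.

def threshold_from_tradeoff(fp_per_tp):
    """Risk threshold implied by tolerating fp_per_tp false positives
    per true positive: Pt = 1 / (1 + fp_per_tp)."""
    return 1.0 / (1.0 + fp_per_tp)

# 16 unnecessary surgeries per cancer found -> roughly a 6% threshold
pt_surgery = threshold_from_tradeoff(16)

# A 10% biopsy threshold corresponds to odds of 1:9 ...
pt_biopsy = 0.10
odds = pt_biopsy / (1 - pt_biopsy)
# ... i.e., 9 unnecessary biopsies tolerated per high-grade cancer found.
fp_tolerated = (1 - pt_biopsy) / pt_biopsy
print(pt_surgery, odds, fp_tolerated)
```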
For more information, please read the article “Decision Curve Analysis: A Novel Method for Evaluating Prediction Models” (Vickers & Elkin, 2006).
Get the sensitivity, specificity, false-positive, false-negative, and positive and negative predictive values associated with your prediction threshold value. Decision curve analysis is also included.
Positive Predictive Value: Proportion of all positive classifications that were true-positive.
Negative Predictive Value: Proportion of all negative classifications that were true-negative.
Create a decide object by entering the model name and a threshold on the logit scale, then view the model's classification statistics. Here we predict engine shape, straight or “v”, from the mtcars dataset.
car_m1 <- assess(formula=vs ~ hp + am, data=mtcars, regression="logistic")
d1 <- decide(x=car_m1, threshold= -0.767)
print(d1$Model.Summary$Classification)
#> Sensitivity Specificity False.Positives False.Negatives Accuracy.Rate
#> 0.92857143 0.83333333 0.16666667 0.07142857 0.87500000
#> Error.Rate
#> 0.12500000

View decision curve analysis results, such as net benefit, at thresholds of key interest, reported at these percentiles of the predicted values: 0.01, 0.05, 0.10, 0.25, 0.50, 0.75, 0.90, 0.95, 0.99.
print(d1$DCA)
#> $total_N
#> 0.01 0.05 0.10 0.25 0.50 0.75 0.90 0.95 0.99
#> 32 32 32 32 32 32 32 32 32
#>
#> $Threshold.Level
#> 0.01 0.05 0.10 0.25 0.50 0.75
#> 1.736852e-11 1.681793e-07 3.093199e-06 2.830399e-03 3.170837e-01 9.667616e-01
#> 0.90 0.95 0.99
#> 9.916185e-01 9.963157e-01 9.997053e-01
#>
#> $Net.Benefit
#> 0.01 0.05 0.10 0.25 0.50 0.75 0.90 0.95
#> 0.4375000 0.4374999 0.4374986 0.4366130 0.3627211 0.2500000 0.1250000 0.0625000
#> 0.99
#> 0.0312500
#>
#> $All.Treated
#> 0.01 0.05 0.10 0.25 0.50
#> 0.4375000 0.4374999 0.4374983 0.4359034 0.1763265
#> 0.75 0.90 0.95 0.99
#> -15.9231896 -66.1122468 -151.6745755 -1907.6989019
#>
#> $Interventions.Saved
#> 0.01 0.05 0.10 0.25 0.50 0.75 0.90 0.95
#> 0.0312500 0.0625000 0.1250000 0.2500000 0.4014456 0.5560535 0.5598586 0.5611133
#> 0.99
#> 0.5623802

A basic graph of the classification results when y= ‘cl’. The green area represents true predictions.
We can modify the graph; many plotting options are available.
This model predicts very well. However, in healthcare we often face high-stakes decisions based on model results, and we need methods to help make them. We now turn to decision curve analysis.
Net benefit is driven by true positives; a higher value is better. The net benefit plot compares which strategy performs better across a range of thresholds (consider only thresholds within the range of the predicted values). Two useful comparison strategies are ‘treat all’ and ‘treat none’.
For example, a net benefit of 0.0973 at a cutoff of FEV = 1.5 indicates that, after netting out false positives, the model finds about 10 more true positives than false positives per 100 patients.
For binary outcomes, the ‘All treated’ line tends to intersect the ‘None treated’ line at a threshold equal to the prevalence. There is no net benefit when all interventions are withheld, so the ‘None treated’ line sits at zero.
Net Benefit = True-Positives/N - False-Positives/N * (Pt/(1 - Pt)) at a specific probability threshold (Pt).
All Treated = (True-Positives + False-Negatives)/N - (False-Positives + True-Negatives)/N * (Pt/(1 - Pt)).
Weighting: as Pt goes up, the true- and false-positive counts go down, but the weight Pt/(1 - Pt) goes up. With a bigger weight, net benefit tends to get smaller because the [False-Positives * weight] term is larger.
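To make the arithmetic concrete, here is a hand calculation (a sketch in Python, not the package's code) that reproduces the DCA output printed above. The confusion-matrix counts are inferred from the printed sensitivity (0.9286 = 13/14) and specificity (0.8333 = 15/18) with N = 32:

```python
# Hand-check of the net benefit formulas above, using the mtcars example:
# N = 32 with 14 straight engines (vs = 1), and classification counts
# TP = 13, FP = 3, FN = 1, TN = 15 inferred from the printed output.
import math

N, TP, FP, FN, TN = 32, 13, 3, 1, 15

# The 50th-percentile threshold level from the DCA output; it matches the
# decide() threshold of -0.767 mapped from the logit to probability scale.
pt = 0.3170837
assert abs(1 / (1 + math.exp(0.767)) - pt) < 1e-3

w = pt / (1 - pt)  # weight applied to false positives

net_benefit = TP / N - FP / N * w                # reproduces 0.3627211
all_treated = (TP + FN) / N - (FP + TN) / N * w  # reproduces 0.1763265

# 'Treat all' has zero net benefit when the threshold equals the
# prevalence (14/32 = 0.4375), which is where it crosses 'treat none'.
prev = (TP + FN) / N
all_treated_at_prev = prev - (1 - prev) * (prev / (1 - prev))

print(round(net_benefit, 7), round(all_treated, 7))
```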
The DCA results were presented above, so we now turn to graphing net benefit and interventions avoided.
A basic net benefit graph of the classification results when y= ‘nb’.
Below we add a little more detail. Across all of the key percentiles, this model outperforms the potentially costly strategies of “treat all” and “treat none”.
As we expect, a model can inform us which patients to treat as well as which patients can avoid treatment. This is especially helpful when treatments have permanent side effects, such as those from taking biopsies. When working with less critical outcomes, we can also save money by avoiding treatments that may be unnecessary for most patients.
A model can reduce unnecessary interventions. For example, a risk threshold of 10% may reduce the number of unnecessary interventions by 40 per 100 without missing treatment for any patients with cancer.
Interventions Avoided = True-Negatives/N - False-Negatives/N * ((1 - Pt)/Pt) at a specific probability threshold (Pt).
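A hand-check of this formula (again a Python sketch, not the package's code), reusing the counts and the 50th-percentile threshold level from the mtcars DCA output above:

```python
# Hand-check of the interventions-avoided formula with N = 32,
# TN = 15, FN = 1 at the 50th-percentile threshold level.
N, TN, FN = 32, 15, 1
pt = 0.3170837

interventions_avoided = TN / N - FN / N * ((1 - pt) / pt)  # reproduces 0.4014456

# Scaled to per 100 patients: roughly 40 unnecessary interventions avoided.
per_100 = interventions_avoided * 100
print(round(interventions_avoided, 7), round(per_100, 1))
```

This matches the 10% risk-threshold example above: about 40 unnecessary interventions avoided per 100 patients.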
Interventions avoided.
This graph is more straightforward: as we set a higher threshold on the x-axis, we avoid more and more interventions that were not needed. It is important to note that the y-axis labels are integers ranging from 1 to 100, but the underlying y-axis values are decimals (proportions).