Response to Letters Regarding Article, “Use and Misuse of the Receiver Operating Characteristic Curve in Risk Prediction”
I am pleased that Pepe et al agree that the C-statistic is not a relevant measure of clinical value, despite its use as such in the clinical literature. They remain, however, interested in sensitivity and specificity. It is true that in the high-density lipoprotein example the sensitivity, or probability of inclusion in the highest risk group among cases, improves in the model with high-density lipoprotein, whereas the specificity, or probability of inclusion in the lowest risk group among controls, declines. The model with high-density lipoprotein tends to categorize more women, both cases and controls, into higher risk categories. The reclassification table, though, offers evidence that the risk predicted by the model with high-density lipoprotein is closer to observed risk.
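To make these category-based definitions concrete, here is a minimal Python sketch. It assumes each participant has already been assigned an integer risk category and a case/control label. The function name `category_sens_spec` and the toy data are hypothetical illustrations, not the analysis from the original paper.

```python
def category_sens_spec(risk_cat, is_case, highest, lowest):
    """Sensitivity: share of cases placed in the highest risk category.
    Specificity: share of controls placed in the lowest risk category."""
    cases = [c for c, y in zip(risk_cat, is_case) if y]
    controls = [c for c, y in zip(risk_cat, is_case) if not y]
    sensitivity = sum(c == highest for c in cases) / len(cases)
    specificity = sum(c == lowest for c in controls) / len(controls)
    return sensitivity, specificity


# Hypothetical data: categories run from 0 (lowest) to 2 (highest).
cats = [2, 2, 1, 0, 0, 1, 0, 2]
case = [True, True, False, False, False, True, False, False]

sens, spec = category_sens_spec(cats, case, highest=2, lowest=0)
print(sens, spec)   # sens = 2/3, spec = 3/5
```

If a new model moves more participants, cases and controls alike, into higher categories, this sensitivity rises while this specificity falls. That is the pattern described for the model with high-density lipoprotein.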
Pepe et al prefer to focus on the margins of the reclassification table, but these margins may not be of most interest. Figure 3 of our original paper shows the predictiveness curves, the continuous analog of these margins, for models with and without systolic blood pressure. The curves differ little, even though systolic blood pressure is the strongest predictor of cardiovascular disease after age and is causally related to it. The joint, rather than marginal, distribution of predicted values, such as that shown in the body of Table 3, can be more informative in determining utility for individual patients. Some women move up a category and some move down, and the new categories are more accurate. What matters is not just how many women are at intermediate risk, but whether individuals are placed in that category appropriately.
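The difference between the margins and the body of a reclassification table can be sketched in Python under simple assumptions: each participant has an integer risk category under an old and a new model, plus a binary outcome. The helpers `reclassification_table` and `margins`, and the four-person data set, are hypothetical. In the toy data the margins are identical, yet one case moves up a category and one control moves down. Only the joint table reveals that movement.

```python
from collections import defaultdict


def reclassification_table(old_cat, new_cat, is_case):
    """Map each (old, new) category cell to (count, observed event rate)."""
    counts = defaultdict(int)
    events = defaultdict(int)
    for o, n, y in zip(old_cat, new_cat, is_case):
        counts[(o, n)] += 1
        events[(o, n)] += int(y)
    return {cell: (counts[cell], events[cell] / counts[cell]) for cell in counts}


def margins(table):
    """Collapse the joint table to row (old model) and column (new model) totals."""
    old_margin = defaultdict(int)
    new_margin = defaultdict(int)
    for (o, n), (count, _rate) in table.items():
        old_margin[o] += count
        new_margin[n] += count
    return dict(old_margin), dict(new_margin)


# Hypothetical data: category 0 = lower risk, 1 = higher risk.
old = [0, 0, 1, 1]
new = [1, 0, 1, 0]
case = [True, False, False, False]

table = reclassification_table(old, new, case)
old_m, new_m = margins(table)
print(old_m == new_m)   # True: the margins are unchanged
print(table[(0, 1)])    # (1, 1.0): the case moved up
print(table[(1, 0)])    # (1, 0.0): the control moved down
```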
In contrast, Janket et al assert that only the C-statistic can be used to assess clinical utility. Although the C-statistic is useful for diagnosis, where discrimination is of most interest, it is a weak tool for risk prediction. Nor should clinical decisions be based on relative risk alone, particularly on relative risks across quartiles. More appropriate cut points need to be determined for clinical classification.
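One reason the C-statistic is a weak tool for risk prediction is that it depends only on the ranking of predicted risks, not on whether those risks match observed risk. The minimal Python sketch below uses the standard concordance definition to illustrate this. The function name `c_statistic` and the data are hypothetical.

```python
def c_statistic(risk, is_case):
    """Probability that a randomly chosen case has a higher predicted risk
    than a randomly chosen control; ties count as one half."""
    cases = [r for r, y in zip(risk, is_case) if y]
    controls = [r for r, y in zip(risk, is_case) if not y]
    concordant = 0.0
    for rc in cases:
        for rn in controls:
            if rc > rn:
                concordant += 1.0
            elif rc == rn:
                concordant += 0.5
    return concordant / (len(cases) * len(controls))


# Hypothetical predicted risks and outcomes.
risk = [0.10, 0.40, 0.50, 0.80]
case = [False, True, False, True]

print(c_statistic(risk, case))                    # 0.75
# Halving every predicted risk badly miscalibrates the model,
# but the ranking, and therefore the C-statistic, is unchanged.
print(c_statistic([r / 2 for r in risk], case))   # 0.75
```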
No single marker, whether blood pressure, lipids, or a novel biomarker, can accurately predict a complex disease on its own; a marker is useful only in combination with others, as in the Framingham risk score. The estimates of sensitivity and specificity by Janket et al are based on crude numbers and do not reflect clinical classification by such a combination score.
Also, although inherent properties of a single x-ray machine may not vary under ideal conditions, there is variability in its use. Factors such as technique and clinical interpretation as well as case mix and severity can lead to differing estimates of sensitivity and specificity.