Letter by Pepe et al Regarding Article, “Use and Misuse of the Receiver Operating Characteristic Curve in Risk Prediction”
To the Editor:
Current statistical approaches for evaluation of risk prediction markers are unsatisfactory. We applaud Cook’s criticisms of the c-index, or area under the receiver operating characteristic curve. This index is built on pairs of subjects, one with a poor outcome (eg, a cardiovascular event within 10 years) and one without; for each pair, one determines whether the risk for the former (ie, the case) is larger than the risk for the latter (ie, the control). The c-index is the probability that the risks in such a pair are ordered correctly. This probability is not a relevant measure of clinical value, and it should not play a central role in the evaluation of risk markers.
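The pairwise definition of the c-index described above can be sketched in a few lines of Python. This is a minimal illustration, not the computation used in Cook’s report: the function name `c_index`, the toy risk values, and the convention of counting a tied pair as one half are all assumptions made for the example.

```python
from itertools import product


def c_index(case_risks, control_risks):
    """Fraction of case-control pairs in which the case has the larger risk.

    Tied pairs count as 1/2 (an assumed, commonly used convention).
    """
    pairs = list(product(case_risks, control_risks))
    if not pairs:
        raise ValueError("need at least one case and one control")
    score = 0.0
    for case_risk, control_risk in pairs:
        if case_risk > control_risk:
            score += 1.0   # risks correctly ordered
        elif case_risk == control_risk:
            score += 0.5   # tie
    return score / len(pairs)


# Hypothetical predicted risks, for illustration only.
toy_cases = [0.30, 0.12, 0.08]
toy_controls = [0.05, 0.10, 0.20, 0.02]
print(c_index(toy_cases, toy_controls))
```

Note that the result depends only on how risks are ranked within pairs, not on how many subjects cross a clinically meaningful risk threshold.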
However, Cook goes too far in her dismissal of the classic notions of sensitivity and specificity as irrelevant. True, these are components of the receiver operating characteristic curve, but they are not synonymous with the area under it. Our point of view is that the proportion of cases that are flagged as “high-risk” (sensitivity) and the proportion of controls flagged as “low-risk” (specificity) are crucially important and should be considered in conjunction with predictive accuracy.
Table 3 in Cook’s report is an informative but incomplete summary. We suggest enumeration of the numbers of cases and controls in each cell of the table. According to our calculations, of the 706 cases, 39 (6%) are designated as high-risk, and 381 (54%) are designated as low-risk when the model without high-density lipoprotein (HDL) is used; addition of HDL has a small benefit in that 11 more cases are classified as high-risk and 20 fewer as low-risk. However, for the 26 197 controls, the model without HDL places 22 976 (88%) subjects in the low-risk category and 104 (<1%) subjects in the high-risk group; addition of HDL now results in worse performance, with 87 fewer controls classified as low-risk and 23 more designated as high-risk.
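The proportions quoted above can be checked with a short Python sketch. The counts for the model without HDL are taken directly from the text; the counts for the model with HDL are derived here by applying the stated changes (11 more cases high-risk, 87 fewer controls low-risk), and all variable names are ours.

```python
# Totals and counts quoted in the letter (model without HDL).
n_cases = 706
n_controls = 26197
cases_high_without = 39
controls_low_without = 22976

# Counts for the model with HDL, derived from the stated changes.
cases_high_with = cases_high_without + 11
controls_low_with = controls_low_without - 87

# Sensitivity: proportion of cases flagged high-risk.
sensitivity_without = cases_high_without / n_cases
sensitivity_with = cases_high_with / n_cases

# Specificity: proportion of controls flagged low-risk.
specificity_without = controls_low_without / n_controls
specificity_with = controls_low_with / n_controls

print(f"sensitivity: {sensitivity_without:.1%} -> {sensitivity_with:.1%}")
print(f"specificity: {specificity_without:.1%} -> {specificity_with:.1%}")
```

Under these counts, sensitivity rises slightly with HDL while specificity falls slightly, which is the trade-off described in the paragraph above.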
Cook suggests evaluation of the impact of a marker within subgroups defined by baseline risk factors. Predictiveness curves (2) applied to subgroups are continuous analogs of the categorical rows in Table 3 of Cook’s report. However, reporting results for 1 subgroup and not for others can be misleading. For example, of subjects in the low/medium-risk category without HDL, 593 move to the low-risk designation when HDL information is added. Yet 696 move from the low-risk to the low/medium-risk category; the net result is that 103 fewer subjects are designated as low-risk. The margins of Table 3 would be useful for viewing the performance of the models across the entire population. We see that the model without HDL leaves 3401 subjects (13%) in the intermediate risk ranges; addition of HDL actually increases this number to 3472 subjects.
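The net reclassification arithmetic in this paragraph can be written out as a brief Python sketch. It uses only the figures quoted in the letter; the variable names are ours, and the total population is taken as the sum of the 706 cases and 26 197 controls.

```python
# Movements between the low and low/medium categories when HDL is added.
moved_into_low = 593     # low/medium -> low
moved_out_of_low = 696   # low -> low/medium

# Net change in the number of subjects designated low-risk.
net_fewer_low_risk = moved_out_of_low - moved_into_low

# Subjects left in the intermediate risk ranges, as quoted in the letter.
intermediate_without_hdl = 3401
intermediate_with_hdl = 3472
intermediate_increase = intermediate_with_hdl - intermediate_without_hdl

# Share of the whole population (cases plus controls).
total_subjects = 706 + 26197
share_without_hdl = intermediate_without_hdl / total_subjects

print(f"net fewer low-risk subjects: {net_fewer_low_risk}")
print(f"increase in intermediate-risk subjects: {intermediate_increase}")
print(f"intermediate share without HDL: {share_without_hdl:.1%}")
```

Looking only at the 593 subjects who move down would suggest a benefit; the marginal totals show the opposite direction overall.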