Agreement Among Cardiovascular Disease Risk Calculators
Background—Use of cardiovascular disease risk calculators is often recommended by guidelines, but research on consistency in risk assessment among calculators is limited.
Methods and Results—A search of PubMed and Google was performed. Five clinicians selected 25 calculators by independent review. Hypothetical patients were created with the use of 7 risk factors (age, sex, smoking, blood pressure, high-density lipoprotein, total cholesterol, and diabetes mellitus) dichotomized to high and low, generating 2^7 patients (128 total). These patients were assessed by each calculator by 2 clinicians. Risk estimates (and assigned risk categories) were compared among calculators. Selected calculators were from 8 countries, used 5- or 10-year predictions, and estimated either cardiovascular disease or coronary heart disease. With the use of 3 risk categories (low, medium, and high), the 25 calculators categorized each patient into a mean of 2.2 different categories, and 41% of unique patients were assigned across all 3 risk categories. Risk category agreement between pairs of calculators was 67%. This did not improve when analysis was limited to just the 10-year cardiovascular disease calculators. In nondiabetics, the highest calculated risk estimate from a calculator averaged 4.9 times higher (range, 1.9–13.3) than the lowest calculated risk estimate for the same patient. This did not change meaningfully for diabetics or when the analysis was limited to 10-year cardiovascular disease calculators.
Conclusions—The decision as to which calculator to use for risk estimation has an important impact on both risk categorization and absolute risk estimates. This has broad implications for guidelines recommending therapies based on specific calculators.
Cardiovascular disease (CVD) risk calculators assist clinicians in estimating a patient’s risk of a cardiovascular event. These calculated risk estimates (RE) are often used to place patients into specific risk categories. This categorization is then used to guide intervention recommendations or determine the benefits of treatment. There are numerous CVD risk calculators (hereafter called calculators) with systematic reviews identifying 110 different risk-scoring methods1 and 45 calculators for diabetics alone.2
Some cohort studies have found that calculator predictions of CVD outcomes may not accurately represent the actual CVD risk. For example, the Framingham model typically overestimates risk in low-risk patients and underestimates risk in high-risk patients.3 Some studies have compared Framingham with other calculators4,5 or compared several calculators with each other6–8 and found that calculated RE are inconsistent among calculators.
Small variations in calculated RE among calculators are to be expected. However, as calculated RE variation increases, calculators will more frequently categorize the same patient into different risk categories.7 In theory, this could lead clinicians to make different recommendations for their patients solely on the basis of which calculator is used.
Previous studies directly comparing how different calculators categorize patients focused on only 3 calculators7 or only 1 patient.8 The largest to date, with 8 calculators, assessed agreement but provided limited information to compare how specific calculators agree in risk estimation or categorization.6 A recent systematic review of CVD risk prediction models (calculators) concluded that more direct comparison of risk calculators was required.9
The objective of this evaluation was to assess the consistency of a broad sample of commonly used calculators over a sample of patients with a range of cardiovascular risk factors.
Because of the large number of existing calculators, we decided to focus on a representative sample, and therefore we selected calculators that were from a variety of countries; used Framingham and other databases; had a range of formats (Internet, cell phone or personal digital assistant, paper and pencil, requirement of triangulation on graph); were associated with and not associated with guidelines; calculated risk over different durations (5- or 10-year risks); were with and without a diabetes mellitus category; and estimated different outcomes such as CVD or coronary heart disease (CHD). We excluded models that did not use age, sex, smoking status, blood pressure, total cholesterol, and high-density lipoprotein (HDL) or a cholesterol/HDL ratio to calculate risk. For example, we did not include models that substituted weight or body mass index for lipid measurements or calculators for specific populations (eg, Native Americans).
We (F.N., G.M.A., and J.M.) initially searched PubMed and Google using the terms cardiovascular risk calculators or heart disease risk calculators. We also scanned references of articles in this area. The purpose of the search was not to find all calculators but to locate a representative group of calculators. Once we had >40 calculators meeting our inclusion criteria, we distributed the list to the clinician authors (F.N., G.M.A., J.M., C.K., and M.R.K.), who independently selected their top 20 calculators. Clinician authors were asked to select calculators they believed were commonly used and provided a broad representation of calculators. This would include calculators from a variety of countries; calculators that were associated with and not associated with common guidelines; calculators with different formats (Internet, paper and pencil, requirement of triangulation on graph); calculators that calculated risk over different durations (5- or 10-year risks) and outcomes (CVD or other); and some calculators with a diabetes mellitus category. Votes by clinician authors were totaled, and we included any calculator recommended by ≥3 of the 5 clinician authors. Although only 1 of 4 calculators at the Edinburgh site was originally selected, we included all 4 to allow for a direct comparison of the same Web site.
We performed a final search in July 2011 and identified a new review article10 that included 21 calculators, 3 of which were previously unidentified and met our inclusion criteria. One of these calculators (Progetto CUORE) involved a different cohort and country, and therefore it was also selected for inclusion.
Calculator inclusion and exclusion flows are presented in Figure 1.
Seven specific risk factors were used in all included calculators, as follows: age, sex, smoking status, blood pressure, diabetes mellitus, total cholesterol, and HDL (or the ratio of the latter 2). To assess the variability of calculated RE among calculators for a broad cohort of hypothetical patients, we assigned 2 values for each of the 7 risk factors. The 2 values for each risk factor were as follows: age, 70 or 50 years; sex, male or female; current smoking status, yes or no; systolic blood pressure, 160 or 120 mmHg; total cholesterol, 7 or 4 mmol/L; HDL cholesterol, 0.8 or 1.3 mmol/L; and diabetes mellitus, yes or no. Using 2 values for each of the 7 variables (2^7) created 128 unique hypothetical patients.
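The construction of the 128 hypothetical patients can be sketched in a few lines of Python. This is only an illustration: the dictionary keys and the function name `build_patients` are hypothetical labels chosen here, while the two values per risk factor are taken from the text above.

```python
from itertools import product

# Two levels per risk factor, as listed in the text.
# The key names are illustrative labels, not from the study.
RISK_FACTOR_LEVELS = {
    "age_years": (70, 50),
    "sex": ("male", "female"),
    "current_smoker": (True, False),
    "systolic_bp_mmhg": (160, 120),
    "total_cholesterol_mmol_l": (7.0, 4.0),
    "hdl_mmol_l": (0.8, 1.3),
    "diabetes": (True, False),
}


def build_patients(levels=RISK_FACTOR_LEVELS):
    """Return one dict per combination of risk-factor levels."""
    names = list(levels)
    return [dict(zip(names, combo)) for combo in product(*levels.values())]


patients = build_patients()
print(len(patients))  # 2**7 combinations
```

Each of the 7 factors doubles the number of combinations, so half of the resulting patients are diabetic and half are not.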
Some calculators required other variable input. When necessary, we used the following factors: white race; no family history of CVD; all blood pressures before treatment and while not on antihypertensive medications; diastolic blood pressure 90 mmHg when systolic blood pressure 160 mmHg, and diastolic blood pressure 80 mmHg when systolic blood pressure 120 mmHg; duration of diabetes mellitus 11 years and hemoglobin A1C 7.2%11; triglycerides 1.5 mmol/L; C-reactive protein 2 mg/L12; and 15 cigarettes per day for smokers.13 The SCORE calculator had a low- or high-risk country classification. We chose the high-risk classification because the majority of countries in SCORE were classified as high risk. Progetto CUORE did not allow input of age 70 years, and therefore we used 69 years. For nondiabetics in the UK Prospective Diabetes Study risk calculator, we assigned a hemoglobin A1C of 5.2% and zero for the duration of diabetes mellitus.
Calculation of Risk
For each calculator, 2 independent reviewers calculated RE on all 128 hypothetical patients. F.N. calculated RE with all calculators, and the other 4 clinician authors (G.M.A., J.M., C.K., M.R.K.) completed 6 calculators each. Risks were compared, and interrater agreement was assessed. When disagreement occurred, G.M.A. and F.N. confirmed assessed risks again.
We then categorized the calculated RE for each hypothetical patient into 3 risk categories. The risk categories were selected to reflect the most common categorizations used for the calculators selected. Calculated risks from 10-year calculators (either CHD or CVD) were categorized as low (<10%), moderate (10% to <20%), and high (≥20%). Calculated risks from 5-year (CVD) calculators were categorized as low (<10%), moderate (10% to <15%), and high (≥15%). SCORE used only 2 risk categories: low (<5%) and high (≥5%).
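The category cutoffs described above can be written as a small function. This is a sketch under stated assumptions: the function name `categorize` and the `score_style` flag are hypothetical, and only the thresholds come from the text.

```python
def categorize(risk_percent, horizon_years=10, score_style=False):
    """Map a calculated risk estimate (in percent) to a risk category.

    Thresholds follow the text: 10-year calculators use <10, 10 to <20, >=20;
    5-year calculators use <10, 10 to <15, >=15; SCORE uses <5 and >=5.
    """
    if score_style:
        # SCORE has only two categories.
        return "low" if risk_percent < 5 else "high"
    if horizon_years == 10:
        high_cutoff = 20
    elif horizon_years == 5:
        high_cutoff = 15
    else:
        raise ValueError("only 5- and 10-year horizons are described")
    if risk_percent < 10:
        return "low"
    if risk_percent < high_cutoff:
        return "moderate"
    return "high"
```

For instance, an estimate of 15% falls in the moderate category for a 10-year calculator but in the high category for a 5-year calculator.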
Ethics approval was not required for this study (because patients were hypothetical).
Sample calculations for each portion of the analysis are provided in the online-only Data Supplement (Sample Calculations).
Variation in Risk Categorization
We defined concordance as the percent agreement in the risk category assignment in pairwise comparisons of 2 calculators. Although concordance is frequently reported with the use of κ, we used percent agreement to allow for easier practical understanding of the results. For each calculator we determined the average, median, and range of the percent agreement in paired comparisons.
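As an illustration of this definition, percent agreement for every pair of calculators, and the per-calculator summary of those pairs, might be computed as follows. The calculator labels "A", "B", "C" and their category assignments are invented toy data, not study results, and the function names are hypothetical.

```python
from itertools import combinations
from statistics import mean, median


def percent_agreement(cats_a, cats_b):
    """Percent of patients given the same category by two calculators."""
    if not cats_a or len(cats_a) != len(cats_b):
        raise ValueError("need two non-empty lists of equal length")
    matches = sum(a == b for a, b in zip(cats_a, cats_b))
    return 100.0 * matches / len(cats_a)


def pairwise_concordance(assignments):
    """assignments maps calculator name -> categories in a fixed patient order."""
    return {
        (a, b): percent_agreement(assignments[a], assignments[b])
        for a, b in combinations(sorted(assignments), 2)
    }


def summarize(assignments, calculator):
    """Average, median, and range of one calculator's paired concordances."""
    values = [
        value
        for pair, value in pairwise_concordance(assignments).items()
        if calculator in pair
    ]
    return mean(values), median(values), (min(values), max(values))


# Toy data for 4 patients (hypothetical, for illustration only).
toy = {
    "A": ["low", "moderate", "high", "high"],
    "B": ["low", "high", "high", "high"],
    "C": ["moderate", "moderate", "high", "low"],
}
pairs = pairwise_concordance(toy)
```

In the toy data, calculators A and B agree on 3 of 4 patients (75%), whereas B and C agree on only 1 of 4 (25%).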
We determined the number of patients assigned to the same category by all calculators. For patients categorized into ≥2 risk categories by different calculators, we identified the maximum risk spread for each specific patient. For example, if a patient was categorized as low risk by one calculator and high risk by another, that patient had a 3–risk category spread (low, moderate, and high).
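The maximum risk spread for one patient can be sketched as below. The function name `category_spread` is a hypothetical label; the rule that a low-to-high disagreement counts as a 3-category spread follows the example in the text.

```python
# Ordinal position of each category.
CATEGORY_RANK = {"low": 0, "moderate": 1, "high": 2}


def category_spread(categories):
    """Number of categories spanned by one patient's assignments.

    A patient rated low by one calculator and high by another spans
    3 categories, even if no calculator rated the patient moderate.
    """
    ranks = [CATEGORY_RANK[c] for c in categories]
    return max(ranks) - min(ranks) + 1
```

A spread of 1 therefore means that every calculator placed the patient in the same category.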
Variation in Absolute Risk Calculation
For those calculators that provided absolute risk percentages for CVD or CHD over 10 years, we were able to assess the range in calculated RE for each patient. We evaluated the relative difference in calculated RE from different calculators by dividing the highest calculated RE by the lowest calculated RE for each patient.
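The relative difference described above is a simple ratio per patient. A minimal sketch follows; the function name and the example estimates are hypothetical and are not taken from the study data.

```python
def relative_difference(estimates):
    """Highest calculated RE divided by the lowest, for one patient."""
    lowest, highest = min(estimates), max(estimates)
    if lowest <= 0:
        raise ValueError("ratio is undefined when the lowest estimate is 0 or less")
    return highest / lowest


# Hypothetical 10-year estimates (percent) from four calculators.
example_ratio = relative_difference([6.0, 9.0, 12.0, 24.0])
```

Here the highest estimate (24%) is 4 times the lowest (6%).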
Characteristics of the 24 included unique calculators, 18 of which allowed for assessment of diabetics, are summarized in Table 1. Citations and Internet links for each calculator are available in the online-only Data Supplement (References and Links for Included CVD Risk Calculators). The Total CVD Risk calculator had an error that allowed an HDL of 1.3 mmol/L to be scored 2 different ways (higher and lower risk), and therefore we included both versions in our analysis, which increased the total number of calculators to 25. Nine 10-year CVD calculators (6 for diabetics) and five 10-year CHD calculators (3 for diabetics) provided actual percent calculated RE for all patients.
All patients were assessed by each calculator by 2 authors. Agreement in risk calculated for each patient was 95% between authors. The 5% disagreement arose in 3 ways. As mentioned above, the Total CVD Risk calculator had an error allowing HDL 1.3 mmol/L to be assigned 2 different risks. Originally unnoticed, this caused a 2% decline in agreement. Systematic errors (for example, an author forgetting to change sex to female for a series of patients) caused another 2% decline in agreement. The remaining 1% decline resulted from singular or sporadic errors in data entry.
Variation in Risk Categorization
Overall, the 128 patients were categorized across a mean of 2.2 categories, with 41% of patients crossing all 3 categories. Among diabetic patients, 25% were assigned to the same category by all calculators, 36% were assigned to 2 risk categories, and 39% were assigned across a 3–risk category spread. For nondiabetics, the assignment was 19%, 39%, and 42%, respectively. Overall, 28 of 128 patients (22%) were assigned the same category by all calculators: 6 low risk, 0 moderate risk, and 22 high risk.
The RE concordance (percent agreement in patients assigned the same risk category by a pair of calculators) for all calculators is available in Figure 2. The average, median, and range of the varying paired concordances for each calculator are available in Table 2. Progetto CUORE (code N) had the lowest average and median concordance in calculator pairs: <50% for nondiabetics and <40% for diabetics. Joint British Societies Risk Charts Assessment (G), iPhone STAT Adult Treatment Panel III Lipid Management (X), and National Cholesterol Education Program (Y) all had average and median concordances across calculator pairs of <60% for nondiabetics.
The pooled average concordance for all paired comparisons of nondiabetics was 64%, for diabetics was 73%, and for combined was 67%. Limiting the analysis to just 10-year CVD, to reduce the time variable (5 versus 10 years) and types of outcome variable, did not change the results in a clinically meaningful way. The pooled average concordance for pairs of 10-year CVD calculators only was 68% for nondiabetics, 74% for diabetics, and 70% overall.
To further reduce variability, we performed post hoc comparison of only those calculators that used the Framingham database (15 calculators, 11 for diabetics). The pooled average concordance for pairs of Framingham-derived calculators was 68% for nondiabetics, 84% for diabetics, and 73% overall. When we focused further on the 9 Framingham-derived 10-year CVD calculators (7 for diabetics), the pooled average concordance among pairs was 86% for nondiabetics, 93% for diabetics, and 89% overall. The pooled average concordance for pairs of European database–derived calculators or the subgroup of 10-year CVD calculators from European databases did not improve agreement (data not shown).
As an example of calculators recommended by national guidelines, Total CVD Risk (L) from Canada had a 44% agreement with the National Cholesterol Education Program (Y) from the United States.
Variation Among Calculators in Absolute Risk Calculation
We compared absolute calculated RE for the 10-year CVD and CHD calculators. The absolute difference in calculated RE (highest calculated RE minus the lowest calculated RE for each patient) was greater than the mean calculated RE in 78% of nondiabetics and 72% of diabetics. In nondiabetics, the highest calculated RE was on average 4.9 times higher (range, 1.9–13.3) than the lowest calculated RE for the same patient. In diabetics, the highest calculated RE was on average 5.2 times higher (range, 1.8–11.7) than the lowest calculated RE.
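The comparison of the absolute difference with the mean calculated RE can be sketched as follows. This assumes the mean is taken across calculators for each patient, which the text does not state explicitly; the function names and the example estimates are hypothetical.

```python
from statistics import mean


def range_exceeds_mean(estimates):
    """True if (highest - lowest) is greater than the mean estimate.

    Assumes the mean is computed across calculators for one patient.
    """
    return (max(estimates) - min(estimates)) > mean(estimates)


def share_exceeding(per_patient_estimates):
    """Percent of patients whose range of estimates exceeds their mean."""
    flags = [range_exceeds_mean(e) for e in per_patient_estimates]
    return 100.0 * sum(flags) / len(flags)


# Hypothetical estimates (percent) for 4 patients from 3 calculators each.
example_share = share_exceeding(
    [[5, 10, 20], [10, 12, 14], [2, 3, 9], [30, 35, 40]]
)
```

In this toy example the range exceeds the mean for the first and third patients only.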
When we focused on 10-year CVD calculators, the absolute difference in calculated RE was greater than the mean calculated RE in 55% of nondiabetics and 63% of diabetics. In nondiabetics, the highest calculated RE was on average 4.0 times higher (range, 1.7–10) than the lowest calculated RE for the same patient. In diabetics, the highest calculated RE was 5 times higher (range, 1.6–11.7) than the lowest calculated RE.
To show the distribution of calculated absolute RE from 10-year CVD and CHD calculators, a sample of 10 patients with the absolute calculated RE is shown in Figure 3. The 10 patients chosen were distributed equally from highest to lowest risk, determined by averaging the RE of all calculators for each patient and then ordering the patients by their average risk. Figure 3 shows how the spread of calculated RE widens as risk increases. For example, the highest and lowest calculated RE for patient 2 are 72% (Edinburgh ASSIGN) and 22% (Progetto CUORE), a range of 50 percentage points. By comparison, the highest and lowest calculated RE for patient 9 are 11.2% (Edinburgh ASSIGN) and 2.9% (Progetto CUORE), a range of 8.3 percentage points.
We used a total of 25 calculators to calculate CVD and CHD risks for 128 hypothetical patients. Comparing concordance (agreement) in classification with 3 standard risk categories (low [<10%], moderate [10% to <20%], and high [≥20%]), pairs of calculators will assign a different category to the same patient approximately one third of the time. Despite attempts to remove variability by focusing on 10-year CVD calculators only, the concordance improved by only 2%. The range in calculated RE for individual patients is likely clinically important, with the highest calculated RE being ≈5 times higher than the lowest calculated RE on average. Again, narrowing the calculators to just 10-year CVD did little to reduce this in a meaningful way.
In a post hoc analysis, we found that focusing on calculators using the Framingham database marginally improved overall agreement from 67% to 73%. Other individual variables of duration (10 year) or specific end point (CVD) had little impact on agreement. It was only when we focused the comparison on just those calculators that estimated 10-year CVD RE using the Framingham database that overall agreement approached 90%.
Calculator results are known to have margins of error that vary slightly with each calculator. Margins of error are reported to be 2% to 4% for risk estimates <10% and up to 15% for risks >30%.14,15 Interestingly, these margins of error appear to be much lower than the variations seen among calculators. The inconsistency among calculators is likely due to a number of factors, including the following: different databases; differing combinations of CVD end points (some include only hard end points, such as myocardial infarction or stroke, whereas others also include softer end points such as angina and transient ischemic attack); and mathematical algorithms that vary for the same database.
Often defined as nonfatal myocardial infarction and cardiac death, CHD is a subset of CVD, and therefore it would be expected that calculated RE for CHD would be lower than calculated RE for CVD in the same patients. In our study, the calculated RE from CHD calculators was surprisingly frequently higher than many of the calculated RE from CVD calculators (see Figure 3 for examples). The inconsistency and variance found when 10-year CVD and CHD calculators were examined together did not change much when CVD calculators were examined alone. Why CHD calculators are also so widely variable and frequently provide calculated RE in the range of CVD calculators is unclear.
We attempted to minimize some of the differences by focusing on 10-year CVD Framingham calculators. Although this is a relatively homogeneous group, agreement still only approached 90%. This may result from some calculators using more up-to-date versions of the Framingham database. However, it is known that Framingham models require some recalibration for differing populations based on prevalence of risk factors and CHD rates.16 Therefore, some of these differences may result from adjustments applied to the calculator algorithms to better reflect the population for which each risk calculator was designed.
Most CVD guidelines encourage the use of CVD/CHD calculators, and cross-sectional studies report that 22% to 48% of physicians regularly use risk assessment tools to determine CVD risk.17–19 Although the use of calculators appears to improve patients’ perception of risk,20 the effect on risk over time is small at best, and there is no evidence for a reduction of actual CVD events.20–22 Physician barriers to the implementation of risk calculation include time, a belief that the information is not helpful, a sense of oversimplification with risk tools, and an ability to predict risk subjectively.17
Although previous research has shown that physicians struggle to subjectively estimate absolute risk accurately, they appreciate patients’ risks relative to a “normal” risk.22 Another study found that 60% of physicians’ subjective estimations of patients’ CVD risk category agreed with the Framingham Risk Score.18 Compared with the Framingham risk equation, subjective estimations by physicians were as accurate in categorizing patients as 4 different calculators, at 71% versus 66% to 81%, respectively.23 If clinicians are 60% to 71% accurate in subjective assessment of risk category (compared with a calculator), they may be as reliable as calculators, which agree 67% overall when 3 risk categories are used.
Approximately 80% of primary prevention primary care patients have a calculated 5-year risk <10% (approximately <20% for 10-year risk).24 Our study was designed to assess risk calculators over a broad range of risks. Our study has proportionally more high-risk patients and likely does not mirror a common primary care population. The variability in RE increased in higher-risk patients, as shown in Figure 3. However, in patients with multiple risk factors, much of the RE variability occurred in risks >20%. Therefore, although absolute risks can vary widely in higher-risk patients, the risk category assigned by all calculators would still be “high.” Of the 28 patients assigned to a single risk category by all calculators, 22 (or 79%) were high risk. This is also supported by the slightly higher concordance among diabetics (compared with nondiabetics) because they are categorized as higher risk. Including patients with higher risks may have led to underestimation of the variability in assigned risk categories.
More research is required to understand the inconsistency in calculators. However, recommendations to use calculators should identify those that apply to the populations to be assessed and acknowledge the considerable inconsistency among them. For clinicians currently using calculators, it may be reasonable to use a calculator that best represents their patient population. The variability in the calculators identified in this study supports this approach. It is hoped that the risk calculator recommended for a clinician’s community (typically by a guideline) uses a database that reflects typical practice in the clinician’s area and has been calibrated to better account for any differences. Clinicians need to be aware, however, that these calculators provide only rough estimates.
Whereas a meta-analysis has shown overestimation in low-risk categories and underestimation in high-risk categories,3 other data suggest general overestimation in primary care.24 Our future work will attempt to identify calculators that consistently estimate low- or high-risk categories and those that are simply consistently inconsistent.
As described earlier, our study was designed to examine risk calculator variability across a broad range of patient risk levels. However, our study may not mirror primary care populations, and the variability in absolute RE may have been increased, while the variation in risk category may have been reduced. Most of the “uncaptured” absolute RE variability in high-risk patients is unlikely to have much clinical relevance.
We did not review all available calculators. However, we have included more than any previous study, and it is highly unlikely that adding more calculators would meaningfully change the inconsistency. It is possible that the purposeful heterogeneity of our risk calculator sample reduced the level of RE agreement. Removal of 1 or 2 variables did not seem to have important impacts on agreement. For example, concordance and variability changed little when CHD and CVD were analyzed together or when CVD was analyzed alone. However, we found that limiting the analysis further to 10-year CVD Framingham risk calculators brought agreement close to 90%. Therefore, it is possible that agreement between 2 calculators with the use of the same database, end points, and duration will be better than the overall agreement from our sample of risk calculators.
We used hypothetical patients with dichotomized risk factors, and it is possible that other values may have provided better consistency. However, hypothetical patients allowed for focal examination of the high and low values for each risk factor and representation of a broad range of patients. Additionally, our study supports other work done in this area,4–8 suggesting that the use of hypothetical patients did not bias the results.
Whether calculated RE are used to assign categories and determine treatment cutoffs or are used to discuss risks and benefits with patients, the inconsistency between calculators appears to be a clinically relevant limitation. Creators of calculators and the guideline writers encouraging their use must provide clear guidance about which tools are suitable for which populations and how to deal with the inconsistent calculations. In the meantime, clinicians using calculators need to be aware of the important variability among them and the limitations associated with any of the available calculators.
Sources of Funding
This project is funded by a grant from the Edmonton North Primary Care Network. The funder had no involvement in the design and conduct of the study; collection, management, analysis, and interpretation of the data; and preparation, review, or approval of the manuscript.
The online-only Data Supplement is available with this article at http://circ.ahajournals.org/lookup/suppl/doi:10.1161/CIRCULATIONAHA.112.000412/-/DC1.
- Received December 6, 2012.
- Accepted March 15, 2013.
- © 2013 American Heart Association, Inc.
- Breswick AD,
- Brindle P,
- Fahey T,
- Ebrahim S
- van Dieren S,
- Beulens JW,
- Kengne AP,
- Peelen LM,
- Rutten GE,
- Woodward M,
- van der Schouw YT,
- Moons KG
- Brindle P,
- Beswick A,
- Fahey T,
- Ebrahim S
- Jones AF,
- Walker J,
- Jewkes C,
- Game FL,
- Bartlett WA,
- Marshall T,
- Bayly GR
- Fornasini M,
- Brotons C,
- Sellarès J,
- Martinez M,
- Galán ML,
- Sáenz I,
- da Pena JM
- Siontis GC,
- Tzoulaki I,
- Siontis KC,
- Ioannidis JP
- Liew SM,
- Doust J,
- Glasziou P
- 13.↵The National Strategy: Moving Forward—The 2006 Progress Report on Tobacco Control: Health Canada 2007. http://www.hc-sc.gc.ca/hc-ps/pubs/tobac-tabac/prtc-relct-2006/part2-eng.php#a1. Accessed August 4, 2011.
- Sheridan SL,
- Viera AJ,
- Krantz MJ,
- Ice CL,
- Steinman LE,
- Peters KE,
- Kopin LA,
- Lungelow D
- Grover SA,
- Lowensteyn I,
- Esrey KL,
- Steinert Y,
- Joseph L,
- Abrahamowicz M
- McManus RJ,
- Mant J,
- Meulendijks CF,
- Salter RA,
- Pattison HM,
- Roalfe AK,
- Hobbs FD
- Kerr AJ,
- Broad J,
- Wells S,
- Riddell T,
- Jackson R
Clinical Perspective
Past research has suggested that agreement among cardiovascular risk calculators may be low. This study examines the consistency in cardiovascular disease (and coronary heart disease) risk calculators and reports percent agreement in risk categorization to allow easy interpretation of risk calculator consistency. With the use of 25 risk calculators, patients were categorized across a mean of 2.2 risk categories, and 41% of patients were categorized into all 3 risk categories. The average percent agreement in risk category assigned between pairs of calculators was 64% for nondiabetics, 73% for diabetics, and 67% overall. Therefore, on average, a pair of calculators will assign the same patient different risk categories about 1 time in 3. Focusing on specific cardiovascular disease–only calculators or those that looked at 10-year outcomes only did not improve agreement in a clinically relevant manner. Narrowing the variety of risk calculators to 10-year cardiovascular disease Framingham-derived calculators improved agreement to 89% overall. In absolute numbers, the highest risk estimates were on average 4 to 5 times higher than the lowest estimates for each patient. Our study shows important inconsistencies among risk calculators. These inconsistencies can result in different treatment decisions, particularly if risk cutoffs or categories are being used to determine therapy. These in turn have cost and public health consequences. For the practicing clinician, it is probably wise to consistently use a calculator that is calibrated (or adjusted) to suit the clinician’s practice population and not to select a calculator at random or use multiple different calculators. Clinicians and their patients also need to understand that risk estimations are rough approximations.