Lack of Concordance Between Empirical Scores and Physician Assessments of Stroke and Bleeding Risk in Atrial Fibrillation
Results From the Outcomes Registry for Better Informed Treatment of Atrial Fibrillation (ORBIT-AF) Registry
Background: Physicians treating patients with atrial fibrillation (AF) must weigh the benefits of anticoagulation in preventing stroke versus the risk of bleeding. Although empirical models have been developed to predict such risks, the degree to which these coincide with clinicians’ estimates is unclear.
Methods and Results: We examined 10 094 AF patients enrolled in the Outcomes Registry for Better Informed Treatment of AF (ORBIT-AF) registry between June 2010 and August 2011. Empirical stroke and bleeding risks were assessed by using the congestive heart failure, hypertension, age ≥75 years, diabetes mellitus, and previous stroke or transient ischemic attack (CHADS2) and Anticoagulation and Risk Factors in Atrial Fibrillation (ATRIA) scores, respectively. Separately, physicians were asked to categorize their patients’ stroke and bleeding risks: low risk (<3%); intermediate risk (3%–6%); and high risk (>6%). Overall, 72% (n=7251) of patients in ORBIT-AF had high-risk CHADS2 scores (≥2). However, only 16% were assessed as high stroke risk by physicians. Although 17% (n=1749) had high ATRIA bleeding risk (score ≥5), only 7% (n=719) were considered so by physicians. Agreement between empirical and physician-estimated stroke and bleeding risks was low (weighted Kappa 0.10 and 0.11, respectively). Physicians weighted hypertension, heart failure, and diabetes mellitus less heavily than empirical models when estimating stroke risk; physicians weighted anemia and dialysis less heavily than empirical models when estimating bleeding risk. Anticoagulation use was highest among patients with high stroke risk, assessed by either empirical model or physician estimates. In contrast, physician and empirical estimates of bleeding had limited impact on treatment choice.
Conclusions: There is little agreement between provider-assessed risk and empirical scores in AF. These differences may explain, in part, the current divergence of anticoagulation treatment decisions from guideline recommendations.
Stroke is the major source of morbidity and mortality in patients with atrial fibrillation (AF).1,2 The use of oral anticoagulants can significantly reduce this risk,3 yet this therapy also conveys a risk of bleeding complications. Thus, proper treatment selection requires careful identification of patients in whom stroke prevention outweighs the bleeding risks.
Several tools have been developed to objectively assess these risks of stroke and bleeding in patients with AF,4,5 and their predictive value has been validated across multiple populations.6,7 However, several studies have demonstrated that actual anticoagulation choices in clinical practice often differ from what would be recommended based on risk scores.8 To date, there is limited information available to understand how clinicians assess risk or the degree to which their subjective assessments agree with objective scores.
Using data from the nation’s largest clinical registry of AF, this article will (1) describe physicians’ assessment of patient risk and factors associated with such determinations; (2) describe the level of agreement between physician-assessed risks for stroke and bleeding and those obtained from validated empirical models; (3) identify which patient and provider factors are most associated with the discrepancy between physician-assessed risk and empirical scores; and (4) determine whether choice of treatment with oral anticoagulation (OAC) was driven more by physician-assigned or objective risk assessments.
We used data from the Outcomes Registry for Better Informed Treatment of Atrial Fibrillation (ORBIT-AF), which is a national US registry of outpatients with AF managed by primary care physicians, cardiologists, and electrophysiologists. A nationally representative cohort of sites was invited to participate, and an adaptive design was used to achieve heterogeneity of practice type and geography. Study coordination and site management were performed by the Duke Clinical Research Institute. Eligible patients were aged ≥18 years, with electrocardiographically documented AF that was not due to a reversible cause, and capable of following up every 6 months for at least 2 years. The present analysis included all enrolled patients who had each of the following at baseline: (1) calculable congestive heart failure, hypertension, age ≥75 years, diabetes mellitus, and previous stroke or transient ischemic attack (CHADS2) score; (2) calculable Anticoagulation and Risk Factors in Atrial Fibrillation (ATRIA) score; (3) physician assessment of stroke risk; and (4) physician assessment of bleeding risk.
Data were entered in a Web-based case report form, primarily from the patient’s medical record and treating physician. Components included demographic data, past medical history, AF history (including symptoms) and interventions, medications, vital signs, laboratory and echocardiographic assessments, and incident events. Additional details of the ORBIT-AF registry have been described previously.9
The empirical risk scores used in the current analysis were the CHADS2 score for stroke risk in AF, and the ATRIA score for bleeding risk in patients with AF. The CHADS2 score has been empirically validated and previously studied with scores of 0, 1, or ≥2 categorizing low, intermediate and high risk, respectively.4,10 Similarly, the ATRIA bleeding score has been empirically validated at scores of 0 to 3, 4, or ≥5 correlating to low, intermediate, and high, respectively (Tables I and II in the online-only Data Supplement).5,7
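As an illustration of the categorization just described, the following minimal Python sketch maps scores to the three risk levels. The CHADS2 point weights (1 point each for heart failure, hypertension, age ≥75 years, and diabetes mellitus; 2 points for previous stroke or transient ischemic attack) are the standard published weights rather than something spelled out in this article, and the function names are hypothetical. The ATRIA function takes an already-computed score, because the component weights are given only in the online-only Data Supplement.

```python
def chads2_score(heart_failure, hypertension, age, diabetes, prior_stroke_tia):
    """Standard CHADS2 points: 1 each for heart failure, hypertension,
    age >= 75 years, and diabetes; 2 for previous stroke/TIA."""
    return (
        int(heart_failure)
        + int(hypertension)
        + int(age >= 75)
        + int(diabetes)
        + 2 * int(prior_stroke_tia)
    )


def chads2_category(score):
    """Cut points used in this analysis: 0 = low, 1 = intermediate, >= 2 = high."""
    if score == 0:
        return "low"
    if score == 1:
        return "intermediate"
    return "high"


def atria_category(score):
    """Cut points used in this analysis: 0-3 = low, 4 = intermediate, >= 5 = high."""
    if score <= 3:
        return "low"
    if score == 4:
        return "intermediate"
    return "high"


# Hypothetical patient: 78 years old with hypertension only.
example = chads2_score(False, True, 78, False, False)
print(example, chads2_category(example))
```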
As part of the ORBIT-AF baseline case report form, the patient’s physician was also asked to subjectively classify each patient’s individual risk of stroke and bleeding into 1 of 3 categories: low (<3%), intermediate (3%–6%), or high (>6%). These cutoffs were prospectively defined to generally correlate with observed event rates for low-, moderate-, and high-risk groups by the CHADS2 and ATRIA risk stratification tools. The physicians could assign this estimate however they deemed appropriate.
The study population was subsequently stratified by empirical and physician-assessed risk categories for both stroke and bleeding (low risk, intermediate risk, and high risk). This yielded 4 risk levels for each patient: physician-assessed stroke risk, CHADS2-assigned stroke risk, physician-assessed bleeding risk, and ATRIA-assigned bleeding risk. Each of these 4 risk assignments had 3 levels: low, intermediate, and high. Baseline characteristics were calculated among these groups. Subsequently, we calculated agreement between empirical risk scores and physician-assessed risk by subtracting the risk level assigned by the scores from that assigned by the physicians, separately for stroke and for bleeding. This yielded difference scores for every patient, for both stroke and bleeding, that ranged from −2 (the physician underestimated risk by 2 levels [eg, physician-assessed risk was low and empirically assigned risk was high]) to +2 (where the physician overestimated risk by 2 levels). Using these difference scores as continuous variables, we identified patient factors associated with mismatch between physician-assessed risk and empirically categorized risk, separately for stroke and for bleeding. Last, we assessed rates of treatment with OAC across these groups.
Agreement between physician-assigned (low, intermediate, high) and empirical (low, intermediate, high) risk assessment was displayed by presenting the number of patients and the percentage in each category, and was measured with the weighted Kappa statistic.
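The difference score described above can be sketched in a few lines of Python. The 0/1/2 coding of the risk levels and the function name are assumptions made for illustration; any equally spaced coding gives the same differences.

```python
# Ordinal coding of the three risk levels (an assumed, equally spaced coding).
RISK_LEVEL = {"low": 0, "intermediate": 1, "high": 2}


def difference_score(physician_risk, empirical_risk):
    """Physician-assigned level minus empirical level, ranging from -2 to +2.

    Negative values mean the physician rated risk lower than the score did;
    positive values mean the physician rated it higher.
    """
    return RISK_LEVEL[physician_risk] - RISK_LEVEL[empirical_risk]


# Physician says low, score says high: underestimated by 2 levels.
print(difference_score("low", "high"))
```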
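For readers unfamiliar with the weighted Kappa statistic, the sketch below computes it from a 3×3 cross-tabulation of counts. It assumes linear agreement weights (full credit on the diagonal, half credit for adjacent categories, none for a 2-level disagreement); the article does not state which weighting scheme was used, and the tables in the example are made up for illustration rather than taken from ORBIT-AF.

```python
def weighted_kappa(table):
    """Weighted kappa for a square table of counts.

    Rows index one rater's category, columns the other's, both in the same
    ordinal order. Linear agreement weights are assumed:
    w_ij = 1 - |i - j| / (k - 1).
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]

    observed = 0.0  # weighted observed agreement
    expected = 0.0  # weighted agreement expected by chance
    for i in range(k):
        for j in range(k):
            weight = 1 - abs(i - j) / (k - 1)
            observed += weight * table[i][j] / n
            expected += weight * row_totals[i] * col_totals[j] / (n * n)
    return (observed - expected) / (1 - expected)


# Made-up tables: perfect agreement, then agreement no better than chance.
print(weighted_kappa([[5, 0, 0], [0, 5, 0], [0, 0, 5]]))
print(weighted_kappa([[10, 10, 10], [10, 10, 10], [10, 10, 10]]))
```

A value near 1 indicates near-perfect agreement and a value near 0 indicates agreement no better than chance.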
Baseline characteristics are presented stratified by risk in terms of empirical score and physician-assigned risk (low, intermediate, high) for stroke and bleeding, separately. Continuous variables are presented as medians (Q1–Q3) and categorical variables are presented as proportions. Because of the significant overlap in patient populations of objective and physician-assigned risk strata, statistical testing was not performed between or across these groups.
To assess the factors associated with mismatch between physician-assigned and empirical risk we evaluated the difference in categorization (physician-assigned minus empirical), with possible values of −2, −1, 0, 1, 2 representing the number of categories apart. This outcome was treated as continuous and modeled by multivariable linear regression. A positive parameter estimate, 0.5 for example, implies that, per 1-unit increase in a covariate, the difference between physician-assigned and empirical risk is expected to be 0.5 categories higher, all else being equal. In short, positive coefficients correspond to factors that concern physicians more and negative coefficients correspond to factors that concern physicians less, relative to the empirical scores. P values and confidence intervals are based on robust standard errors (to account for correlation among patients within the same site). Two separate regression models were developed for the evaluation of (1) stroke risk and (2) bleeding risk. The final regression model for each outcome was developed based on selected risk factors from candidate baseline characteristics (see Table III in the online-only Data Supplement) using backward selection, with an α for exclusion of 0.05. All continuous variables were tested for linearity, and nonlinear relationships were accounted for by using linear splines.
Missing covariate data in the regression analyses were handled by multiple imputation with the use of Markov Chain Monte Carlo and regression methods. The extent of missing covariates, individually and jointly, was evaluated in the first phases of model building. The imputation method used Markov Chain Monte Carlo simulation to create a monotone missing data structure so that subsequent imputation could be performed by the regression method. Imputed data were used only for covariates in the multivariable model, not for the outcome. Backward selection for the final regression model for each outcome was performed in the first imputed data set. Final estimates and associated standard errors reflect the combined analysis over 5 imputed data sets.
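Written out, the mismatch model described above takes roughly the following form. The notation is ours and is not taken from the article: $D_i$ is the difference score for patient $i$, $R_i$ denotes a risk level coded 0 (low), 1 (intermediate), or 2 (high), and $x_{ik}$ are the selected baseline covariates (with spline terms where relationships were nonlinear).

```latex
\[
  D_i \;=\; R_i^{\text{physician}} - R_i^{\text{empirical}} \;\in\; \{-2,-1,0,1,2\},
  \qquad
  D_i \;=\; \beta_0 + \sum_{k=1}^{p} \beta_k x_{ik} + \varepsilon_i .
\]
```

Under this form, $\beta_k > 0$ means that a 1-unit increase in $x_{ik}$ raises the physician-assigned level relative to the empirical level by $\beta_k$ categories on average, and $\beta_k < 0$ means the opposite. The robust standard errors affect only the uncertainty attached to each $\beta_k$, not the point estimates.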
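The article does not state how the estimates from the 5 imputed data sets were combined. A common choice is Rubin's rules, and the sketch below assumes that approach: the pooled estimate is the mean of the per-data-set estimates, and the pooled variance adds the average within-imputation variance to an inflated between-imputation variance. The function name and example numbers are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def pool_rubin(estimates, std_errors):
    """Combine one coefficient across m imputed data sets (Rubin's rules, assumed).

    Returns (pooled_estimate, pooled_standard_error).
    """
    m = len(estimates)
    pooled = mean(estimates)
    within = mean([se ** 2 for se in std_errors])  # average sampling variance
    between = variance(estimates)                  # sample variance across imputations
    total = within + (1 + 1 / m) * between
    return pooled, sqrt(total)


# Made-up coefficient estimates and standard errors from 5 imputed data sets.
estimate, std_error = pool_rubin(
    [0.58, 0.61, 0.63, 0.60, 0.62],
    [0.023, 0.024, 0.022, 0.023, 0.024],
)
print(round(estimate, 3), round(std_error, 3))
```

When the imputed data sets give nearly identical estimates, the between-imputation term is small and the pooled standard error is close to the single-data-set standard error.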
Approval for ORBIT-AF was obtained from the Duke University institutional review board, and all sites obtained institutional review board approval subject to local regulations. All patients signed written, informed consent. Analyses of the aggregate, deidentified data were performed by the Duke Clinical Research Institute by using SAS software (version 9.3, SAS Institute, Cary, NC).
The overall ORBIT-AF population included 10 132 patients from 176 sites. Excluding patients missing parameters of the CHADS2 and ATRIA scores (n=3) and those missing physician assessment of stroke or bleeding risk (n=35) yielded a final study cohort of 10 094 patients from 176 sites.
Comparison of Physician-Assigned Versus Empirical Risk Assessments
Overall, clinicians rated 1625 patients (16%) as having high risk of stroke, versus 7251 (72%) who were considered high risk by CHADS2 (Table 1, Figure 1A). Similarly, clinicians rated 719 patients (7%) at high risk of bleeding on anticoagulation therapy versus 1749 (17%) considered high risk by the ATRIA score (Table 1, Figure 1B). The overall weighted Kappa score between physician-assigned stroke risk and empirical stroke risk assessment was 0.10 (95% confidence interval [CI], 0.10–0.11), indicating poor agreement.11,12 Similarly, the weighted Kappa between physician-assigned bleeding risk and empirical bleeding risk assessment was 0.11 (95% CI, 0.09–0.12), also indicating poor overall agreement.
The baseline characteristics of patients, stratified by physician-assigned and empirical stroke risk assessments, are shown in Table 2. Patients subjectively labeled as low risk for stroke were older (median, 71 versus 63), more likely female (40% versus 33%), with more medical comorbidity (coronary artery disease 29% versus 14%, previous stroke/transient ischemic attack 7.6% versus 0%), compared with patients with a CHADS2 score of 0. Those subjectively assessed as high risk were of similar age (median, 78 for both), but more likely to have significant comorbidity (coronary artery disease 48% versus 42%, previous stroke/transient ischemic attack 37% versus 21%) in comparison with patients with a CHADS2 score of ≥2.
The baseline characteristics of the patients, stratified by physician-assigned and empirical bleeding risk assessment, are shown in Table 3. Patients subjectively assessed as low risk of bleeding were of similar age and sex balance, with roughly equivalent rates of comorbid diseases as those with low-risk ATRIA scores. Patients at high risk of bleeding (subjectively or empirically) were overall older (median age, 78 and 82, respectively) and had overall high rates of comorbidities contributing to stroke and bleeding risks.
Factors Associated With Disagreement
Multivariable models of the difference between physician-assessed and empirically calculated risk are shown in Figure 2. Physician-assigned risk was less influenced by hypertension (adjusted mean difference, 0.61; 95% CI, 0.56–0.65), heart failure (adjusted mean difference for New York Heart Association class I versus no heart failure, 0.23; 95% CI, 0.17–0.29), increasing age (adjusted mean difference for 10-year increase from 60 to 85, 0.15; 95% CI, 0.12–0.18), and diabetes mellitus (adjusted mean difference, 0.14; 95% CI, 0.09–0.2). Previous stroke or transient ischemic attack (adjusted mean difference, 0.17; 95% CI, 0.10–0.23), severe AF symptoms (adjusted estimate, 0.08; 95% CI, 0.02–0.14) and not living independently (0.12; 95% CI, 0.06–0.18) were more strongly associated with physician-assigned risk than empirical stroke risk. Anemia was most significantly associated with physician-assigned bleeding risk being lower than empirically calculated bleeding risk (adjusted estimate, 1.36; 95% CI, 1.30–1.42), whereas a variety of comorbidities were roughly equally associated with bleeding between physicians and the empirical risk score (eg, previous gastrointestinal bleed, heart failure class, concomitant atherosclerotic disease, alcohol abuse, and depressed renal function). Full model details can be found in Tables IV and V in the online-only Data Supplement.
Use of Systemic Oral Anticoagulation
Among patients at high risk of stroke by CHADS2 score (≥2), 80% were treated with OAC, in comparison with 81% of those assessed as high risk subjectively by the physician. In contrast, OAC use did not vary much between patients with high- and low-risk ATRIA scores (73% versus 77%) or between those with high and low physician bleeding risk estimates (73% versus 68%; Figure 3).
There are 4 main findings in this study of empirical and provider risk stratification. First, physicians’ categorical assessment of stroke and bleeding risk in patients with AF was poorly correlated with empirical risk estimates. Second, physicians generally classified many fewer patients as having high risk for stroke and bleeding than validated empirical risk models. Third, physicians’ emphasis on specific risk factors differed significantly from the empirical scores (eg, hypertension, heart failure). Finally, the assessments of stroke risk by either empirical or physician assessments seemed to have a larger impact on subsequent decisions regarding the use of anticoagulation than assessments of bleeding risk.
The disagreement between physician-assigned risk categorization and empirical risk categories in up to 80% of the cases highlights several pitfalls of risk stratification (Table 1). Physicians may have difficulty translating empirical scores into absolute event rates. Although the predefined categories on the ORBIT-AF case report form were designed to mirror derivation cohorts from empirical risk scores, many physicians may not correlate a score with an event rate. For example, it appears that physicians markedly underestimated the annual risk of stroke among AF patients in various CHADS2 categories. However, the discriminatory value of the CHADS2 scoring system remains modest.4
In our multivariable analysis of mismatch, it appears that physicians weighted individual risk factors differently from weightings in the scores, for both stroke and bleeding. For example, patients with high-risk CHADS2 scores owing to a previous stroke had higher physician-assessed risk than those who got to a high CHADS2 score with a combination of other factors. Analogously, the presence of anemia or significant renal disease was significantly associated with lower physician-assigned bleeding risk, relative to ATRIA score. One explanation is that physicians simply value certain risk factors differently from the empirical scores. Alternatively, this could be another manifestation of poor calibration of risk by clinicians.
Our study also highlights the influence of clinicians’ assessments on treatment decisions. Rates of OAC were lower among patients with low or intermediate CHADS2 risk, in comparison with OAC use among patients deemed low or intermediate risk by their physicians. Similarly, patients having a high bleeding risk by ATRIA score were more likely to receive OAC than those deemed at high risk by their provider. This demonstrates that, in fact, when discordant, physician subjective evaluation was a stronger driver of decisions than the objective empirical tool.
Furthermore, it appears that stroke risk, more so than bleeding risk, drives the decision to use OAC. Rates of OAC varied more across stroke risk categories than across bleeding risk strata. This appears contrary to previous data demonstrating that physicians tend toward errors of omission (ie, causing harm owing to withholding therapy) versus errors of commission (ie, causing harm because of therapy).13–15 There may be several reasons for the low influence of bleeding risk: stroke risk is particularly emphasized in the US guidelines; there appears to be a lack of familiarity with validated bleeding scores; or, it may simply reflect the difficulty in predicting bleeding risk by any method. However, the overall effect was modest; rates of OAC use in our cohort were relatively high overall and a risk-treatment paradox persisted. Up to 70% of patients at low risk of stroke received OAC, and nearly 20% of patients at high risk did not receive OAC. There remains room for improvement in the selection of AF patients for OAC.
These analyses are derived from a national registry and sampling bias may exist. However, ORBIT-AF was designed to be inclusive of a very broad sample of community practice types and locations and included a wide spectrum of AF patient types (incident and prevalent, paroxysmal and permanent). Additionally, the physician assessment was based on a single question and reporter biases are possible; however, these data were not linked to any rewards or comparative profiling, so honest responses would be expected. Although many stroke and bleeding risk assessment tools exist, this article focuses on CHADS2 and ATRIA risk assessments. That said, these are the most commonly used and longstanding tools in the United States, and it is likely that results would be consistent had other risk scores been used. Additionally, the cut points for low, moderate, and high risks are subjective, yet ours were selected to correspond to our predefined physician risk ranges. Notably, we cannot say which risk assessment, empirical or physician-assigned, was more correct. Last, treatment decisions regarding OAC were made before the subjective risk assessment by the provider, and we cannot exclude the possibility that current treatment influenced the provider’s risk assessment.
Concordance between provider-assessed risk and empirical scores for stroke and bleeding in patients with AF is low. Although physicians rely on components of scores to estimate risk, they appear to weight them differently from the calculated score. Risk assessment at the patient level remains a challenge, for both stroke and bleeding, and such assignments have important implications for treatment decisions.
Sources of Funding
The ORBIT-AF registry is sponsored by Janssen Scientific Affairs, LLC, Raritan, NJ. Dr Steinberg was funded by National Institutes of Health T-32 training grant 5 T32 HL 7101-38.
Dr Steinberg reports modest educational support from Medtronic. Dr Fonarow reports modest consultant/advisory board support from Ortho McNeil. Dr Hylek reports modest honoraria support from Boehringer-Ingelheim and Bayer; modest consultant/advisory board support from Johnson & Johnson, Boehringer-Ingelheim, Bristol-Myers Squibb, Daiichi Sankyo, Pfizer, and Ortho-McNeil-Janssen. Dr Ansell reports modest consultant/advisory board support from Bristol Myers Squibb, Pfizer, Janssen, Daiichi, Boehringer Ingelheim, and Alere. Dr Chang reports significant employment with Janssen Pharmaceuticals, Inc. Dr Kowey reports modest consultant/advisory board support from Boehringer Ingelheim, Bristol Myers Squibb, Johnson & Johnson, Portola, Merck, Sanofi, and Daiichi Sankyo. Dr Gersh reports modest DSMB/advisory board support from Medtronic, Baxter Healthcare Corporation, InspireMD, Cardiovascular Research Foundation, PPD Development, LP, Boston Scientific, and St. Jude. Dr Mahaffey’s financial disclosures preceding August 1, 2013, can be viewed at https://www.dcri.org/about-us/conflict-of-interest/Mahaffey-COI_2011-2013.pdf; disclosures after August 1, 2013, can be viewed at http://med.stanford.edu/profiles/kenneth_mahaffey. Dr Singer reports significant research grant support from Johnson and Johnson; modest consultant/advisory board support from Bayer HealthCare, Boehringer Ingelheim, Bristol-Myers Squibb, Johnson and Johnson, and Pfizer; and significant consultant/advisory board support from Daiichi Sankyo. 
Dr Piccini reports significant research grant support from Johnson & Johnson / Janssen Pharmaceuticals; significant other research support from Bayer HealthCare Pharmaceuticals Inc (formerly Berlex Labs), Boston Scientific Corporation, Johnson & Johnson Pharmaceutical Research & Development; modest consultant/advisory board support from Forest Laboratories, Inc and Medtronic, Inc; and significant consultant/advisory board support from Johnson & Johnson / Janssen Pharmaceuticals. Dr Peterson reports significant Research Grant support from Eli Lilly & Company, Janssen Pharmaceuticals, Inc, and the American Heart Association; modest consultant/advisory board support from Boehringer Ingelheim, Bristol-Myers Squibb, Janssen Pharmaceuticals, Inc, Pfizer, and Genentech Inc. The other authors report no conflicts.
Guest Editor for this article was Wendy Post, MD, MS.
The online-only Data Supplement is available with this article at http://circ.ahajournals.org/lookup/suppl/doi:10.1161/CIRCULATIONAHA.114.008643/-/DC1.
- Received January 6, 2014.
- Accepted March 10, 2014.
- © 2014 American Heart Association, Inc.
- Wolf PA, Abbott RD, Kannel WB
- Friberg L, Rosenqvist M, Lip GY
- Apostolakis S, Lane DA, Guo Y, Buller H, Lip GY
- Sandhu RK, Bakal JA, Ezekowitz JA, McAlister FA
- Piccini JP, Fraulo ES, Ansell JE, Fonarow GC, Gersh BJ, Go AS, Hylek EM, Kowey PR, Mahaffey KW, Thomas LE, Kong MH, Lopes RD, Mills RM, Peterson ED
- Altman DG
- Fleiss JL, Levin B, Paik MC
Physicians treating patients with atrial fibrillation (AF) must weigh the benefits of anticoagulation in preventing stroke versus the risk of bleeding. Empirical models have been developed to predict such risks (CHADS2 [congestive heart failure, hypertension, age ≥75 years, diabetes mellitus, and previous stroke or transient ischemic attack] score for stroke; ATRIA [Anticoagulation and Risk Factors in Atrial Fibrillation] score for bleeding), and we aimed to assess the degree to which these coincide with clinicians’ estimates. Among 10 094 outpatients with AF in the Outcomes Registry for Better Informed Treatment of Atrial Fibrillation (ORBIT-AF) registry, agreement between physician-assessed stroke risk and CHADS2 score was low. Similarly, agreement between physician-assessed bleeding risk and ATRIA score was also low. Physicians appeared to weigh different factors in their assessments of stroke and bleeding risk, and the use of anticoagulation was highest among patients with high stroke risk, by CHADS2 score or physician assessment. In contrast, physician and empirical estimates of bleeding had limited impact on treatment choice. Overall treatment rates have been reported to be suboptimal, and these data may provide an explanation, in part. Although prospective studies have demonstrated the benefit of anticoagulation in patients stratified by CHADS2 score, risk assessment at the patient level remains a challenge, for both stroke and bleeding, and such assignments have important implications for treatment decisions.