ACCF/AHA New Insights Into the Methodology of Performance Measurement
A Report of the American College of Cardiology Foundation/American Heart Association Task Force on Performance Measures
- 2. New Insights Into the Selection of Possible Performance Measures
- 3. New Insights Into the Construction of Performance Measures
- 4. New Insights Into the Implementation of Performance Measures
- 5. New Insights Into the Analysis and Interpretation of Performance Measures
- 6. Conclusion
Since the publication of the initial American College of Cardiology (ACC)/American Heart Association (AHA) Methodology for the Selection and Creation of Performance Measures,1 there has been an explosion in the development and application of performance measures. Although initially envisioned as a means for physician-led quality-improvement efforts, performance measures have been primarily used as tools for accountability and performance-based reimbursement instead. Given the centrality of and experience with performance measures for quantifying healthcare quality, the American College of Cardiology Foundation (ACCF)/AHA Task Force on Performance Measures sought to update its methodology so that ongoing efforts to measure performance could benefit from emerging insights. The original methodology, proposed in 2005,1 remains the foundation for developing process performance measures. The principal recommendations of the 2005 report are summarized in Table 1. The 2010 report does not address detailed issues of analysis,3 pay for performance,4 or nonfinancial rewards for better performance5 because these topics have been addressed in other statements. The focus of the 2010 report is to provide a state-of-the-art perspective on the construction, collection, and emerging directions of performance measurement as a means to improve healthcare quality.
Performance measures that articulate discrete processes of care, as opposed to structural aspects of care or outcomes, are distinctly different from both clinical practice guidelines and appropriate use criteria. They represent the subset of clinical guideline recommendations for which the evidence is sufficiently strong, typically the highest-quality evidence that benefit unequivocally exceeds risk (Class I recommendation, Level of Evidence: A),8 that failure to provide the therapy to an eligible patient meaningfully reduces the likelihood that the patient will experience the best possible outcome. In this report, the writing committee, commissioned by the Task Force on Performance Measures, discusses new insights into the selection of performance measures, including the strength of evidence needed to consider creating a performance measure from a clinical guideline and the role of costs in considering selection of a performance measure for an expensive technology. The committee then describes new insights into the construction of performance measures (creation of exclusions, use of outcomes measures, numbers of measures, and modification/retirement of measures). Finally, the implementation and analysis of performance measures are discussed, including the use of composite measures and attribution as the foundation for a continually growing and improving healthcare system.
2. New Insights Into the Selection of Possible Performance Measures
2.1. Abbreviations Used Throughout the Report
AMI, acute myocardial infarction
CMS, Centers for Medicare and Medicaid Services
D2B, American College of Cardiology Foundation Door-to-Balloon Alliance
EMRs, electronic medical records
ICER, incremental cost-effectiveness ratio
INR, international normalized ratio
IOM, Institute of Medicine
MI, myocardial infarction
NCQA, National Committee for Quality Assurance
NQF, National Quality Forum
PCI, percutaneous coronary intervention
PCPI, American Medical Association–Physician Consortium for Performance Improvement
2.2. Strength of Evidence
Initial recommendations for construction of a performance measurement set1 involved 1) evaluating the strength of evidence supporting a potential performance measure, 2) defining the clinical significance of the outcome most likely to be achieved by adherence to a performance measure, and 3) assessing the magnitude of the association between adherence to the potential performance measure and a clinically important outcome. Because there can be strong financial incentives for a manufacturer to have its diagnostic or therapeutic products included in a performance measure, a clearly articulated approach to the selection of performance measures is needed. The writing committee reviewed current perspectives to determine how to select performance measures that could improve patients' health in clinically meaningful ways.
An important concept in the selection of performance measures is confidence that the selected measures will meaningfully improve the health, either survival or health status (patients' symptoms, function, and quality of life), of the population to whom the measures are applied. For performance measures other than outcomes, writing committees should clearly establish that the selected process or structural performance measures have a strong association with clinically meaningful outcomes. The strength of this association can be assessed qualitatively or quantitatively, in terms of the likely benefit and the range of uncertainty about the size of that benefit. Under the current paradigm, clinical practice guideline writing committees classify information obtained from mixtures of randomized clinical trials, nonrandomized studies, expert panel consensus, and case studies to create a hierarchical grading system. This system integrates the methodological quality of the underlying evidence (level of evidence, from the highest [A] to the lowest [C]) and the trade-off between benefit and risk (class of recommendation, from the highest [I] to the lowest [III]).8 In comparing risks and benefits, writing committees ultimately develop a qualitative sense that the benefits outweigh the risks. However, this qualitative assessment is usually based on the number and type of supportive studies rather than the clinical importance of the observed differences in outcome.6,8 By design, this approach gives the highest ratings to interventions for which statistically significant differences in outcomes are replicated in several randomized clinical trials. Reliance on statistically significant differences indicates that there was some benefit from the intervention.
However, in an era where many studies use combined end points rather than relying solely on mortality or quality of life, less-important outcomes (including surrogates) may drive the statistical significance of a trial. In fact, because industry-funded clinical trials are primarily designed to provide data to support regulatory approval of a novel treatment and because the US Food and Drug Administration often requires several supportive trials to grant approval, industry trials are often large and replicated. In contrast, nonindustry-sponsored trials, such as those sponsored by the National Institutes of Health or the Veterans Affairs healthcare system, are rarely repeated. Although replication of scientific findings is a key tenet in assessing cause and effect, the strength, accuracy, and clinical importance of the findings must also be weighed. The writing committee believes that an enhanced system for selecting potential performance measures that provides quantitative summaries of the impact on outcomes from adherence to the measure is needed.
Translation of clinical evidence into quantitative summaries for use in the development of performance measurement is challenging but feasible (see Online Appendix C). In particular, explicit assessments of the clinical importance of an observed finding, such as 1) whether the outcome was important and 2) whether the range of possible “true” differences between the treatment groups represents a clinically important difference in outcomes (see example E in Figure 1 of Online Appendix C), will enhance the understanding of the benefit. The writing committee believes that no hard-and-fast rules of minimal clinically important differences can be created outside the context of a particular intervention and outcome. But converting both the survival and health status (eg, being asymptomatic, having a clinically important improvement in function or health-related quality of life) benefits of treatment into meaningful summary metrics, such as number needed to treat with corresponding measures of uncertainty, could help writing committees establish a standard for their use in creating a performance measurement set. Reporting quantitative measures of comparative evidence, such as Bayes factors, will also help writing committees in their decision-making. Regardless of the approach used, writing committees should be explicit as to which outcomes and benefits were considered clinically important in recommending that an intervention be developed into a performance measure. This process would be much more straightforward if clinical trialists, when designing their studies, explicitly stated what defined a clinically important difference in outcomes for each of the end points assessed in the trial. Such routine reporting would markedly simplify the incorporation of study results into guidelines and performance measures.
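As an illustration of a summary metric with a corresponding measure of uncertainty, the number needed to treat (NNT) can be derived from the absolute risk reduction between two trial arms. The following is a minimal sketch and not a method prescribed by this report: the function name, the choice of a Wald (normal-approximation) interval, and the trial counts are all assumptions made for illustration.

```python
import math

def nnt_with_ci(events_control, n_control, events_treated, n_treated, z=1.96):
    """Absolute risk reduction (ARR) and NNT with a Wald interval.

    Illustrative only: the Wald approximation is an assumption of this
    sketch; other interval methods may be preferable for small samples.
    """
    p_control = events_control / n_control
    p_treated = events_treated / n_treated
    arr = p_control - p_treated
    se = math.sqrt(p_control * (1 - p_control) / n_control
                   + p_treated * (1 - p_treated) / n_treated)
    arr_low, arr_high = arr - z * se, arr + z * se
    nnt = 1 / arr if arr > 0 else None
    # NNT bounds are the reciprocals of the ARR bounds; they are only
    # reported when the whole ARR interval lies above zero.
    nnt_ci = (1 / arr_high, 1 / arr_low) if arr_low > 0 else None
    return {"arr": arr, "arr_ci": (arr_low, arr_high),
            "nnt": nnt, "nnt_ci": nnt_ci}

# Hypothetical trial counts (not taken from any study cited in this report).
large_trial = nnt_with_ci(200, 2000, 140, 2000)
small_trial = nnt_with_ci(100, 1000, 80, 1000)
print(large_trial)
print(small_trial)
```

In the second hypothetical trial the interval for the risk difference includes zero, so the sketch reports no finite NNT interval. A writing committee comparing the two results would see that a point estimate alone hides how uncertain the benefit is.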
The Task Force on Performance Measures recommends examining the evidence and range of clinically important benefits to provide quantitative evidence with which to assess the potential benefit of a proposed performance measure. A particular advantage to this approach is its formal specification of clinical benefit and the ability to systematically incorporate the range of available clinical evidence into a transparent analysis demonstrating the confidence with which a benefit of a certain magnitude might be gained from widespread adoption of the clinical practice. An explicit delineation of the clinical logic used to create a performance measure should be disclosed, and a formal process for evaluating existing evidence should be developed so that different performance measures writing committees would be likely to select similar processes of care from which to create performance measures. Such a process would have the added advantage of minimizing potential conflicts of interest among members of a performance measures writing committee. The Task Force on Performance Measures recognizes that formally integrating available evidence into a framework to define the clinical significance of a benefit is labor-intensive. Ideally, this would be done by guidelines writing committees, but this is not always the case.9 Although performance measures writing committees are not necessarily expected to conduct such analyses themselves, the clinically meaningful benefit of introducing a performance measure should be explicitly articulated, demonstrated, and referenced before the measure is created and selected. The ACCF/AHA Task Force on Practice Guidelines is examining alternative approaches to grading clinical evidence. On completion of this process, a more standardized approach can be developed.
2.3. Costs and Performance Measures
The creation of a performance measure implies that all eligible patients (see Section 3.1) for that measure should receive, or at least be considered for, the therapy. The writing committee believes that it is important to consider both the cost-effectiveness and total cost burden of potential performance measures before selection. Although these may change over time, explicitly quantifying the cost-effectiveness of treatments at the time that performance measures are created is aligned with the Institute of Medicine (IOM) goal for a more efficient healthcare system and will minimize the likelihood that unintended economic consequences for society and hospitals emerge from adopting a measure.7 It is not necessarily the role of performance measure writing committees to conduct formal cost-effectiveness analyses, but the writing committee believes that it is important to consider such analyses during selection of performance measures so that the societal outcomes, including financial outcomes, of implementing performance measures can be transparent. Cost-effectiveness analysis should occur before or be concurrent with performance measure recommendations, should be conducted by parties free of conflicts of interest, and should preferentially use the societal perspective in defining cost-effectiveness.10 In some situations, therapies are both more effective and less costly than the standard of care (ie, dominant treatments). When this occurs, there is strong justification to promote the intervention to a performance measure because it is likely to both improve care and lower costs. Although other issues may preclude the selection of a dominant treatment as a performance measure (eg, feasibility of implementation), such treatments represent an ideal opportunity for creating performance measures. In most circumstances, however, effective therapies are also associated with increased costs. 
This creates a need to balance costs against benefits attained, especially because there are competing demands for the limited resources available to governments and societies for improving the health of populations.
There is no consensus on how cost considerations should be integrated into decisions about performance measures. Traditionally, value has been defined as the absolute effectiveness of a given therapy compared with an alternative, conditional on the cost of that therapy (ie, the incremental cost-effectiveness ratio [ICER]).11 Unfortunately, although most cost-effectiveness studies have been conducted from a societal (ie, population-wide) perspective, significant heterogeneity remains in study designs (eg, in-trial analyses versus Markov models), costing methods (eg, microcosting versus macrocosting), measures of effectiveness, assumptions, and time horizons (eg, 3 years versus lifelong). There is also no consensus as to what ICER threshold (if one should even be put forth) would be considered cost-effective.10,12 Because of these considerations, there are significant limitations in comparing ICERs across studies. For example, the ICER for implantable cardioverter-defibrillator therapy for primary prevention varies from $34 000 to $235 000 per quality-adjusted life-year across different cost-effectiveness studies, depending on which patient subpopulations are considered.13–15 Moreover, a cost-effectiveness analysis is not sufficient to fully appreciate issues of cost because it does not provide a transparent reporting of the total cost burden of the intervention to society, which is determined by the cost of the therapy and the prevalence of the condition for which the therapy is indicated. As such, 2 therapies could have identical ICER estimates but vastly different impacts on a healthcare budget with competing demands.
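The distinction between an ICER and total cost burden can be made concrete with a short calculation. This is a hedged sketch only: the function names, the two therapies, and every dollar, quality-adjusted life-year (QALY), and prevalence figure are hypothetical, and the burden calculation simply multiplies incremental cost per patient by the number of eligible patients.

```python
def icer(incremental_cost, incremental_qalys):
    """Incremental cost-effectiveness ratio: extra cost per QALY gained."""
    if incremental_qalys <= 0:
        raise ValueError("this sketch only handles a positive QALY gain")
    return incremental_cost / incremental_qalys

def total_cost_burden(incremental_cost, eligible_patients):
    """Simplified budget impact: per-patient extra cost times prevalence."""
    return incremental_cost * eligible_patients

# Hypothetical therapies chosen so that the two ICERs coincide.
therapies = {
    "rare_condition": {"incremental_cost": 10_000,
                       "incremental_qalys": 0.2,
                       "eligible_patients": 10_000},
    "common_condition": {"incremental_cost": 1_000,
                         "incremental_qalys": 0.02,
                         "eligible_patients": 5_000_000},
}

for name, t in therapies.items():
    ratio = icer(t["incremental_cost"], t["incremental_qalys"])
    burden = total_cost_burden(t["incremental_cost"], t["eligible_patients"])
    print(f"{name}: ICER ${ratio:,.0f} per QALY, total burden ${burden:,.0f}")
```

Both hypothetical therapies cost the same per QALY gained, yet the one indicated for the common condition carries a total burden 50 times larger, which is the situation the paragraph above describes.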
There are other cost considerations for performance measures. In some cases a therapy may be more effective and less costly from a societal perspective, but its implementation may financially penalize clinicians or hospitals (eg, if the therapy prevents hospital readmissions or, as with higher nurse–patient ratios, is poorly reimbursed by payers). In these circumstances, when patient benefit is expected to increase and total costs to society are expected to decrease, reimbursement needs to be realigned so that the performance measures can be implemented without disadvantage to a particular component of the healthcare system.
Finally, providing incentives through pay for performance for physician compliance with performance measures may also have unintended consequences on cost-effectiveness. Such incentives may lead to physicians “gaming” the system. By using strategies such as aggressive screening and overdiagnosis, clinicians can appear to achieve better performance with some performance measures (eg, achieving higher rates of hemoglobin A1C [HbA1C] of <7.0 for patients with diabetes mellitus and blood pressure control for patients with hypertension) because their sicker patient population is “diluted” with patients having an early stage or milder forms of a condition.16 Such efforts only lead to increased population costs for treatment, decreased average net effectiveness of treatment, and, consequently, decreased cost-effectiveness (ie, higher ICERs). Moreover, the use of artificial thresholds to warrant payments may present a problem by rewarding a practice that achieves that threshold (eg, lowering HbA1C from 7.1 to 6.9) rather than making a more substantive improvement in patient management (eg, lowering HbA1C from 12.0 to 7.5).
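Both effects described above, dilution of a patient panel and the arbitrariness of a payment threshold, can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: the function names and all HbA1C values (assumed to be expressed in percent) are invented, and the 7.0 threshold is taken from the example in the text.

```python
def control_rate(hba1c_values, threshold=7.0):
    """Share of patients whose HbA1C is below the threshold."""
    return sum(v < threshold for v in hba1c_values) / len(hba1c_values)

def earns_threshold_credit(before, after, threshold=7.0):
    """True only if a patient moves from at/above to below the threshold."""
    return before >= threshold and after < threshold

# Hypothetical panel: 50 controlled and 50 poorly controlled patients.
established = [6.5] * 50 + [9.0] * 50
# Hypothetical patients with mild disease added by aggressive screening.
screened_in = [6.2] * 45 + [7.2] * 5

rate_before = control_rate(established)
rate_after = control_rate(established + screened_in)
print(f"control rate before dilution: {rate_before:.1%}")
print(f"control rate after dilution:  {rate_after:.1%}")

# Threshold effect, using the two changes quoted in the text.
print(earns_threshold_credit(7.1, 6.9))
print(earns_threshold_credit(12.0, 7.5))
```

In this invented panel the reported rate rises although no established patient's HbA1C changed, and the small change across the threshold earns credit while the much larger improvement does not.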
Recognizing both the responsibility to advance a more efficient healthcare system and the existing limitations and lack of standardized methods in assessing costs and cost-effectiveness, the Task Force on Performance Measures believes that a working committee should be created to develop recommendations and standards for applying considerations of cost and cost-effectiveness to the creation of performance measures, including any potential medicolegal consequences of explicitly weighing cost in performance measures. Because all performance measures writing groups will confront the challenge of having to integrate costs into their selection process, an overarching strategy needs to be developed and implemented.
3. New Insights Into the Construction of Performance Measures
3.1. Use of Exceptions in Performance Measures
One area of performance measure creation that has garnered significant attention in the past few years is the subject of exclusions. As noted in the initial methodology report,1 “Occasionally the denominator will exclude subsets of patients within the target population and the dimension of care for the performance measure” (p 1153). These exclusions might more accurately be termed exceptions because the data from these patients should still be captured for purposes of internal quality improvement analyses, even though the data may not be included in performance measurement reports. This also implies that the performance measure was at least considered for each potentially eligible patient, a primary goal of performance measures and the quality improvement that they are intended to facilitate. Provisions for exceptions should be made in most process and outcome measures that are used for accountability, including both provider compensation (pay for performance) and public reporting purposes. It is less critical to provide for exceptions when measures are used solely for internal quality improvement, although collection and use of these measures could still be useful to physicians in analyzing practice patterns. A detailed discussion of the logic for this perspective is provided in Online Appendix D.
According to a useful construct developed by the American Medical Association–Physician Consortium for Performance Improvement (PCPI), exceptions to the use of process-based performance measures can be documented on the basis of medical, patient, or system reasons.17 These major categories are further delineated into subcategories:
Medical reasons:
Contraindicated (patient history of allergy, potential adverse drug reaction, other)
Not indicated (already received/performed, not likely to benefit, other)
Intolerant (therapy tried and patient could not tolerate it)
Other medical reason(s)
Patient reasons:
Other patient reason(s)
System reasons:
Resources to perform services not available
Insurance coverage/payer-related limitations
Service/treatment to be provided by another physician
Other reasons attributable to healthcare delivery system
The principal advantage of such a categorization is the decreased burden of data collection. Rather than listing and collecting data on each potential contraindication—and given the virtually infinite number of unique situations that likely exist to justify when a performance measure should be responsibly withheld from a potentially eligible patient—it is now possible to merely select the category for which the performance measure is not appropriate. The writing committee continues to support this framing, given its advantages in improving the feasibility of performance measurement, but recognizes that there are potential problems with both the reproducibility of assigning a specific contraindication into the correct category and the possibility of “gaming” the performance assessment efforts by incorrectly excluding potentially eligible patients. To correct misclassification, either additional staff training or more accurate coding of electronic medical records (EMRs) is likely to be needed. Professional ethics and a structured audit system are the 2 most effective means of minimizing intentional manipulation of patient data to achieve artificially better performance reports. Selective auditing of practices with a large proportion of potentially eligible patients with exclusions might be one way of ensuring more accurate categorization of exclusions. Importantly, the writing committee believes that if a patient has a potential exclusion but receives the treatment, then that patient should be included in both the numerator and denominator of the measure. Finally, it is recognized that the use of patient-level exclusions has the potential to sustain or exacerbate disparities in care. 
In its report, “Unequal Treatment: Confronting Racial and Ethnic Disparities in Health Care,” the IOM acknowledges that poor (or culturally insensitive) presentations of treatment recommendations may potentially influence patient decisions.18 To the extent that the quality of medical explanations presented to different racial, ethnic, sex, or age groups varies, then patients may refuse treatment—a “patient-centered” exclusion—even though a better or more culturally sensitive explanation might have led the patient to accept the therapy and receive it. This underscores the need to examine not only actual performance rates, but also the proportion of providers' populations excluded so that outliers (eg, those with large proportions of potentially eligible patients excluded) can be identified for further investigation.
In summary, the Task Force on Performance Measures supports the application of exclusions by removal of patients from the denominator. If a patient with a potential exclusion does in fact receive treatment, then the patient should be counted in both the numerator and denominator. This approach recognizes that some contraindications are relative, and if a clinician believes that the benefits of a treatment outweigh its risks, then the clinician should receive credit for successfully fulfilling the performance measure.
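The counting rule summarized above can be sketched as follows. This is an illustration under stated assumptions, not a specification from the Task Force or the PCPI: the record layout, the field names, and the function name are hypothetical, and the exception labels simply reuse the three major categories (medical, patient, system). The sketch also reports the proportion of patients excepted, since the text recommends examining that proportion alongside the performance rate.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PatientRecord:
    received_treatment: bool
    exception: Optional[str] = None  # "medical", "patient", "system", or None

def score_measure(records):
    """Apply the exception rule to a list of potentially eligible patients."""
    # Every treated patient counts in the numerator, with or without
    # a documented exception.
    numerator = sum(r.received_treatment for r in records)
    # Only untreated patients with a documented exception leave the denominator.
    excepted = sum((not r.received_treatment) and r.exception is not None
                   for r in records)
    denominator = len(records) - excepted
    return {
        "numerator": numerator,
        "denominator": denominator,
        "rate": numerator / denominator if denominator else None,
        "exception_rate": excepted / len(records) if records else None,
    }

# Hypothetical practice with 10 potentially eligible patients.
records = (
    [PatientRecord(True)] * 6
    + [PatientRecord(True, "medical")]      # relative contraindication, treated
    + [PatientRecord(False, "medical"), PatientRecord(False, "patient")]
    + [PatientRecord(False)]                # untreated, no documented reason
)
result = score_measure(records)
print(result)
```

A practice whose exception rate is far above that of its peers would be a natural candidate for the selective auditing discussed in this section.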
3.2. Considerations in the Use of Outcomes Measures
Outcomes measures are emerging as a critical component of the measurement portfolio. In 2007, the Centers for Medicare and Medicaid Services (CMS) began to publicly report the results of its National Quality Forum (NQF)–approved 30-day mortality measures for patients admitted with heart failure and for those admitted with acute myocardial infarction (AMI). In 2008, a 30-day mortality measure for pneumonia was added, and in 2009, 30-day readmission data for these conditions were added.19 Additional organizations, such as the Society of Thoracic Surgeons, have NQF-approved mortality measures. These measures are understood as complementing the process measures because they provide a broader perspective on the quality of care provided.20
The Task Force on Performance Measures recognizes significant strengths and limitations in the use of outcomes as performance measures. First, there is no debate as to the importance of clinically meaningful outcomes, such as mortality and health status.21 It therefore follows that the most interpretable and potentially important performance measures to patients are outcomes measures. Potential limitations include the fact that some patients are more likely to have adverse outcomes regardless of the quality of care received, and that the system should encourage, not discourage, care for such high-risk patients. One method for preventing negative consequences from treating the sickest patients is the use of risk adjustment to “level the playing field.” But even the best risk-adjustment models can explain only a modest proportion of the observed variation in outcomes. This may be a limitation if other unmeasured patient characteristics account for differences in outcomes or a strength if the unmeasured variance is due to differences in quality. In addition, for many outcomes, no well-validated risk-adjustment models exist.
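One simple way to see how risk adjustment can "level the playing field" is to compare observed events with the number expected given each patient's predicted risk. The sketch below uses a plain observed-to-expected (O/E) ratio as an assumption for illustration; it is not the method of any particular reporting program, the predicted risks stand in for the output of a hypothetical risk model, and the hospital data are invented.

```python
def observed_to_expected(outcomes, predicted_risks):
    """O/E ratio: observed deaths divided by the sum of predicted risks.

    outcomes        -- 1 if the patient died, 0 otherwise
    predicted_risks -- per-patient probabilities from a (hypothetical) model
    """
    if len(outcomes) != len(predicted_risks):
        raise ValueError("one predicted risk is required per patient")
    expected = sum(predicted_risks)
    if expected == 0:
        raise ValueError("expected events must be positive")
    return sum(outcomes) / expected

# Hypothetical hospital A: 100 low-risk patients, 5 deaths.
outcomes_a = [1] * 5 + [0] * 95
risks_a = [0.05] * 100

# Hypothetical hospital B: 100 high-risk patients, 12 deaths.
outcomes_b = [1] * 12 + [0] * 88
risks_b = [0.15] * 100

crude_a = sum(outcomes_a) / len(outcomes_a)
crude_b = sum(outcomes_b) / len(outcomes_b)
oe_a = observed_to_expected(outcomes_a, risks_a)
oe_b = observed_to_expected(outcomes_b, risks_b)
print(f"A: crude {crude_a:.1%}, O/E {oe_a:.2f}")
print(f"B: crude {crude_b:.1%}, O/E {oe_b:.2f}")
```

Hospital B has the higher crude mortality but fewer deaths than its case mix predicts. The comparison is only as good as the model behind the predicted risks, which is the limitation the paragraph above raises.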
The Task Force on Performance Measures recognizes that debate continues regarding the inclusion of specific risk-adjustment factors. A particularly controversial issue is whether race and socioeconomic status should be included in models. Some experts argue that these variables can carry important prognostic information that can improve the performance of the models and perhaps serve as surrogates for chronic diseases or poorly managed conditions before hospital admission. Others argue that these characteristics may be associated with the quality of care and that their inclusion may “adjust away” quality differences among providers, making it possible that those caring for vulnerable populations who perform poorly will not be identified. Finally, the clinical interpretation of models that adjust for race or sex would suggest that different outcomes for blacks or women are acceptable and could undermine the goal of equity in US health care.7 Consequently, the CMS measures do not adjust for race because of the concern of creating different standards of care by the use of these variables.
In 2006, the AHA published a consensus statement with the endorsement of the ACCF that articulated the key attributes of outcomes measures suitable for public reporting.22 In developing the statement, the writing group, which included clinicians, quality experts, a statistician, and policymakers, identified the following 7 preferred attributes:
1) clear and explicit definition of an appropriate patient sample
2) clinical coherence of model variables
3) sufficiently high-quality and timely data
4) designation of an appropriate reference time before which covariates are derived and after which outcomes are measured
5) use of an appropriate outcome and a standardized period of outcome assessment
6) application of an analytic approach that takes into account the multilevel organization of data
7) disclosure of the methods used to compare outcomes, including disclosure of performance of risk-adjustment methodology in derivation and validation samples
The Task Force on Performance Measures supports these standards in developing valid and useful outcomes-based performance measures, although several clear challenges exist in their application.
An example of such a challenge is in the use of outcomes to evaluate the quality of coronary revascularization. Current risk-adjustment methods are useful for comparing one provider's performance against all other studied providers who have performed that procedure. But these same risk-adjusted results may not be appropriate for comparing one hospital with another, leading to errors in interpretation. Current outcomes-based mortality performance measures exist for both bypass surgery and percutaneous coronary intervention (PCI). Because these treatments are handled separately, it is possible that 2 institutions with identical patient populations and performances may look very different. Online Appendix E outlines such a potential scenario for both periprocedural mortality outcomes and efficiency when bypass surgery and PCI are examined independently or together. To facilitate these comparisons between hospitals, it would be more appropriate to redefine the population of analyzed patients as those with significant obstructive coronary artery disease rather than create one stratum for those undergoing PCI and another for those being treated with bypass surgery. Consequently, the writing committee favors, wherever possible, using a clinical condition and state rather than a procedure as the basis for applying an outcomes-based performance measure. Nevertheless, all of the domains articulated by this report are similarly complex, and transparency by writing committees is needed to support and promulgate outcomes-based performance measures, a clear priority for performance measurement development.
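The difference between procedure-based and condition-based reporting can be illustrated with invented counts. The following sketch does not reproduce the scenario in Online Appendix E; the hospitals, patient numbers, and deaths are hypothetical, no risk adjustment is applied, and the only assumption is that the two hospitals treat the same number of patients with obstructive coronary disease and have the same total number of deaths while dividing patients differently between PCI and bypass surgery.

```python
# (deaths, patients) per procedure; all figures are hypothetical.
hospitals = {
    "A": {"PCI": (3, 150), "bypass": (3, 50)},
    "B": {"PCI": (1, 100), "bypass": (5, 100)},
}

def procedure_rates(hospital):
    """Mortality computed separately within each procedure stratum."""
    return {proc: deaths / n for proc, (deaths, n) in hospital.items()}

def condition_rate(hospital):
    """Mortality across all revascularized patients, whatever the procedure."""
    total_deaths = sum(deaths for deaths, _ in hospital.values())
    total_patients = sum(n for _, n in hospital.values())
    return total_deaths / total_patients

for name, data in hospitals.items():
    by_procedure = procedure_rates(data)
    print(name,
          {proc: f"{rate:.1%}" for proc, rate in by_procedure.items()},
          f"condition-based: {condition_rate(data):.1%}")
```

In this example hospital B appears better on both procedure-specific measures, yet the two hospitals have identical mortality when all patients with the condition are analyzed together. The apparent gap comes from how patients were allocated to each procedure.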
Another outcome measure relates to patient health status. Several performance measurement sets include the assessment of patients' symptoms and function (a process) as measures of healthcare quality meeting the dimension of care associated with serial monitoring of patients.23,24 The results of these assessments, although not currently reported, would be a clinically important outcome measure and could provide quantitative information on the variability in symptom control and quality of life of outpatients with coronary disease or heart failure. A recent national study of primary care clinics in Australia examined the proportion of each clinic's patients with coronary disease who had weekly or more frequent episodes of angina. The results showed that although 14% of clinics had no patients with weekly angina, in 18% of clinics more than half of patients had weekly angina (across the 207 clinics, the proportion of patients with weekly angina ranged from 0% to 100%).25 However, until robust risk-adjustment models are developed, these outcomes are likely better used as tools for quality improvement than accountability.
Outcomes measures used as indicators of quality are currently best understood as tools to help hospitals and healthcare professionals understand their performance. Because not all adverse events represent a failure of quality, given that some events cannot be averted even with the highest-quality care, the goal of outcomes-based assessments is to show relative differences in performance across delivery systems so that those with the worst performance may reflect on ways to improve care or learn from those with the best performance.
3.3. Numbers of Measures
The proliferation of agencies and entities developing performance measures, often with different methodological rigor, goals, and perspectives, is creating an unmanageable burden for providers that threatens to undermine the stated goal of performance measurement: to improve the quality of health care. Not only do professional organizations, such as the ACCF, the AHA, and the PCPI, create performance measures within a given disease, but payers (eg, CMS, United Healthcare, Blue Cross/Blue Shield) and accrediting bodies (eg, the National Committee for Quality Assurance [NCQA]) also create unique measures for the same condition(s). Consequently, even when the same process of care is being evaluated, subtle differences in measure specifications can lead to a marked administrative burden in properly providing the requisite data, as well as differences in results. It is therefore possible that different assessors examining the same patients may reach different conclusions about a provider's performance. Because these differences can be attributable to the method of assessment rather than the quality of care provided, they have the potential to undermine trust and confidence in the system and can impair the capability of performance measurements to be used to improve care. The Task Force on Performance Measures strongly supports the need to attain national consensus on a limited number of measures that are universally accepted by all who are interested in performance assessment. Toward that end, the task force has been actively engaged with other professional organizations (eg, the PCPI) and with payers and accrediting bodies (eg, the Joint Commission, CMS, NCQA) to achieve consensus on definitions of these measures. Yet the different perspectives of these bodies sometimes make reconciliation difficult. These differences are being negotiated to achieve balance between the available clinical evidence, the need to have clinical rather than administrative data, and feasibility. 
By using the same measures with the same definitions and a reasonable number of requisite data elements, consistent collection and benchmarking is likely to be far more feasible.
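The effect of differing measure specifications on the same patients can be illustrated with a minimal sketch. Everything in it is hypothetical: the `Patient` fields, the two specifications (one excluding patients with a documented contraindication from the denominator, one not), and the five-patient panel are assumptions chosen only to show how two assessors can report different rates for identical care.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    received_therapy: bool
    documented_contraindication: bool  # hypothetical exclusion criterion


def performance_rate(patients, exclude_contraindicated):
    """Share of eligible patients who received the therapy.

    When exclude_contraindicated is True, patients with a documented
    contraindication are dropped from the denominator (specification A);
    otherwise every patient is counted (specification B).
    """
    eligible = [
        p for p in patients
        if not (exclude_contraindicated and p.documented_contraindication)
    ]
    met = sum(p.received_therapy for p in eligible)
    return met / len(eligible)


# Hypothetical panel: identical care, assessed under two specifications.
patients = [
    Patient(received_therapy=True, documented_contraindication=False),
    Patient(received_therapy=True, documented_contraindication=False),
    Patient(received_therapy=True, documented_contraindication=False),
    Patient(received_therapy=False, documented_contraindication=True),
    Patient(received_therapy=False, documented_contraindication=False),
]

rate_spec_a = performance_rate(patients, exclude_contraindicated=True)   # 3 of 4
rate_spec_b = performance_rate(patients, exclude_contraindicated=False)  # 3 of 5
print(f"Specification A: {rate_spec_a:.0%}, Specification B: {rate_spec_b:.0%}")
```

The difference between the two rates in this sketch arises entirely from the denominator definition, not from the care delivered.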
The Task Force on Performance Measures recommends 2 additional strategies to limit the number of measures. First, the NQF has emerged as a national clearinghouse for vetting and approving measures. On the one hand, this provides a valuable validation of the methodology used to create measures and should theoretically elevate the quality of a performance measurement set. On the other hand, the NQF also accepts measures from multiple different entities and has the potential to include measures that are built from data systems (eg, administrative data, proprietary date ranges surrounding a seminal event) that are opaque to clinicians, create administrative costs to understand and contest, and are not sufficiently actionable to be used for quality improvement, the ultimate purpose of any quality-assessment program. Over time, a vetting process such as that provided by the NQF should limit the number of measures to only those that have the greatest potential to achieve the goal of improving healthcare quality. Currently, CMS is reporting only measures that have been endorsed by the NQF.
The other promising approach is the creation of measure sets for a given disease and rotation of selected measures over time. By selecting a subset of measures to be used, a much more practical data collection effort that can be more easily accomplished by a larger number of practices and institutions can be undertaken and linked to meaningful quality improvement. The national Door-to-Balloon (D2B) Alliance, which reduced delays in performance of primary PCI, is a notable example of this approach.26,27 What is needed is a national consensus on which measures should be used over what period of time. Ideally such a decision-making body would include payers (for their pay-for-performance programs), regulators (for accreditation and public reporting), methodologists, and clinician representatives of those who care for patients with the disease under assessment. Over time, some measures would be retired and others introduced. Ideally, when the subset of measures to be selected is defined, measures from all of the multiple dimensions of care would be chosen so that a more comprehensive assessment of quality health care could be attained. Those measures that are not actively being used to quantify performance, whether because of inadequate variability in care, difficulties in collection, or insufficient data to support their elevation as performance measures, could still be used as quality metrics. A recent statement by the ACCF and AHA delineates the differences between these 2 types of measures.28
3.4. Modification and Retirement of Measures
To date there has been a strong push to expand the number and diversity of the performance measures portfolio to provide a more complete assessment of quality. However, there is also a need to periodically reconsider whether previously established performance measures should be modified or retired. This can occur for several reasons. First, new scientific evidence may come to light that changes the previous consensus views regarding a measure. An example of such a change is the use of early beta-blocker therapy for patients with AMI.25 This performance measure was originally based on older trials that found that acute beta-blocker therapy reduced postinfarction angina, arrhythmias, and reinfarction risks. More contemporary trials, however, found that acute beta-blocker therapy had no net impact on mortality. Although such therapy reduced deaths from arrhythmias, it also increased risks for cardiogenic shock in certain subpopulations.29 On the basis of changes in guideline recommendations, the ST-Elevation and Non–ST-Elevation Myocardial Infarction Performance Measures Writing Committee determined that early beta-blocker therapy should be dropped from the measure set, citing the complexity required to distinguish patients who benefit from this therapy from those who may be harmed.25 Similarly, because cigarette smoking is known to have a detrimental impact on cardiovascular health and there is evidence that high-intensity behavioral and pharmacological therapies can help patients quit,30 smoking cessation counseling was developed as a performance measure for several conditions.24,25 But recent studies found a striking discordance between hospital performance on this measure and the rates at which patients actually quit smoking after myocardial infarction (MI).31,32 These data suggest the need for reevaluation of the smoking cessation measure.
A second reason for modifying or dropping a performance measure is that collection of the data necessary to calculate the measure is prohibitively complex or expensive. In some cases these issues can be corrected with clearer instructions, more training, or minor changes in the numerator or denominator of the measure.
A third reason for considering revision or retirement of a measure is if its use has unintended adverse consequences. An example of this was recently raised regarding a performance measure to give intravenous antibiotics for community-acquired pneumonia within 4 hours of diagnosis. Although rapid administration of antibiotics is beneficial for patients with pneumonia, the metric has been criticized because it may pressure clinicians to administer antibiotics despite diagnostic uncertainty and may lead to overtreatment.33 Similar concerns have been raised that the current ACCF/AHA performance measure for D2B within 90 minutes may lead to an increase in false-positive activation of cardiac catheterization for patients with suspected ST-elevation MI.34 Although, in this case, the net benefits of the D2B measure likely outweigh the risks, such examples highlight the need to carefully study the real-world impact of performance measures on provider care and patient outcomes to minimize unintended consequences.35
A final reason for retiring a measure is when there is limited to no room for further improvement in performance and clinical practice reaches near-perfection. Currently, several MI performance measures are achieving asymptotic “ceilings” of performance, including aspirin at arrival and discharge, as well as beta-blocker use at discharge for patients with AMI.36 This achievement of near-perfection in performance should be seen as a celebration for the field and a mark of success of the performance measure and quality-improvement cycle. Yet, ever conscious of the burden of data collection on the provider, some have argued for consideration of retiring these metrics.37 Retirement of a performance measure because of its success, however, should also be carefully monitored, because there is a risk that ending active measurement may lead to provider complacency and ultimately a regression in performance. As noted in Section 3.3, recycling measures after a period of dormancy can both assess the sustainability of the original performance assessment effort and reinforce the need for this process of care.
4. New Insights Into the Implementation of Performance Measures
Although the initial publication addressing the methodology for measures creation and selection1 explicitly called for feasibility testing before endorsement of performance measures, this has rarely been done. The writing committee wanted to emphasize the importance of preliminary testing of proposed measures in local, regional, or national projects before application for purposes of accountability. Congruent with this perspective, the NQF has begun issuing only time-limited endorsements of proposed measures pending demonstration of their feasibility.38 A number of potential barriers exist that could render an otherwise valuable potential performance measure impractical to collect in clinical practice.
The burden of data collection has emerged as a primary challenge to implementation of performance measures. Not only do multiple performance measures often exist for a particular condition, but patients also have multiple diseases, so that it is not practical or possible to collect all measures for all patients. This situation is compounded, in particular, by the superiority of clinical data over administrative data for quantifying the quality of health care.39 The concepts described in this report, including elevating the evidentiary threshold for endorsing a performance measure, simplifying the inclusion/exclusion criteria, limiting the number of measures, and retiring measures, may all lead to a more parsimonious, feasible measurement set for quality improvement and accountability.
A second critical aspect of collecting performance data is the integration of data collection into the process of providing care. The more extra work that is needed to provide the data required for performance assessment, the more unsustainable such a program will become. An important responsibility of performance measures writing committees is to consider how data elements can be acquired throughout the transactions of a clinical encounter without requiring the collection and recording of additional data at a clinical visit. The challenge for the Task Force on Performance Measures is to consider how multiple measurement sets for different conditions, the similarity of measure construction across diseases, and the totality of ACCF/AHA-approved measures might affect a clinical practice or institution.
Although EMRs would seem to offer a potential solution, this is not currently the case. Many systems are unable to export the collected data to other entities for performance assessment and improvement, have data definitions that are not congruent with those used by the developers of performance measures (eg, the ACCF/AHA data standards40) and may require “pop-ups” and other prompts that are increasingly ignored by practitioners frustrated by the perception that these aids are interfering with efficient patient care. It is important that the effort to create exporting functions from EMR systems in standard formats be accelerated so that those who use EMRs can more efficiently participate in quality assessment and improvement efforts. Although alternatives, such as the patient flow sheets proposed by the PCPI and prior ACCF/AHA performance measures writing committees, still have some potential to help in performance measurement, a range of strategies for data collection needs to be considered by writing committees. In addition, it would be valuable for experts in medical informatics to participate in such writing groups, given the unique perspective and knowledge required to convert clinical logic into code.
Beyond the challenges of data collection, other insights have emerged over the past several years, including the need to develop “windows” around timeframes for performance. For example, although it is reasonable to state that cholesterol levels should be assessed every year in a patient with chronic stable coronary artery disease,23 a patient assessed in the last week of December and not again until the first week of January 12½ months later would have no assessment in the intervening calendar year and so would not meet the measure. Even more challenging is the current requirement for a patient with atrial fibrillation to have an international normalized ratio (INR) measurement every month.41 Even if a patient had 10 to 12 INR assessments per year, which would generally be considered high-quality anticoagulation management, many of these assessments might be within the same month, whereas in other months there might be none. The increasing use of home INR monitoring42 further compounds the problem. Although there may be no solution to handling the example of serial cholesterol screening that falls just outside a reporting window, the case of atrial fibrillation might be better handled with a range of possible assessments over the entire reporting window (eg, ≥10 assessments within the reporting year) to minimize the challenges in accurately representing the quality of care being provided.
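The contrast between a strict calendar-month rule and a count over the whole reporting window can be sketched as follows. This is an illustration only: the function names, the reporting year, and the INR dates are hypothetical, and the threshold of 10 is taken from the example in the text rather than from any measure specification.

```python
from datetime import date


def meets_monthly_rule(assessment_dates, year):
    """True only if every calendar month of the year has at least one assessment."""
    months_covered = {d.month for d in assessment_dates if d.year == year}
    return months_covered == set(range(1, 13))


def meets_count_rule(assessment_dates, year, minimum=10):
    """True if at least `minimum` assessments fall anywhere in the reporting year."""
    return sum(1 for d in assessment_dates if d.year == year) >= minimum


# Hypothetical patient: 12 INR assessments in the year, unevenly spaced, so
# some months contain 2 assessments and others (March, June, August,
# November) contain none.
inr_dates = [
    date(2009, 1, 5), date(2009, 1, 26), date(2009, 2, 23),
    date(2009, 4, 2), date(2009, 4, 30), date(2009, 5, 28),
    date(2009, 7, 1), date(2009, 7, 29), date(2009, 9, 3),
    date(2009, 10, 1), date(2009, 10, 29), date(2009, 12, 3),
]

monthly_result = meets_monthly_rule(inr_dates, 2009)
count_result = meets_count_rule(inr_dates, 2009)
print("Calendar-month rule met:", monthly_result)
print("Reporting-window count rule met:", count_result)
```

In this sketch the same 12 assessments fail the calendar-month rule but satisfy the count-based rule, which is the discrepancy the windowed approach is meant to remove.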
A final lesson learned from early experiences in performance measurement is the limited reproducibility of some measures. For example, measures that encourage counseling represent an important dimension of care, but the quality of delivering counseling is difficult to quantify and yet may have profound influence on the ability of the activity to achieve its desired outcomes. Also, as noted above, smoking cessation counseling at the time of an MI has long been endorsed as a performance measure,43 yet recent data suggest that there is no overall association between smoking cessation counseling and quit rates among smokers.32 In these studies, however, there was an association between the presence of an inpatient smoking cessation program and patients stopping smoking after discharge.31 Although not definitive, these findings suggest that the quality of counseling achieved may influence behavior and that failure to quantify or specify the quality of counseling efforts may lead to a measure that is not associated with a clinically meaningful outcome. From this experience, future performance measurement writing groups need to be confident that proposed performance measures can be adequately quantified so that adherence to the measure can deliver the expected benefits in outcome.
5. New Insights Into the Analysis and Interpretation of Performance Measures
5.1. Composite Measures
Combining measures or indicators of performance is a relatively new consideration for assessing the quality of medical care.44–46 The proliferation of efforts to measure, publicly report, or reward healthcare providers has focused attention on the need to ensure that performance measures comprehensively represent the quality of care, including sampling from among the multiple dimensions of care identified in the original methods report.1 Composite quality of care measures are increasingly being developed and deployed. A composite measure is a single measure of a construct that is defined in terms of ≥2 individual measures. Although composite measures have many advantages, their construction, development, and validation require more attention than that needed for individual performance measures for a number of reasons.47–50
Several lessons have been learned with the construction of composite quality measures. First, standard psychometric properties of composites, such as reliability, accuracy, and predictive validity, may be difficult to demonstrate. In particular, there may be no universal standard for some composite measures. Consequently, other quality or health measures should be shown to be related to the composite. Individual measures that make up the composite performance measure should contribute unique information to the underlying construct but at the same time should not differ from the other components of the composite.
Second, the scoring methods used to create the composite measure deserve serious consideration. A scoring method is the rule used to combine the individual components of the composite. Common methods of combining individual components include all-or-none rules, where a success is declared only if all the individual components are met (conjunctive scoring); any rules, where a success is declared if any of the specified components are met (compensatory scoring); and empirically weighted rules, where a number is produced using the variability in the data to determine the weight of each specific component (factor analytically derived or item response theory derived). It is important to note, however, that these methods can lead to different conclusions.
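A minimal sketch can show how the all-or-none and any rules lead to different conclusions from identical data. The component names and the four-patient panel are hypothetical, and the empirically weighted approach is omitted because it depends on fitting a model to real data.

```python
def all_or_none(components):
    """Conjunctive scoring: success only if every component is met."""
    return all(components)


def any_rule(components):
    """Success if at least one component is met."""
    return any(components)


def composite_rate(patients, rule):
    """Share of patients scored as a success under the given rule."""
    successes = sum(rule(list(p.values())) for p in patients)
    return successes / len(patients)


# Hypothetical component results for 4 patients (True = component met).
patients = [
    {"aspirin": True,  "beta_blocker": True,  "statin": True},
    {"aspirin": True,  "beta_blocker": False, "statin": True},
    {"aspirin": False, "beta_blocker": False, "statin": True},
    {"aspirin": False, "beta_blocker": False, "statin": False},
]

all_or_none_rate = composite_rate(patients, all_or_none)  # 1 of 4 patients
any_rate = composite_rate(patients, any_rule)             # 3 of 4 patients
print(f"All-or-none: {all_or_none_rate:.0%}, any: {any_rate:.0%}")
```

With the same component data, the provider in this sketch would appear to succeed in one quarter of patients under one rule and three quarters under the other.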
Third, although missing data always pose a problem in any analysis, the extent and impact of missing data can be hidden depending on the scoring rule. Moreover, the scoring strategy may affect how missing data are handled. For all-or-none rules, if a single component of the composite is missing, then the composite is missing; however, for any rules, as long as one component is observed to have met success, the composite is observed. Strategies for handling missing data in the scoring rule must be transparent and valid.
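The interaction between scoring rule and missing data can be sketched by representing an unobserved component as `None`. The behavior for the all-or-none rule and for an any rule with at least one observed success follows the description above; the handling of an any rule with no observed success and at least one missing component is not specified in the text and is an assumption of this sketch, as are the example data.

```python
from typing import Optional, Sequence


def all_or_none(components: Sequence[Optional[bool]]) -> Optional[bool]:
    """If any single component is missing, the composite is missing."""
    if any(c is None for c in components):
        return None
    return all(components)


def any_rule(components: Sequence[Optional[bool]]) -> Optional[bool]:
    """Observed as soon as one component is observed to have been met."""
    if any(c is True for c in components):
        return True
    if any(c is None for c in components):
        # Assumption: with no observed success and something missing,
        # the composite is treated as missing rather than as a failure.
        return None
    return False


def missing_count(patients, rule) -> int:
    """Number of patients whose composite score is missing under the rule."""
    return sum(1 for p in patients if rule(p) is None)


# Hypothetical component results; None marks a component that was not recorded.
patients = [
    [True, None, True],
    [None, None, True],
    [False, False, False],
    [True, True, True],
]

missing_all_or_none = missing_count(patients, all_or_none)
missing_any = missing_count(patients, any_rule)
print("Missing composites (all-or-none):", missing_all_or_none)
print("Missing composites (any):", missing_any)
```

In this example, 2 of the 4 patients have incomplete component data, yet no composite is missing under the any rule, which is how a scoring rule can hide the extent of missing data.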
Fourth, because some individual performance measures may be continuous (health status) and some may be binary (within-range blood pressure), statistically combining such measures requires some thought. Most applied researchers will try to solve this problem by converting all individual components into the same scale. Although this is an easy solution, it is associated with a loss of information.
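The loss of information from converting a continuous component to a binary one can be seen in a very small sketch. The 0 to 100 health status scale, the cutoff of 50, and the three scores are all hypothetical values chosen for illustration.

```python
CUTOFF = 50  # hypothetical threshold on a hypothetical 0-100 health status scale


def dichotomize(score, cutoff=CUTOFF):
    """Convert a continuous score to a binary 'met' / 'not met' component."""
    return score >= cutoff


scores = {"patient_a": 51, "patient_b": 95, "patient_c": 49}
binary = {name: dichotomize(score) for name, score in scores.items()}

# patient_a and patient_b become indistinguishable although their scores
# differ by 44 points, whereas patient_a and patient_c are separated
# although their scores differ by only 2 points.
print(binary)
```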
To address these challenges, the ACCF/AHA Task Force on Performance Measures has developed a position statement on composite measures.51 In addition, several professional organizations have developed recommendations for the development of valid composite measures. The NQF has created a consensus report that has outlined a composite evaluation framework.52 Only those composite measurements aligned with these recommendations will be considered as potential performance measures. The Task Force on Performance Measures recommends that composite performance measures follow the criteria described by the ACCF/AHA 2010 Position Statement on Composite Measures for Healthcare Performance Assessment51 as outlined in Table 2.
The majority of patients who have cardiovascular disease have multiple comorbidities and hence often have multiple healthcare providers within a single system of care or among different systems of care. The complexity of measuring the quality of coordinated cardiovascular care across multiple healthcare professionals and multiple settings is compounded by the difficulty in establishing the appropriate individual, institution, or healthcare system to which to assign attribution or accountability. Although some aspects of care and care coordination are suitable for measurement at the level of the individual and appropriate accountability lies with the individual provider, others are more appropriate for measurement at the group, institutional, or system level.
The IOM has called for measurement approaches that foster shared accountability.7 In such measures, all members of the healthcare team(s) are held accountable for quality and efficiency of care. The IOM has identified gaps in current performance measurement sets, including too few measures of patient-centered care, too few focusing on more than a narrow time window, and too few with more than a narrow focus of accountability beyond individual provider actions.
The NQF has endorsed measures for efficiency of episodes of care across the continuum of care53 that focus on quality and efficiency of care as perceived by the patient rather than by the healthcare provider or institution. In this construct the framework for efficiency measurement addresses all levels of the healthcare system, including individual providers, provider organizations, and communities. From the patient's perspective, an episode of care is not a discrete encounter or hospitalization but a longitudinal experience that may last months to years or even an entire lifetime. For example, the patient's experience of an AMI does not begin and end with the D2B time but encompasses the full spectrum from onset of chest discomfort through activation of an emergency medical system, the hospital experience, discharge planning, and return to long-term outpatient care and rehabilitation. In some cases this experience also entails end-of-life planning and palliative care. There are multiple real and potential gaps in care related to the many transitions in this definition of an episode of care. For a hospital discharge, this would include, among others, medication reconciliation, transmission of the discharge record, timeliness of postdischarge tests and services, and patient understanding of the discharge plan and care needed. New longitudinal measures need to be developed to fill gaps in the episode-of-care framework related to transitions from inpatient to outpatient settings (and vice versa), transitions among health systems, changes in the plan of care, and transitions and hand-offs among multiple providers. Care coordination is essential because these transitions can be disconnected, uncoordinated, and unsafe. Assigning attribution in this framework is difficult and can only be accomplished by assuming that there is shared accountability for the quality of care provided across all providers, institutions, and systems involved in the episode of care.
The concept of an accountable care organization, based on the local delivery system (eg, a multispecialty group practice or hospital and extended professional staff) from which patients receive the majority of their care, rather than the individual practitioner, has been proposed as a first step in creating the sense of shared accountability.54–56 The patient-centered medical home also promotes shared accountability, because members of the team are equally responsible for satisfactory delivery of the care plan.57 When patients transition among different delivery systems, however, measures also need to aggregate care and care coordination across sites and over time to operationalize the concept of shared accountability.
Thus, the concept of shared accountability may be effective in well-organized systems of care that exist for some patients, such as those enrolled in the Veterans Affairs system, but would be associated with pitfalls and unintended consequences if applied at the provider or institutional level for the majority of patients who transition over time among providers and care sites. For example, length of stay will be prolonged if an institution transfers patients only to preferred nursing homes (and a bed is not available at that facility), and acute cardiac care will be diminished if an institution refuses transfer of patients from another institution considered to be of poorer quality. Conversely, current performance measures may exclude patients who are transferred from one institution (where a procedural complication may have occurred) to another institution, so that neither institution may have the adverse outcome attributed to their system. Some aspects of quality of care of acute coronary syndromes are related to the emergency preparedness of the community, such as 9-1-1 responsiveness and availability of rapid transfer between institutions, and this may be beyond the ability of the medical profession to control. Finally, there would also be unique methodological issues in implementing a system of shared attribution related to feasibility and determination of sample sizes needed for measurement. The Task Force on Performance Measures recommends that the concept of shared accountability undergo appropriate field testing before there is further consideration of implementation. However, it would be prudent for hospitals to examine and reengineer their processes of discharge planning, patient education/self-management, and communication with community physicians and nurses if they wish to improve hand-offs and decrease 30-day mortality and readmission rates. This could be an important first step toward shared accountability.
6. Conclusion
This update to the methodology of performance measure selection and creation seeks to clarify key challenges and opportunities to elevate the science of quality assessment and improvement. Experience since the publication of the initial methodology report has identified critical opportunities to improve the selection, construction, implementation, and interpretation of performance measures (Table 2). With respect to the selection of potential measures, there is a pressing need to elevate the transparency and rigor by which the evidence supporting a performance measure is synthesized, including a focus on clinically meaningful outcomes, and the need to express the costs, both incremental cost-effectiveness and overall societal costs, associated with a potential performance measure. With respect to the construction of performance measures, refinement of patient eligibility, considerations in the use of outcomes, and the number of measures and their retirement have all emerged as important opportunities to improve the process of performance measure creation. With respect to implementation of performance measures, challenges have emerged that require ever-greater scrutiny of the importance of potential performance measures to quality improvement and the need to create measures that can be feasibly collected. Finally, with respect to the analysis and interpretation of performance measures, careful attention to and testing of composite measures and of the attribution of performance measures to appropriately accountable units are needed before implementation. Although these recommendations substantially increase the complexity and work involved in creating performance measures, the Task Force on Performance Measures believes that following these processes will elevate the consistency and quality of new measures and improve the processes of quality improvement so that patients and society may benefit from higher-quality cardiovascular care.
American College of Cardiology Foundation
John C. Lewin, MD, Chief Executive Officer
Charlene May, Senior Director, Clinical Policy and Documents
Melanie Shahriary, RN, BSN, Associate Director, Performance Measures and Data Standards
Jensen S. Chiu, MHA, Specialist, Clinical Performance Measures
Erin A. Barrett, MPS, Senior Specialist, Clinical Policy and Documents
American Heart Association
Nancy Brown, Chief Executive Officer
Rose Marie Robertson, MD, FACC, FAHA, FESC, Chief Science Officer
Gayle R. Whitman, PhD, RN, FAHA, FAAN, Senior Vice President, Office of Science Operations
Nereida Crawford, MPH, Science and Medicine Advisor
ACCF/AHA TASK FORCE ON PERFORMANCE MEASURES
Frederick A. Masoudi, MD, MSPH, FACC, FAHA, Chair; Elizabeth DeLong, PhD; John P. Erwin III, MD, FACC; David C. Goff, Jr, MD, PhD, FAHA; Kathleen Grady, PhD, RN, FAHA, FAAN; Lee A. Green, MD, MPH; Paul A. Heidenreich, MD, FACC; Kathy J. Jenkins, MD, MPH, FACC; Ann R. Loth, RN, MS, CNS; Eric D. Peterson, MD, MPH, FACC, FAHA; David M. Shahian, MD, FACC
This document was approved by the American College of Cardiology Foundation Board of Trustees in June 2010 and the American Heart Association Science Advisory and Coordinating Committee in June 2010.
The American Heart Association requests that this document be cited as follows: Spertus JA, Bonow RO, Chan P, Diamond GA, Drozda JP Jr, Kaul S, Krumholz HM, Masoudi FA, Normand S-LT, Peterson ED, Radford MJ, Rumsfeld JS. ACCF/AHA new insights into the methodology of performance measurement: a report of the American College of Cardiology Foundation/American Heart Association Task Force on Performance Measures. Circulation. 2010;122:2091–2106.
This article has been copublished in the Journal of the American College of Cardiology.
The online-only Data Supplement is available at http://circ.ahajournals.org/cgi/content/full/CIR.0b013e3181f7d78c/DC1.
* Although the PCPI has placed economic reasons under System Reasons for exclusion because insurers can greatly influence both the tier and level of copayments that patients are required to pay, the writing committee thought that the choice to buy a medication is ultimately one made by the patient, and if the patient chose not to buy an expensive medication, it was appropriate to include this exclusion within the Patient Reasons category. Because the exclusion of a patient from the denominator of a performance measure is not influenced by the category of the exclusion, this should not alter current estimates of performance.
- © 2010 American Heart Association, Inc.
Normand SL, McNeil BJ, Peterson LE, Palmer RH.
Krumholz HM, Keenan PS, Brush JE Jr, et al.
Brush JE Jr, Krumholz HM, Wright JS, et al.
Bufalino V, Peterson ED, Krumholz HM, et al.
Gibbons RJ, Smith S, Antman E, et al.
Committee on Quality of Health Care in America, Institute of Medicine. Crossing the Quality Chasm: A New Health System for the 21st Century. Washington, DC: National Academy Press; 2001.
ACCF/AHA Task Force on Practice Guidelines Methodology Manual and Policies From the ACCF/AHA Task Force on Practice Guidelines. Available at: http://assets.cardiosource.com/Methodology_Manual_for_ACC_AHA_Writing_Committees.pdf. Accessed June 25, 2010.
Gold MR, Siegel JE, Russell LB, Weinstein MC.
Diamond GA, Kaul S.
Weintraub W, Cohen D.
Mark DB, Nelson CL, Anstrom KJ, et al.
Specifications and categorization of measures exclusions: recommendations to Physician Consortium for Performance Improvement Work Group. Available at: http://www.ama-assn.org/ama1/pub/upload/mm-370/exclusions053008.pdf. Accessed June 25, 2010.
Smedley B, Stith A, Nelson A.
Krumholz HM, Merrill AR, Schone EM, et al.
Krumholz HM, Normand SL, Spertus JA, et al.
Spertus JA.
Krumholz HM, Brindis RG, Brush JE, et al.
American College of Cardiology, American Heart Association, Physician Consortium for Performance Improvement. Clinical performance measures: chronic stable coronary artery disease. Available at: http://www.ama-assn.org/ama1/pub/upload/mm/370/cadminisetjune06.pdf. Accessed June 25, 2010.
Bonow RO, Bennett S, Casey DE Jr, et al.
Krumholz HM, Anderson JL, Bachelder BL, et al.
Rigotti NA, Munafo MR, Murphy MF, Stead LF.
American Heart Association. Heart Disease and Stroke Statistics: Our Guide to Current Statistics and the Supplement to Our Heart and Stroke Facts. 2008 Update At-a-Glance. Available at: http://www.americanheart.org/downloadable/heart/1200078608862HS_Stats%202008.final.pdf. Accessed June 25, 2010.
National Quality Forum. Measure Evaluation Criteria. August 2008. Available at: http://www.qualityforum.org/uploadedFiles/Quality_Forum/Measuring_Performance/tbEvalCriteria2008-08-28Final.pdf. Accessed June 25, 2010.
Estes NA III, Halperin JL, Calkins H, et al.
Jacobson AK.
Marciniak TA, Mosedale L, Ellerbeck EF.
Kaplan S, Normand SL.
Normand SL, Wolf RE, McNeil BJ.
Peterson ED, DeLong ER, Masoudi FA, et al.
National Quality Forum. Composite measure evaluation framework and national voluntary consensus standards for mortality and safety—composite measures. 2009. Available at: http://qualityforum.org/Publications/Composite_Measures.aspx. Accessed July 10, 2010.
National Quality Forum. National voluntary consensus standards for hospital care: outcomes and efficiency phase II. Available at: http://www.qualityforum.org/projects/hospital_outcomes-and-efficiency_II.aspx. Accessed July 10, 2010.
Fisher ES, Staiger DO, Bynum JP, Gottlieb DJ.
O'Kane M, Corrigan J, Foote SM, et al.
American College of Physicians. Patient-centered medical home. Available at: http://www.acponline.org/running_practice/pcmh/. Accessed July 10, 2010.