Predicting Outcome of Defibrillation by Spectral Characterization and Nonparametric Classification of Ventricular Fibrillation in Patients With Out-of-Hospital Cardiac Arrest
Background—In 156 patients with out-of-hospital cardiac arrest of cardiac cause, we analyzed the ability of 4 spectral features of ventricular fibrillation, computed before a total of 868 shocks, to discriminate between segments that did and did not correspond to return of spontaneous circulation (ROSC).
Methods and Results—Centroid frequency, peak power frequency, spectral flatness, and energy were studied. A second decorrelated feature set was generated with the coefficients of the principal component analysis transformation of the original feature set. Each feature set was split into training and testing sets for improved reliability in the evaluation of nonparametric classifiers for each possible feature combination. The combination of centroid frequency and peak power frequency achieved a mean±SD sensitivity of 92±2% and specificity of 27±2% in testing. The highest performing classifier corresponded to the combination of the 2 dominant decorrelated spectral features with sensitivity and specificity equal to 92±2% and 42±1% in testing or a positive predictive value of 0.15 and a negative predictive value of 0.98. Using the highest performing classifier, 328 of 781 shocks not leading to ROSC would have been avoided, whereas 7 of 87 shocks leading to ROSC would not have been administered.
Conclusions—The ECG contained information predictive of the outcome of shock therapy. Use of this information could reduce the delivery of unsuccessful shocks and thereby the duration of unnecessary “hands-off” intervals during cardiopulmonary resuscitation. The low specificity and positive predictive value indicate that other features should be added to improve performance.
Although early defibrillation of ventricular fibrillation (VF) increases the survival rate after cardiac arrest,1 2 a recent study from Cobb et al3 indicates that some patients with VF might have a better chance of return of spontaneous circulation (ROSC) after a period with chest compressions and ventilation before the defibrillation attempt. The cardiopulmonary resuscitation (CPR)-induced myocardial perfusion can cause changes in the power spectrum of the VF with an attendant increase in the ROSC rate.4 5 6 Futile defibrillation attempts are in themselves detrimental, because tissue damage and postresuscitation myocardial dysfunction may be caused by the shock itself7 and by the lack of tissue perfusion from chest compressions during the shock period (analysis, charging, defibrillation, and outcome evaluation). It would therefore be important to know whether a shock will cause ROSC.
From 20% to 80% of defibrillation attempts in clinical studies are reported to cause the discontinuation of VF8 9 10 11 (the great variability depends on different time definitions of VF reoccurrence or different shock waveforms), but we found a frequency of only 10% ROSC after 883 individual shocks in 156 patients.11 This is similar to the results of previous reports.8 9 12
Brown et al12 reported that the combination of centroid frequency (CF) and peak power frequency (PPF) of the VF could predict ROSC with 100% sensitivity and 47.1% specificity. We question the reliability of their results due to both the study design and the small data set, with only 9 incidents of ROSC after 128 shocks in 55 patients. The reliability of their results could have been confirmed if the prognostic criteria had been defined from 1 data set (“training set”) and the sensitivity and specificity had been derived from a new data set (“testing set”) instead of both having been determined from the same data set.
We have therefore attempted to predict defibrillation outcome in human cardiac arrest by combining features of spectral characterization, splitting the data into training and testing sets, and using classifier generalization techniques in an attempt to increase the degree of expected reliability. One of the combinations studied was that reported by Brown et al.12
In the observational prospective study from Oslo,11 data were collected from the medical control module of the defibrillator (Heartstart 3000; Laerdal Medical) and the regular Utstein registration.13 All patients with out-of-hospital cardiac arrest of cardiac origin13 that occurred between February 19, 1996, and February 18, 1998, were included if the advanced cardiac life support (ACLS) attempt was documented on the medical control module. Approval for the study was obtained through the Regional Committee for Research Ethics, Health Region III (Norway), and the Norwegian Data Inspectorate.
The ECG segments before shock were grouped according to shock outcome. Outcome was defined as ROSC if a palpable pulse was present in the postshock period (independent of duration). The remainder of the shocks corresponded to No ROSC, including conversions to electromechanical dissociation, asystole, VF (VF starting >5 seconds after the shock), or nonreset shocks (VF starting <5 seconds after the shock). If the initial postshock rhythm was present for >10% of the duration of the interval, it was defined as the postshock rhythm. Otherwise, the next rhythm was considered. This was done through an automated procedure, which handled all except 15 shocks. These failures were caused by inconsistencies in the annotation structure. Each shock was analyzed as an independent event because we wanted to predict the result of each shock independent of clinical variables that may affect outcome.12 Thus, we made no distinction between shocks with or without prior ACLS in the class division scheme (Table 1). We evaluated whether ACLS affected the results with the best predicting feature by further dividing the ROSC and No ROSC groups into subgroups who did or did not receive ACLS before shock.
The prediction analysis was performed in 2 stages that apply pattern recognition methods.14 First, the ECG was spectrally characterized (feature extraction), and second, decision regions for shock outcome prediction were determined and evaluated.
The characterizing features were computed from the estimated power spectral density (PSD) of each ECG segment,4 5 6 12 15 16 17 18
$$P(f) = \frac{1}{L}\left|\sum_{n=0}^{L-1} x(n)\, e^{-j 2\pi f n}\right|^{2},$$
where f is the frequency and x(n) denotes sample n in an ECG segment of length L.
We attempted to discriminate between preshock ECG segments that correspond to ROSC and No ROSC outcome by computing the following features from the ECG segment PSD estimates.
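The CF, or median frequency,4 5 15 17 is given by
$$\mathrm{CF} = \frac{\sum_{f=f_l}^{f_u} f\, P(f)}{\sum_{f=f_l}^{f_u} P(f)},$$
where fl≤f≤fu; fl and fu are the lower- and higher-frequency band limits, respectively. By varying these limits, we could study the effect of extracting features from different frequency bands.
PPF is given by
$$\mathrm{PPF} = \arg\max_{f_l \le f \le f_u} P(f),$$
ie, the frequency at which the PSD attains its maximum within the band.
The spectral flatness measure (SFM)19 of the VF is given by:
$$\mathrm{SFM} = \frac{\left(\prod_{f=f_l}^{f_u} P(f)\right)^{1/N}}{\frac{1}{N}\sum_{f=f_l}^{f_u} P(f)},$$
where N is the number of PSD samples in the band, so that SFM is the ratio of the geometric mean to the arithmetic mean of the PSD. SFM attains a value between 0 (peaky) and 1 (flat).
Various time domain measurements of signal amplitude characteristics of VF have been investigated.6 12 16 20 21 22 In the present study, we investigated an alternative frequency band–limited energy measurement (ENRG):
$$\mathrm{ENRG} = \sum_{f=f_l}^{f_u} P(f).$$
This enables direct analysis in a specific frequency band.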
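The feature extraction step can be illustrated with a minimal Python sketch. It assumes a plain periodogram as the PSD estimate and the textbook definitions of centroid, peak, geometric-to-arithmetic mean ratio, and summed band power; the function name `spectral_features`, its parameters, and the small constant `eps` are illustrative choices, not the study's implementation. The defaults mirror the sampling rate (100 Hz) and zero-padded length (512 samples) stated in this article.

```python
import numpy as np

def spectral_features(x, fs=100.0, nfft=512, f_lo=0.0, f_hi=25.0):
    """Return (CF, PPF, SFM, ENRG) for one ECG segment within [f_lo, f_hi] Hz."""
    x = np.asarray(x, dtype=float)
    L = len(x)
    # Periodogram PSD estimate; rfft zero-pads the segment to nfft samples.
    psd = np.abs(np.fft.rfft(x, n=nfft)) ** 2 / L
    freqs = np.fft.rfftfreq(nfft, d=1.0 / fs)
    band = (freqs >= f_lo) & (freqs <= f_hi)
    f, p = freqs[band], psd[band]
    eps = 1e-12  # guards log(0) and division by zero (illustrative assumption)
    cf = np.sum(f * p) / (np.sum(p) + eps)        # centroid frequency
    ppf = f[np.argmax(p)]                         # peak power frequency
    # Spectral flatness: geometric mean over arithmetic mean of the PSD.
    sfm = np.exp(np.mean(np.log(p + eps))) / (np.mean(p) + eps)
    enrg = np.sum(p)                              # band-limited energy
    return cf, ppf, sfm, enrg
```

For a pure 5 Hz sinusoid of 400 samples, this sketch places both CF and PPF close to 5 Hz and gives an SFM near 0, which is the "peaky" end of the flatness scale.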
An alternative decorrelated feature set was generated by principal component analysis (PCA) transformation.14 The features were projected onto the eigenvectors that best represented the entire data set. Thus, the new decorrelated feature set is represented by the magnitudes of the projections along the eigenvectors. Before classification, a combination from either the original or the decorrelated feature set was placed into a feature vector, v.
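A minimal sketch of the decorrelation step follows, assuming that the features are mean-centered and projected onto the eigenvectors of their sample covariance matrix. Centering, the absence of per-feature scaling, and the name `pca_decorrelate` are assumptions made for illustration; the article does not specify these details.

```python
import numpy as np

def pca_decorrelate(features):
    """Project rows of `features` (segments x features) onto the principal axes."""
    X = np.asarray(features, dtype=float)
    centered = X - X.mean(axis=0)              # assumption: mean-centering
    cov = np.cov(centered, rowvar=False)       # sample covariance of the features
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]          # reorder so that PCA1 comes first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    projections = centered @ eigvecs           # decorrelated feature set
    return projections, eigvals, eigvecs
```

The columns of `projections` play the role of vPCA1 through vPCA4. Keeping only the first two columns corresponds to the kind of dimension reduction described in this article.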
In the classifier, each feature vector, v, was considered to belong to 1 of the K classes, ωi, i=1, … , K, which corresponded to the shock outcome rhythms (K=5). As shown in Table 1, ω1 and ω2–5 corresponded to the ROSC and No ROSC groups, respectively. Decision regions for these defined classes from annotated data can be retrospectively calculated with classification theory. The class membership of new data is decided through prospective comparison with these decision regions.
K decision regions, Ri, i=1, … , K, are computed by assigning costs for the possible wrong decisions. Each cost, C(ωi, ω̂j), expresses the risk associated with classification of a pattern of the true class ωi as belonging to the decided class ω̂j. A reject class, ωK+1, is added to handle ambiguous or out-of-range patterns. Each Ri is calculated by selecting the minimum component of the risk vector r=[r1 r2 … rK+1]T, where
$$r_i = \sum_{j=1}^{K} C(\omega_j, \hat{\omega}_i)\, P(\omega_j \mid v), \quad i = 1, \ldots, K+1.$$
This corresponds to minimization of the expectation of the classifier risk.14 P(ωj|v), j=1, … , K denotes the a posteriori probability function for class ωj, which is derived according to Bayes’ rule:
$$P(\omega_j \mid v) = \frac{P(\omega_j)\, p(v \mid \omega_j)}{\sum_{i=1}^{K} P(\omega_i)\, p(v \mid \omega_i)}.$$
P(ωi) and p(v|ωi) denote the a priori probability and the class-specific probability density function (PDF) for class ωi, respectively.
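The minimum-risk decision with a reject class can be sketched as follows. This is an illustration under stated assumptions: the cost matrix layout (rows are true classes, columns are decided classes with the reject class last), the handling of a zero-likelihood pattern as a reject, and the name `min_risk_decision` are all hypothetical rather than taken from the study.

```python
import numpy as np

def min_risk_decision(likelihoods, priors, cost):
    """Pick the decision with minimum expected cost.

    likelihoods: p(v|w_j) for the K classes; priors: P(w_j);
    cost: K x (K+1) array, cost[j][i] = cost of deciding i when j is true.
    Returns (decided index, posterior); index K denotes the reject class.
    """
    joint = np.asarray(likelihoods, dtype=float) * np.asarray(priors, dtype=float)
    total = joint.sum()
    K = joint.size
    if total <= 0.0:
        return K, None                      # out-of-range pattern: reject (assumption)
    posterior = joint / total               # Bayes' rule
    risk = posterior @ np.asarray(cost, dtype=float)  # expected cost of each decision
    return int(np.argmin(risk)), posterior
```

With two classes, priors of 0.1 and 0.9, and likelihoods of 0.5 and 0.1, equal misclassification costs select the second class. Raising the cost of missing the first class tenfold switches the decision to the first class, which is the kind of trade-off that cost weighting is meant to control.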
The classifier performance characteristics are expressed by the sensitivity (probability of positive prediction of ROSC outcome), given by P(R1|ω1), and the specificity (probability of negative prediction of No ROSC outcome), given by the proportion of patterns of the true No ROSC classes ω2–5 whose decision is one of the No ROSC classes. P(Rj|ωi) expresses the proportion of true class ωi with the corresponding decision being ω̂j.
The decision regions were calculated iteratively with minimization of the objective function so that the classifier would meet the desired performance criterion given by Psnsd(ωi). This was done by multiplying the costs, C(ωi, ω̂j), j≠i, by a factor α. Setting i=1 (ω1 corresponding to ROSC) allowed specification of a sensitivity for the recognition of ROSC outcome.
The underlying statistics can be estimated with classification theory.14 Multidimensional histograms were applied: the feature space is divided into bins of equal volume, and the PDF estimates are computed in each bin. Each feature is normalized by dividing the respective feature axis into nb equal-sized intervals in the range from the minimum to the maximum feature value. The PDF estimates in each histogram bin are then distributed by applying an elliptic gaussian kernel function, resulting in a smoother continuous estimate.14
Histogram bin resolution and kernel width are the 2 key parameters of the classifier. A small number of large bins provide low histogram resolution, whereas a large number of small bins provide high resolution. Each feature axis of the PDF is divided into nb intervals. Thus, if the feature dimension is D for a specific feature combination, the feature space is divided into $n_b^D$ bins of equal volume. Smoothness is governed by the width of the kernel function. A narrow kernel function provides a high-resolution estimate with high variance, whereas a wide kernel function provides a smoother low-resolution estimate with low variance.
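The interplay of bin resolution and kernel width can be sketched with a smoothed multidimensional histogram. The sketch assumes that the kernel variance kw is expressed in units of bins, that the same separable gaussian is applied along every feature axis, and that the result is renormalized to sum to 1. These choices, like the name `histogram_pdf`, are illustrative assumptions.

```python
import numpy as np

def histogram_pdf(vectors, n_b, kw, lo, hi):
    """Smoothed histogram estimate of a class-specific PDF on an n_b^D grid."""
    V = np.asarray(vectors, dtype=float)
    D = V.shape[1]
    # n_b equal-sized intervals per feature axis, from minimum to maximum value.
    edges = [np.linspace(lo[d], hi[d], n_b + 1) for d in range(D)]
    counts, _ = np.histogramdd(V, bins=edges)
    if kw > 0:
        sigma = np.sqrt(kw)                       # kw is treated as a variance
        half = int(np.ceil(3 * sigma))
        taps = np.arange(-half, half + 1)
        kernel = np.exp(-0.5 * (taps / sigma) ** 2)
        kernel /= kernel.sum()

        def smooth(m):
            # Full convolution, then crop back to the original axis length.
            return np.convolve(m, kernel, mode="full")[half:half + len(m)]

        for axis in range(D):
            counts = np.apply_along_axis(smooth, axis, counts)
    total = counts.sum()
    return counts / total if total > 0 else counts
```

Because all bins have equal volume, the normalized bin masses returned here are proportional to the density estimate. Setting `kw=0` leaves the raw histogram, whereas larger values spread each bin over its neighbors and lower the peaks.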
The concept of generality is important in the design of classifiers. The decision regions are calculated with a training set of feature vectors that represent the experience on which the classifier will base future decisions. Testing is done on an independent set. In a well-designed classifier, the testing performance should approach the training performance. Both the histogram bin resolution and the kernel width applied in the estimation affect generality.
Training and testing were conducted with a cross-validation technique.14 In each of S consecutive experiments, an (S−1)/S portion of the entire data set is used to train classifier number i. The remaining 1/S portion is kept out for testing. i is varied from 1 to S, thus producing S classifier performance results.
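The partitioning into S experiments can be sketched in a few lines of Python. The interleaved assignment of items to folds and the name `cross_validation_folds` are assumptions; the article does not state how the portions were drawn.

```python
def cross_validation_folds(n_items, n_folds):
    """Yield (train_indices, test_indices) for each of S = n_folds experiments."""
    indices = list(range(n_items))
    for i in range(n_folds):
        test = indices[i::n_folds]            # the held-out 1/S portion
        held_out = set(test)
        train = [k for k in indices if k not in held_out]  # the (S-1)/S portion
        yield train, test
```

Each item is held out exactly once, so training a classifier on `train` and scoring it on `test` in every experiment yields S sensitivity and specificity values, from which a mean and SD can be reported.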
The ECG was sampled at 100 Hz with 8-bit resolution, and PSD was estimated from segments of length L=400 samples (4 seconds), zero padded to 512 samples. Three feature sets were extracted with frequency ranges (fl−fu Hz) of 0 to 50, 0 to 25, and 0 to 12.5 Hz. The spectral features produced in each of these experiments were vSFM, vENRG, vCF, and vPPF. The PCA transformation of these features gave the corresponding decorrelated feature set of vPCA1, vPCA2, vPCA3, and vPCA4. The ECG immediately before defibrillation was analyzed, and the measurements were grouped according to the postshock rhythm for classifier design (Table 1).
Classifiers were designed and tested with the use of all possible combinations of spectral features and decorrelated features (Table 2).
The statistical functions were estimated with multidimensional histograms. Resolutions were adjusted according to setting nb equal to 4, 8, 16, 32, 64, and 128 bins. For each of these resolutions, the smoothness was varied by setting the variance of the gaussian kernel function, kw, equal to 0, 1, 5, 10, 15, and 20.
This combination of changing bin size resolution and smoothness enabled a search for the classifier that met the generality criterion, which we defined to be that the test sensitivities and specificities should approach the training sensitivities and specificities to within a 5% tolerance range. In Figure 1, the training and testing specificities and sensitivities are shown as functions of resolution and kernel width. Training with high bin resolution and narrow kernel width generates a classifier with 100% performance in both sensitivity and specificity as the result of overtraining, as verified by the large deviation in sensitivity in test performance. Generality in sensitivity is achieved either by increasing the kernel width or by using lower bin resolution, with both resulting in lower specificity.
A full-scale evaluation with respect to generality of the classifiers corresponding to all possible feature combinations (Table 2) was performed.
Finally, for a given feature combination, the classifier with the best general performance was defined as that corresponding to the highest average test performance, lowest bin resolution, and narrowest kernel width, requiring that the training and test performances satisfied the generality criterion.
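The selection rule can be sketched as a filter followed by a ranking. The sketch reads the 5% tolerance as 5 percentage points and treats the three criteria as a lexicographic ordering (mean test performance first, then lower bin resolution, then narrower kernel). Both readings, along with the dictionary keys and the name `select_general_classifier`, are assumptions made for illustration.

```python
def select_general_classifier(results, tolerance=5.0):
    """Pick the best classifier among those meeting the generality criterion.

    results: list of dicts with keys n_b, kw, train_sens, train_spec,
    test_sens, test_spec (performance values in percent).
    """
    general = [
        r for r in results
        if abs(r["train_sens"] - r["test_sens"]) <= tolerance
        and abs(r["train_spec"] - r["test_spec"]) <= tolerance
    ]
    if not general:
        return None
    # Highest mean test performance; ties go to lower resolution, then narrower kernel.
    return min(
        general,
        key=lambda r: (-(r["test_sens"] + r["test_spec"]) / 2, r["n_b"], r["kw"]),
    )
```

An overtrained configuration with 100% training performance but a much lower test sensitivity is discarded by the filter before any ranking takes place.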
The comparisons between ROSC and No ROSC and between ACLS and No ACLS were tested with the Wilcoxon rank sum test and presented as median values (25th and 75th percentiles). P<0.05 was regarded as statistically significant. Classifier performance results are presented as the mean±SD of the cross-validated sensitivities and specificities.
The results for the parameters are summarized in Table 3.
The test performance results of the classifiers that met the generality criterion are shown in Figure 2.
The performances of the reference classifier [vCF vPPF], for comparison with earlier work, and of the highest performing classifier [vPCA1 vPCA2] are listed in Table 4, and the class-specific PDFs with corresponding decision regions for these 2 classifiers are shown in Figure 3. The highest performing classifier, [vPCA1 vPCA2], shows a clearer distinction between ROSC and No ROSC than the reference classifier [vCF vPPF], where there is more intermingling of the classes. The highest performing classifier was based on PCA decorrelation and dimension reduction to 2 features and achieved a sensitivity of 92±2% and a specificity of 42±1% in testing, or a positive predictive value of 0.15 and a negative predictive value of 0.98 (Table 5).
The frequency ranges of 0 to 25 and 0 to 12.5 Hz were best suited for the discrimination of ROSC from No ROSC outcomes (Table 3). Spectral flatness was the least suitable individual feature for all frequency ranges. Although the 3 other spectral features seem promising, the results of the decorrelated features indicate that only 2 features are significantly different when grouped according to ROSC and No ROSC outcome. This indicates that there is redundant information in the original feature set.
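The predictive values follow directly from the shock counts reported in this article (87 shocks leading to ROSC, of which 7 would have been withheld, and 781 shocks not leading to ROSC, of which 328 would have been avoided). The short Python check below reproduces that arithmetic; the variable names are illustrative.

```python
rosc_total, rosc_withheld = 87, 7            # shocks leading to ROSC
no_rosc_total, no_rosc_avoided = 781, 328    # shocks not leading to ROSC

tp = rosc_total - rosc_withheld              # ROSC shocks correctly predicted
fn = rosc_withheld                           # ROSC shocks predicted as No ROSC
tn = no_rosc_avoided                         # No ROSC shocks correctly predicted
fp = no_rosc_total - no_rosc_avoided         # No ROSC shocks predicted as ROSC

sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
ppv = tp / (tp + fp)                         # positive predictive value
npv = tn / (tn + fn)                         # negative predictive value

print(round(sensitivity, 2), round(specificity, 2), round(ppv, 2), round(npv, 2))
```

Rounded to 2 decimals, the four values are 0.92, 0.42, 0.15, and 0.98, in agreement with the reported test performance. The low positive predictive value reflects the large number of No ROSC shocks (453) that would still be predicted as ROSC.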
The highest performing single-feature spectral classifiers were CF and PPF in the low-frequency range. The combination of these 2 features did not improve the results. For the single decorrelated feature classifiers, the 2 principal classifiers gave the best results for all frequency ranges. The combination of decorrelated features improved the performance significantly when the 2 midfrequency-range principal features were combined. The inclusion of >2 decorrelated features did not further improve the performance.
Whether ACLS caused changes in the PCA1 feature is summarized in Table 6. The No ACLS/No ROSC subgroup may be considered the starting point, where the initial shocks are futile, and is further divided into the following subgroups:
The ACLS/No ROSC subgroup, where treatment has been futile and the myocardial condition probably has deteriorated as reflected by a significant decrease in the feature values.
The ACLS/ROSC subgroup, where treatment probably has caused an improvement in myocardial condition, which is reflected by a significant increase in the feature values comparable to that corresponding to the No ACLS/ROSC subgroup.
In this study of 868 shocks in 156 patients, it was possible to predict in part whether a shock resulted in ROSC or No ROSC by analyzing 4 spectral features of the preshock ECG. The results improved when the features were combined, decorrelated, and reduced.
We further demonstrated how classification methodology allows the combination of features with an increase in classifier performance compared with individual classification of features. We also showed how decorrelation by PCA allows dimensional reduction in the feature set with no decrease in performance compared with a combination of the complete feature set.
The rate of ROSC after individual shocks in patients is reported to be low8 9 11 12 ; the rate was 10% in a recent study from Oslo.11 Most shocks are thus individually futile. Based on the present results, 42% of the unsuccessful shocks (328 of 781) could have been avoided, and a period of chest compressions, ventilations, and vasoactive drugs could have been administered before a new defibrillation attempt was made. Studies in animals have shown that this may be favorable,23 and a recent study in humans indicated that this might improve the outcome.3 It would minimize the detriment of “hands-off” intervals, where the vital organs are without perfusion, which reduces the possibility of ROSC, recovery with intact neurological status, or both. The number of shocks should also be kept to a minimum, because repetitive shocks and total electric power are injurious to the already ischemic myocardium and increase the severity of postresuscitation myocardial dysfunction.7 Moreover, because the spectral characteristics of the VF have been reported to reflect myocardial perfusion,4 5 6 the defibrillator also might guide the CPR attempt, because the myocardial perfusion depends on compression force, rate, and duration.24 25 26 27
On the other hand, 7 shocks that resulted in a pulse-giving rhythm would not have been administered. These shocks presumably would have been administered later if CPR changed the characteristics of the VF. The effects of this could not be evaluated. The comparison of ACLS with No ACLS features illustrates how the features might be used for online monitoring of CPR efficiency. The use of the features as monitoring parameters for performance feedback during CPR is an interesting idea that is closely related to the prediction problem. Retrospectively, we studied the influence of ACLS on a single feature and demonstrated changes in values according to treatment. The present study demonstrates how a general classifier can be designed through cross-validation, which allows training and testing on independent data sets in combination with different resolutions and kernel widths in the estimation of the statistics that describe the features. This method gives an indication of how well the classifier will perform when challenged with new data in the future.
In a similar study of 128 shocks in 55 patients with only 9 successful shocks (defined as a conversion of VF to a supraventricular rhythm with a palpable pulse or blood pressure of any duration within 2 minutes of the shock without ongoing CPR), Brown et al12 extracted 4 parameters from the recorded ECG. The combination of CF and PPF gave the best predictive potential (sensitivity 100%, specificity 47.1%).12 The same combination of features gave a poorer predictive potential in the present study (sensitivity 92±2%, specificity 27±2%). We believe that the results of our generalized classifier are more realistic due to the larger database and the use of independent testing and generalization that were not done by Brown et al.12 Those authors generated the sensitivity and specificity with the same data from which the threshold values were computed, with no independent evaluation.
Noc et al6 reported in pigs that maximum and mean VF amplitude and dominant VF frequency were all acceptable shock outcome predictors. They derived the threshold values from 1 group and tested these in a separate validation group but had different results in the 2 groups, indicating that the results might not be reliable.6 Our results indicate that Brown et al12 would have experienced the same if their threshold values had been tested on new data.
Our method includes independent testing and generalization to avoid these problems. To ensure reliability, the data were split in 2. Training performance for ROSC and No ROSC prediction was computed from half of the data, whereas the other half was used to compute the corresponding test performance.
There are some limitations in the present study. First, the number of ROSC observations is low. Second, in the cross-validation processing of the data, the test performances were considered in the design of the classifiers to choose the generalizing parameters. Ideally, a final evaluation should have been performed on yet another data set that did not influence the design process. Third, we used only 1 type of classifier: the histogram method. To obtain even more reliable results, the experiments should be repeated with other types of classifiers.
Spectral characterization of VF can be of clinical importance if it can be incorporated into the software of defibrillators. We have demonstrated a method to develop an outcome predictor for defibrillation attempts in out-of-hospital cardiac arrest patients, although the sensitivity of 92±2% and specificity of 42±1% are not satisfactory for clinical use. Therefore, other features should also be investigated to add discriminative power to the feature set.
This work was supported in part by the Research Council of Norway, the Norwegian Air Ambulance, the Laerdal Foundation for Acute Medicine, and Anders Jahre’s Foundation.
- Received February 7, 2000.
- Revision received April 26, 2000.
- Accepted May 2, 2000.
- Copyright © 2000 by American Heart Association
Cobb LA, Fahrenbruch CE, et al. Influence of cardiopulmonary resuscitation prior to defibrillation in patients with out-of-hospital ventricular fibrillation. JAMA. 1999;281:1182–1188.
Xie J, Weil MH, Sun S, et al. High-energy defibrillation increases the severity of postresuscitation myocardial dysfunction. Circulation. 1997;96:683–688.
Brown CG, Dzwonczyk R. Signal analysis of the human electrocardiogram during ventricular fibrillation: frequency and amplitude parameters as predictors of successful countershock. Ann Emerg Med. 1996;27:184–188.
Schurmann J. Pattern Classification: A Unified View of Statistical and Neural Approaches. New York, NY: John Wiley & Sons; 1996.
Stewart AJ, Allen JD, Adgey AAJ. Frequency analysis of ventricular fibrillation and resuscitation success. Q J Med. 1992;85:761–769.
Kay SM. Modern Spectral Estimation: Theory & Application. Englewood Cliffs, NJ: Prentice-Hall; 1988.
Jayant NS, Noll P. Digital Coding of Waveforms: Principles and Applications to Speech and Video. Englewood Cliffs, NJ: Prentice-Hall; 1984.
Weaver WD, Cobb LA, Dennis D, et al. Amplitude of ventricular fibrillation waveform and outcome after cardiac arrest. Ann Intern Med. 1985;102:53–55.
Kuelz KW, Hsia P, Wise RM, et al. Integration of absolute ventricular fibrillation voltage correlates with successful defibrillation. IEEE Trans BME. 1994;41:782–791.
Niemann JT, Cairns CB, Sharma J, et al. Treatment of prolonged ventricular fibrillation: immediate countershock versus high-dose epinephrine and CPR preceding countershock. Circulation. 1992;85:281–287.
Maier GW, Tyson GS Jr, Olsen CO, et al. The physiology of external cardiac massage: high-impulse cardiopulmonary resuscitation. Circulation. 1984;70:86–101.
Halperin HR, Tsitlik JE, Guerci AD, et al. Determinants of blood flow to vital organs during cardiopulmonary resuscitation in dogs. Circulation. 1986;73:539–550.
Feneley MP, Maier GW, Kern KB, et al. Influence of compression rate on initial success of resuscitation and 24 hour survival after prolonged manual cardiopulmonary resuscitation in dogs. Circulation. 1988;77:240–250.