II. International Congress on Critical Care on the Internet –
CIMC 2000
Conference:
Evaluation of the Severity of Illness
Philipp G. H. Metnitz
Current address: Departement Réanimation Médicale, Hôpital St. Louis, Université Lariboisière-St. Louis,
1 Avenue Claude Vellefaux, 75010 Paris, France.
Email: philipp.metnitz@univie.ac.at
Modern intensive care faces increasing medical, ethical and economic demands. Quality management
provides tools to deal with these issues. Intensive care treatment is, however, a complex process
that is carried out in a very heterogeneous population and is influenced by variables such as
cultural background and differences between health care systems. It is therefore
extremely difficult to reduce the quality of intensive care to something
measurable, to quantify it and to compare it between institutions.
Although quality has a variety of dimensions, the main interest today focuses on effectiveness and
efficiency. Other issues are clearly less relevant if the care being
provided is ineffective or harmful, so the priority must be to
evaluate effectiveness. The instrument available to measure effectiveness in intensive
care is outcome research. The starting point for this research was the high variability
in medical processes [[1]], which was found during the first part of the
20th century, when epidemiology was developing. This variation in medicine,
together with the lack of standardization, led to the search for the “optimal”
therapy. Outcome research provided the methods to compare different groups of
patients and institutions. Risk adjustment is now used to standardize
different groups of patients, which makes it possible to evaluate associations
between treatments and outcome.
The assessment of the severity of illness through the prediction of hospital mortality
is the method of choice for risk adjustment in intensive care. Recent studies have shown,
however, that the performance of prediction models can vary considerably when they are
applied to populations different from the one in which they were developed [[2],[3]].
Before a general severity of illness score is applied, its performance has to be
tested in terms of discrimination and calibration. Discrimination refers
to the model’s ability to distinguish between nonsurvivors and survivors by assigning
higher scores to patients who die. Calibration refers to the accuracy of the
prediction when the numbers of predicted and observed deaths are compared over
the range of severity of illness. Customization of a general severity score by
deriving a new logistic regression equation has been found useful when
calibration of general scoring systems is poor [[3],[4]].
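The two performance measures described above can be sketched in a few lines. The following is a minimal illustration, not the procedure used in the cited studies: it assumes discrimination is summarized by the area under the ROC curve and calibration by a Hosmer-Lemeshow type statistic over equal-sized risk groups. The function names (`auc`, `calibration_table`, `hosmer_lemeshow`) are hypothetical.

```python
from typing import List, Sequence, Tuple


def auc(pred: Sequence[float], died: Sequence[int]) -> float:
    """Discrimination: probability that a randomly chosen nonsurvivor has a
    higher predicted risk than a randomly chosen survivor (ties count 0.5)."""
    nonsurvivors = [p for p, d in zip(pred, died) if d == 1]
    survivors = [p for p, d in zip(pred, died) if d == 0]
    if not nonsurvivors or not survivors:
        raise ValueError("need at least one survivor and one nonsurvivor")
    wins = 0.0
    for p in nonsurvivors:
        for q in survivors:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(nonsurvivors) * len(survivors))


def calibration_table(pred: Sequence[float], died: Sequence[int],
                      groups: int = 10) -> List[Tuple[int, float, int]]:
    """Calibration: sort patients by predicted risk, split them into
    equal-sized groups and return (n, expected deaths, observed deaths)."""
    ranked = sorted(zip(pred, died))
    n = len(ranked)
    table = []
    for g in range(groups):
        chunk = ranked[g * n // groups:(g + 1) * n // groups]
        if chunk:
            expected = sum(p for p, _ in chunk)
            observed = sum(d for _, d in chunk)
            table.append((len(chunk), expected, observed))
    return table


def hosmer_lemeshow(pred: Sequence[float], died: Sequence[int],
                    groups: int = 10) -> float:
    """Hosmer-Lemeshow type statistic: sum of (O - E)^2 / (E * (1 - E/n))
    over the risk groups; larger values indicate poorer calibration."""
    stat = 0.0
    for n, expected, observed in calibration_table(pred, died, groups):
        variance = expected * (1.0 - expected / n)
        if variance > 0:
            stat += (observed - expected) ** 2 / variance
    return stat
```

A model can discriminate well and still be poorly calibrated, which is why both measures are reported: the first depends only on the ranking of patients, the second on the agreement between predicted and observed numbers of deaths.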
During the evaluation of the ASDI Documentation Standard for Intensive Care Medicine, a lack of
calibration of the SAPS II [[5]] was found in Austrian patients [[6]]. There are several possible
explanations for this. First, SAPS II does not take
into account all the factors that are known to influence outcome. Second, our
results as well as other studies [[2]] demonstrated that the lack of uniformity of fit
of the SAPS II was also attributable to factors that are included in the model.
Third, a variety of factors (known and unknown) are not included
in the SAPS II but contribute to the phenomenon of unmeasured case mix.
For this reason, SAPS II was recalibrated using “first-level customization” [[7]]. The resulting
prognostic model, SAPS II-AM (AM for Austrian model), was later validated in a larger cohort
of patients [[8]]. The improved prognostic
performance of the SAPS II-AM-99 can be seen from the corresponding calibration
curves (Figure 1).
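The text does not give the customized equation, so the following is only a sketch of one common reading of first-level customization: the logit of the original model is kept as the single covariate, and a new intercept and slope are estimated on local data. The original SAPS II logit uses the coefficients published by Le Gall et al. [[5]]; the fitting routine `customize_first_level` and its Newton-Raphson details are assumptions made for illustration, and the coefficients of the SAPS II-AM are not reproduced here.

```python
import math
from typing import Sequence, Tuple


def saps2_logit(score: float) -> float:
    """Logit of hospital mortality for a SAPS II score (original model)."""
    return -7.7631 + 0.0737 * score + 0.9971 * math.log(score + 1.0)


def to_probability(logit: float) -> float:
    """Convert a logit into a predicted probability of death."""
    return 1.0 / (1.0 + math.exp(-logit))


def customize_first_level(logits: Sequence[float], died: Sequence[int],
                          iterations: int = 25) -> Tuple[float, float]:
    """Fit died ~ b0 + b1 * original_logit by Newton-Raphson and return
    (b0, b1). Assumes the outcomes are not perfectly separable."""
    b0, b1 = 0.0, 1.0  # start from the uncustomized model
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(logits, died):
            p = to_probability(b0 + b1 * x)
            w = p * (1.0 - p)
            g0 += y - p            # gradient, intercept
            g1 += (y - p) * x      # gradient, slope
            h00 += w               # information matrix entries
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        if abs(det) < 1e-12:
            break
        # Newton step: solve the 2x2 system (information matrix) * delta = gradient
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < 1e-10:
            break
    return b0, b1
```

With the fitted pair, the customized prediction for a patient would be `to_probability(b0 + b1 * saps2_logit(score))`. Because only two parameters are re-estimated, this kind of customization can correct overall over- or underprediction but cannot change the relative weights of the variables inside the score.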
Severity of illness data and O/E ratios are increasingly used by governmental and commercial
institutions to assess the clinical performance of ICUs [[9]]. After a recent study
suggested that such data might be useful for classifying ICUs into different levels
of clinical and economic performance [[10]], official institutions discussed different
models of using these data as a measure of effectiveness in Austrian ICUs.
Although these models have not yet been instituted, it seemed important
to evaluate the performance of the statistical models used to generate these
data and to detect any potential confounders.
Analyzing Austrian data, we found that customization changed both predicted hospital mortality and
O/E ratios to varying degrees [[8]]. Moreover, O/E ratios differed widely across
reason-for-admission subgroups. This behavior inevitably
influences the overall O/E ratio of an ICU: an ICU with a high proportion of a specific
patient group could accordingly exhibit a lower or higher O/E ratio. Table 1
shows the O/E ratios by reason-for-admission category for patients admitted consecutively
to 35 adult Austrian ICUs in 1998 (n=7851).
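As a sketch of how such subgroup ratios can be computed: the O/E ratio is the number of observed deaths divided by the sum of the predicted probabilities of death. The confidence interval below uses a simple normal approximation based on the variance of the predicted deaths; this is an assumption for illustration, since the text does not state how the intervals in Table 1 were derived. The function names and the record layout `(category, predicted_risk, died)` are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Sequence, Tuple


def oe_ratio(pred: Sequence[float], died: Sequence[int],
             z: float = 1.96) -> Tuple[float, float, float]:
    """Return (O/E, lower, upper) for one group of patients.
    O = observed deaths, E = sum of predicted risks."""
    observed = sum(died)
    expected = sum(pred)
    if expected <= 0:
        raise ValueError("expected number of deaths must be positive")
    ratio = observed / expected
    # Approximate standard error: sqrt(sum p(1-p)) / E (assumed method)
    se = math.sqrt(sum(p * (1.0 - p) for p in pred)) / expected
    return ratio, max(0.0, ratio - z * se), ratio + z * se


def oe_by_category(
    records: Iterable[Tuple[str, float, int]]
) -> Dict[str, Tuple[float, float, float]]:
    """Group (category, predicted_risk, died) records by reason for
    admission and compute the O/E ratio with its interval per category."""
    risks = defaultdict(list)
    deaths = defaultdict(list)
    for category, risk, died in records:
        risks[category].append(risk)
        deaths[category].append(died)
    return {c: oe_ratio(risks[c], deaths[c]) for c in risks}
```

Running `oe_by_category` once with the original predictions and once with the customized ones would give the two sets of ratios that a table of this kind compares.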
These results demonstrate that today’s severity scoring systems, such as the SAPS II, are
limited because they do not measure (and adjust for) a substantial part of what constitutes
case mix. Changes in the distribution of patient characteristics (known and unknown)
therefore influence prognostic accuracy. O/E ratios used as a measure of
effectiveness (or of other dimensions of quality) must therefore be interpreted
critically.
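The effect of case mix on the overall ratio can be illustrated with a small worked example. All numbers below are hypothetical and chosen only to show the mechanism; they are not taken from the Austrian data. Two ICUs are assumed to have identical O/E ratios within each admission subgroup and to differ only in the proportion of patients from each subgroup.

```python
from typing import Sequence, Tuple


def overall_oe(mix: Sequence[Tuple[int, float, float]]) -> float:
    """mix: (number of patients, mean predicted risk, subgroup O/E ratio).
    Returns the overall O/E ratio of the unit."""
    expected = sum(n * risk for n, risk, _ in mix)
    observed = sum(n * risk * oe for n, risk, oe in mix)
    return observed / expected


# Hypothetical subgroups: A (mean risk 10%, O/E 0.8), B (mean risk 30%, O/E 1.2)
icu_1 = [(80, 0.10, 0.8), (20, 0.30, 1.2)]  # mostly subgroup A
icu_2 = [(20, 0.10, 0.8), (80, 0.30, 1.2)]  # mostly subgroup B

print(round(overall_oe(icu_1), 2), round(overall_oe(icu_2), 2))
```

Under these assumptions the first unit ends up with an overall ratio below 1 and the second with a ratio above 1, although both perform identically within each subgroup. The difference is produced by the patient mix alone.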
Table 1. O/E ratios in reason for admission categories.
O/E: observed-to-expected mortality ratio; CI: confidence interval;
SAPS II: original SAPS II model, as described by Le Gall et al.; SAPS II
AM99: customized Austrian model; n=7851.
Figure 1a and 1b. Calibration curves for the SAPS II and the SAPS II-AM-99.
(a) Original SAPS II model (n=2901). (b) Austrian SAPS II model (SAPS II-AM-99), validation sample (n=1451).
Columns: number of patients. Squares: mean predicted hospital mortality. Circles: mean observed
hospital mortality. x-axis: predicted risk of death; y-axis: observed hospital mortality.
References
[[2]] Moreno R, Apolone G, Reis Miranda D (1998) Evaluation of the
uniformity of fit of general outcome prediction models. Intensive Care Med 24:
40–47.
[[3]] Moreno R, Apolone G (1997) Impact of different customization
strategies in the performance of a general severity score. Crit Care Med 25:
2001–2008.
[[4]] Le Gall JR, Lemeshow St, Leleu G, Klar
J, Huillard J, Rue M, Teres D, Artigas A, for the Intensive Care Scoring Group
(1995) Customized probability models for early severe sepsis in adult intensive
care patients. JAMA 273: 644–650.
[[5]] Le Gall JR, Lemeshow St, Saulnier F (1993) A new Simplified Acute Physiology
Score (SAPS II) based on a European/North American Multicentre Study. JAMA
270(24): 2957–2963.
[[6]] Metnitz PhGH, Vesely H, Valentin A, Popow C, Hiesmayr M, Lenz K, Krenn
CG, Steltzer H (1999) Evaluation of an Interdisciplinary Data Set for National
ICU Assessment. Crit Care Med 27(8): 1486–1491.
[[7]] Metnitz PGH, Valentin A, Vesely H, Alberti C, Lang Th, Lenz K,
Steltzer H, Hiesmayr M (1999) Prognostic Performance and Customization of the SAPS
II: results of a multicenter Austrian study. Intensive Care Med 25(2):
192–197.
[[8]] Metnitz PhGH, Vesely H, Valentin A, Lang T, Le Gall JR (2000) Ratios of
observed to expected mortality are affected by differences in case mix and
quality of care. Intensive Care Med 26: 1466–1472.