When interpreting a test result, the clinician converts a pre-test or prior probability of disease into the appropriate post-test or posterior probability. Clinicians must use their best judgment—from all clinical data available—to assign reasonable estimates of pre-test probabilities. To illustrate this process in a clinical context, consider a 25-yr-old woman who presents with dysuria. Assume that the clinician thought the probability of UTI was moderately low (30%) based on the patient's history and physical examination. Before sending the patient's urine for microscopic examination of the sediment and for culture, the clinician does a dipstick leukocyte esterase test, which has a positive result. To determine the post-test probability of UTI, the clinician must know the sensitivity (71%) and specificity (85%) of the test.
Fig. 1 (Interpretation of leukocyte esterase test result in a woman with a 30% prior probability of a UTI, simulating a cohort of 100,000 identical women) considers 100,000 women similar to this patient, of whom 30% (30,000) have UTI and 70% (70,000) do not. Test results will be positive in 21,300 (71%, the test's sensitivity) of the women with UTI and in 10,500 (15%, the test's false-positive rate) of the women without UTI. Of the 31,800 women from the original cohort with positive test results (whether true positive or false positive), 21,300 (67%) actually have UTI. Thus, the posterior or revised probability of UTI after a positive leukocyte esterase test result is 67%. The positive result has not provided certainty but has made the diagnosis far more likely.
What if the test result had been negative? Of the 68,200 women with negative test results (whether true negative or false negative), 8,700 (13%) will actually have UTI. Thus, the posterior or revised probability of UTI after a negative leukocyte esterase test result is 13%, making the diagnosis less likely but still possible.
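The cohort arithmetic above can be reproduced with a short Python sketch. The function name `cohort_post_test` and its parameter names are illustrative choices rather than part of the source; the inputs are the prior probability, sensitivity, and specificity given in the text.

```python
def cohort_post_test(prior, sensitivity, specificity, cohort=100_000):
    """Simulate a cohort and return the probability of disease
    after a positive result and after a negative result."""
    diseased = cohort * prior                 # 30,000 women with UTI
    healthy = cohort - diseased               # 70,000 women without UTI
    true_pos = diseased * sensitivity         # 21,300
    false_neg = diseased - true_pos           # 8,700
    false_pos = healthy * (1 - specificity)   # 10,500
    true_neg = healthy - false_pos            # 59,500
    p_if_positive = true_pos / (true_pos + false_pos)
    p_if_negative = false_neg / (false_neg + true_neg)
    return p_if_positive, p_if_negative


pos, neg = cohort_post_test(prior=0.30, sensitivity=0.71, specificity=0.85)
print(f"after a positive result: {pos:.0%}")   # 67%
print(f"after a negative result: {neg:.0%}")   # 13%
```

The cohort size cancels out of both ratios, so any value gives the same probabilities; 100,000 is used only to match the figure.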
Calculations of this kind can also be laid out in a table (see Table 1: Interpretation of a Leukocyte Esterase Test Result in a Woman with a High (77%) Prior Probability of UTI). To illustrate how this table can be used to revise probabilities, consider a 2nd woman with dysuria and frequency, but no vaginal discharge or irritation, who has had frequent UTIs in the past; assume that her prior probability of UTI is high, say 77%. The upper half of Table 1 interprets a positive leukocyte esterase test result in this woman; the lower half interprets a negative test result. Although the sensitivity (71%) and specificity (85%) of the test are unchanged from the values in the first example, a positive test result increases the probability of UTI to 94%, or almost certain, and a negative result decreases it to only 53%, or just a bit more likely than not.
The process of using the pre-test probability of disease and the test characteristics to calculate the post-test probability is called Bayes' theorem or bayesian revision. Bayes' theorem can be expressed as an equation, but a flowchart (see Fig. 1) or tabular approach (see Table 1) is easier to use and less subject to error.
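For reference, one common way to write the equation form for a dichotomous test is sketched below. The symbols are notational assumptions not used in the source: p is the pre-test probability, Se the sensitivity, Sp the specificity, D the presence of disease, and T+ or T- a positive or negative result.

```latex
P(D \mid T^{+}) = \frac{Se \cdot p}{Se \cdot p + (1 - Sp)(1 - p)}
\qquad
P(D \mid T^{-}) = \frac{(1 - Se)\, p}{(1 - Se)\, p + Sp\,(1 - p)}

% First woman: p = 0.30, Se = 0.71, Sp = 0.85
P(D \mid T^{+}) = \frac{0.71 \times 0.30}{0.71 \times 0.30 + 0.15 \times 0.70}
               = \frac{0.213}{0.318} \approx 0.67
\qquad
P(D \mid T^{-}) = \frac{0.29 \times 0.30}{0.29 \times 0.30 + 0.85 \times 0.70}
               = \frac{0.087}{0.682} \approx 0.13
```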
When several tests must be interpreted, Bayes' theorem can be applied sequentially by using the posterior probability of one test as the prior probability for the next test. The conditional probability used for interpreting subsequent test results must be based on the gold standard for diagnosis and on the observed results of the preceding test. When such data are unavailable, the results of the various tests are often assumed to be conditionally independent (ie, the likelihood of a given test result depends on the gold standard alone rather than on the gold standard and results of the other test), and the performance characteristics of the 2nd test are assumed to depend only on the diagnosis.
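A minimal Python sketch of sequential revision under the conditional-independence assumption follows. The first test uses the leukocyte esterase values from the text. The second test's sensitivity (80%) and specificity (90%) are hypothetical numbers chosen only for illustration and do not describe any real assay; the function name `update` is also an illustrative choice.

```python
def update(prior, sensitivity, specificity, positive):
    """One bayesian revision step for a dichotomous test result."""
    if positive:
        p_result_if_disease = sensitivity
        p_result_if_no_disease = 1 - specificity
    else:
        p_result_if_disease = 1 - sensitivity
        p_result_if_no_disease = specificity
    numerator = prior * p_result_if_disease
    return numerator / (numerator + (1 - prior) * p_result_if_no_disease)


tests = [
    # Leukocyte esterase test, values from the text.
    {"sensitivity": 0.71, "specificity": 0.85, "positive": True},
    # HYPOTHETICAL second test, assumed conditionally independent of the first.
    {"sensitivity": 0.80, "specificity": 0.90, "positive": True},
]

probability = 0.30  # pre-test probability for the first woman
for test in tests:
    # The posterior from one test becomes the prior for the next.
    probability = update(probability, **test)
    print(f"{probability:.3f}")
```

If the tests are not conditionally independent, the second step would need characteristics measured given the first result, as the paragraph above notes; this sketch does not model that case.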
Odds-Likelihood Formulation
The examples discussed thus far assume a simple scenario of a disease being present or absent and one test result being positive or negative. The odds of disease (Ω) are simply the probability of the disease being present divided by the probability of the disease being absent. For example, if the probability of the disease is 0.3, then its odds (Ω) are 0.3/0.7 or 0.43. Furthermore, the likelihood ratio (LR) of the observed test result can be defined as the ratio of the likelihood of that result among patients with disease to the likelihood among patients without disease. That is, the LR for a positive result (LRpos) equals the true-positive rate divided by the false-positive rate. Similarly, the LR for a negative result (LRneg) equals the false-negative rate divided by the true-negative rate. For example, if the sensitivity and specificity are 71% and 85%, as above, then LRpos equals 0.71/0.15 or 4.73 and LRneg equals 0.29/0.85 or 0.34.
The odds-likelihood formulation of Bayes' theorem states that the post-test or posterior odds of disease are the product of the pre-test or prior odds and the appropriate LR (Ωpost = Ωpre × LR). For the first woman, Ωpre is 0.43. If the test result is positive, Ωpost would be 0.43 × 4.73, or 2.03. The corresponding probability is Ω/(Ω + 1), which is 2.03/3.03 or 67%, as shown above.
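The same steps can be sketched in Python. The helper names (`to_odds`, `to_probability`, `likelihood_ratios`) are illustrative and not from the source; the numbers are those of the first woman.

```python
def to_odds(probability):
    """Convert a probability to odds."""
    return probability / (1 - probability)


def to_probability(odds):
    """Convert odds back to a probability."""
    return odds / (odds + 1)


def likelihood_ratios(sensitivity, specificity):
    """Return (LRpos, LRneg) for a dichotomous test."""
    lr_pos = sensitivity / (1 - specificity)   # true-positive / false-positive rate
    lr_neg = (1 - sensitivity) / specificity   # false-negative / true-negative rate
    return lr_pos, lr_neg


lr_pos, lr_neg = likelihood_ratios(sensitivity=0.71, specificity=0.85)
pre_odds = to_odds(0.30)

post_pos = to_probability(pre_odds * lr_pos)
post_neg = to_probability(pre_odds * lr_neg)

print(f"LRpos = {lr_pos:.2f}, LRneg = {lr_neg:.2f}")   # 4.73, 0.34
print(f"pre-test odds = {pre_odds:.2f}")                # 0.43
print(f"post-test probability, positive result: {post_pos:.0%}")  # 67%
print(f"post-test probability, negative result: {post_neg:.0%}")  # 13%
```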
This formulation of Bayes' theorem provides some interesting intuitive principles. An LR > 1.0 increases the post-test probability of disease; the greater the LR, the more information a positive test result provides. An LR < 1.0 decreases the post-test probability of disease; the smaller the LR, the more information a negative test result provides. Test results with LRs of 1.0 carry no information and do not affect the post-test probability of disease. Thus, LRs are convenient for comparing tests.
Often there are different tests available to screen for the same disease. Whether a test that has higher sensitivity or specificity is used depends on the consequences of a false-positive or false-negative test result as well as the pre-test probability of disease. For example, in testing for a serious disease for which an efficacious treatment is available (eg, coronary heart disease), one would be willing to tolerate more false positives (lower specificity) than false negatives (lower sensitivity). Among populations with a higher prevalence of disease, the positive predictive value of a screening test will increase; as prevalence decreases, the post-test or posterior probability of disease after a positive result decreases. Therefore, when screening for disease in high-risk populations, tests with a higher sensitivity are preferred over those with a higher specificity because they are better at ruling out disease (fewer false negatives). On the other hand, in low-risk populations or for diseases for which therapy has lower benefit or higher risk, tests with a higher specificity are preferred.
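The dependence of predictive value on prevalence can be illustrated with a brief Python sketch using the leukocyte esterase characteristics from the earlier examples. The 1% and 10% prevalences are illustrative assumptions added here; 30% and 77% are the two prior probabilities used in the text. The function name `predictive_values` is likewise an assumption.

```python
def predictive_values(prevalence, sensitivity, specificity):
    """Return (positive predictive value, negative predictive value)."""
    true_pos = prevalence * sensitivity
    false_pos = (1 - prevalence) * (1 - specificity)
    false_neg = prevalence * (1 - sensitivity)
    true_neg = (1 - prevalence) * specificity
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Same test (sensitivity 71%, specificity 85%) applied at different prevalences.
for prevalence in (0.01, 0.10, 0.30, 0.77):
    ppv, npv = predictive_values(prevalence, sensitivity=0.71, specificity=0.85)
    print(f"prevalence {prevalence:.0%}: PPV {ppv:.1%}, NPV {npv:.1%}")
```

With the test held fixed, the positive predictive value rises and the negative predictive value falls as prevalence increases.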
Defining a Positive Test Result
For Bayes' theorem (or even the terms “sensitivity” and “specificity”) to be used, every possible result of a diagnostic test must be classified as positive or negative. When a test is not inherently positive or negative, the laboratory (or whoever describes the test's performance) establishes a criterion for positivity such that all results beyond that criterion are defined as positive and vice versa. Two overlapping distributions of results are transected by a criterion or cutoff line (see Fig. 2: Distributions of test results). The region beneath the distribution of results for patients with disease that lies above (to the right of) the criterion corresponds to the test's true-positive rate (ie, its sensitivity); the region that lies below (to the left of) the criterion corresponds to the false-negative rate. For the distribution of results for patients without disease, these 2 regions correspond to the false-positive rate and the true-negative rate (ie, its specificity), respectively. Within the overlapping portion of the 2 distributions (ie, those for patients with and without disease), changing the criterion or cutoff line affects sensitivity and specificity but in opposite directions. Sensitivity and specificity cannot both be improved by changing the cutoff line, but rather only by improving the discrimination of the test itself.
Fig. 2: Distributions of test results. Patients with disease are shown in the upper distribution; patients without disease are shown in the lower distribution. The relationship between a test's true-positive and false-positive rates (for different criteria or cutoff points) can be displayed as a receiver-operator characteristic (ROC) curve. The area beneath such a curve corresponds to the discriminatory power of the test.
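The trade-off between sensitivity and specificity as the cutoff moves can be sketched in Python. The two distributions below are hypothetical: normal curves with means of 40 and 60 and a standard deviation of 10, in arbitrary units, were chosen only for illustration and do not describe any real test. Results above the cutoff are treated as positive, as in the text.

```python
from statistics import NormalDist

# HYPOTHETICAL result distributions (arbitrary units), for illustration only.
diseased = NormalDist(mu=60, sigma=10)
healthy = NormalDist(mu=40, sigma=10)


def rates(cutoff):
    """Return (sensitivity, specificity) when results above cutoff are positive."""
    sensitivity = 1 - diseased.cdf(cutoff)   # diseased area to the right of the cutoff
    specificity = healthy.cdf(cutoff)        # healthy area to the left of the cutoff
    return sensitivity, specificity


for cutoff in (40, 45, 50, 55, 60):
    sens, spec = rates(cutoff)
    # Each (1 - specificity, sensitivity) pair is one point on the ROC curve.
    print(f"cutoff {cutoff}: sensitivity {sens:.2f}, "
          f"specificity {spec:.2f}, false-positive rate {1 - spec:.2f}")
```

Raising the cutoff lowers sensitivity and raises specificity. Plotting sensitivity against the false-positive rate across all cutoffs would trace the ROC curve described in the figure legend.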
Multiple Diagnostic Possibilities
Bayes' theorem can also help interpret clinical information in situations in which more than 2 diagnostic possibilities exist (ie, more than just disease present and disease absent). Reasoning explicitly is increasingly important when the diagnostic task becomes more complex. The only requirements for bayesian interpretation are that all diagnostic possibilities be considered and assigned a prior probability and that all be mutually exclusive (ie, only one of the listed possibilities can be present). Combinations must be explicitly listed. For example, in a woman with dysuria who may have UTI, vaginitis, or both, the mutually exclusive diagnostic possibilities are “UTI,” “vaginitis,” “UTI and vaginitis,” and “neither.”
The flowchart or tabular form of Bayes' theorem can easily accommodate more than 2 diagnostic possibilities. For the flowchart, the first tier of mutually exclusive diagnoses can be expanded to 3 or more branches. The next tier can be expanded to include each possible result and the number of patients in each diagnostic category with each possible test result. For the tabular approach, a new row is added for each additional mutually exclusive diagnosis considered, and a section is added for each possible test result.
For example, cardiac troponin I levels can help evaluate a 59-yr-old man with a history of diabetes and hypertension who presents to the emergency room with new-onset chest pain that occurred at rest 5 h ago. ECGs show no ST-segment elevations or Q waves; T waves are inverted anteriorly. Diagnostic possibilities include a non–Q-wave MI, unstable angina, and noncardiac disease. A serum cardiac troponin I level can help differentiate these diagnoses. Very high cardiac troponin I levels occur most often in patients with MI, intermediate levels occur in patients with unstable angina, and very low levels occur in patients with noncardiac disease. Mathematical analysis improves understanding of the implications of a test result. Assume that the conditional probabilities are those shown in Table 2: Cardiac Troponin I Levels in Acute Ischemic Heart Disease. For each diagnosis, the sum of the conditional probabilities is 100% because all possible results are listed.
Table 2: Cardiac Troponin I Levels in Acute Ischemic Heart Disease

| Diagnosis | Probability of cTnI < 0.4 ng/mL (%) | Probability of cTnI = 0.4–2.5 ng/mL (%) | Probability of cTnI > 2.5 ng/mL (%) | Total (%) |
|---|---|---|---|---|
| Non–Q-wave MI | 25 | 40 | 35 | 100 |
| Unstable angina | 40 | 55 | 5 | 100 |
| Noncardiac disease | 96 | 3.9 | 0.1 | 100 |

cTnI = cardiac troponin I level.
After clinical evaluation (ie, history, physical examination, ECG), assume that the probability is 25% for non–Q-wave MI, 70% for unstable angina, and 5% for noncardiac disease. Now consider 3 different cardiac troponin I results: < 0.4 ng/mL in the 1st case, 1.0 ng/mL in the 2nd case, and 3.0 ng/mL in the 3rd case.
Table 3: Clinical Decision Making: Interpretation of Cardiac Troponin I Results Using Bayes' Theorem shows how Bayes' theorem is used to interpret these findings. A low cardiac troponin I level decreases the likelihood of a non–Q-wave MI, slightly increases the likelihood of unstable angina, and substantially increases the likelihood of noncardiac disease. An intermediate level modestly decreases the likelihood of MI and increases the likelihood of unstable angina, whereas the likelihood of noncardiac disease drops sharply. A high level increases the likelihood of MI and virtually excludes noncardiac disease.
Table 3: Interpretation of Cardiac Troponin I Results Using Bayes' Theorem

| A Diagnosis | B Prior Probability (%) | C Conditional Probability of Result (%) | D Product (B × C) | E Posterior Probability (%)* |
|---|---|---|---|---|
| cTnI < 0.4 ng/mL | | | | |
| Non–Q-wave MI | 25 | 25.0 | 625.0 | 16.0 |
| Unstable angina | 70 | 40.0 | 2800.0 | 71.7 |
| Noncardiac disease | 5 | 96.0 | 480.0 | 12.3 |
| Total | | | 3905.0 | |
| cTnI = 1.0 ng/mL | | | | |
| Non–Q-wave MI | 25 | 40.0 | 1000.0 | 20.5 |
| Unstable angina | 70 | 55.0 | 3850.0 | 79.1 |
| Noncardiac disease | 5 | 3.9 | 19.5 | 0.4 |
| Total | | | 4869.5 | |
| cTnI = 3.0 ng/mL | | | | |
| Non–Q-wave MI | 25 | 35.0 | 875.0 | 71.40 |
| Unstable angina | 70 | 5.0 | 350.0 | 28.56 |
| Noncardiac disease | 5 | 0.1 | 0.5 | 0.04 |
| Total | | | 1225.5 | |

cTnI = cardiac troponin I level.

*The row-wise product divided by the total of column D.
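The tabular calculation can be sketched in Python as follows. The priors and conditional probabilities are those given in the text and in Table 2; the dictionary layout and the function name `posterior` are illustrative choices, not part of the source.

```python
# Prior probabilities (%) after clinical evaluation, from the text.
priors = {"Non-Q-wave MI": 25.0, "Unstable angina": 70.0, "Noncardiac disease": 5.0}

# Conditional probability (%) of each troponin I result given each diagnosis (Table 2).
conditional = {
    "cTnI < 0.4 ng/mL": {
        "Non-Q-wave MI": 25.0, "Unstable angina": 40.0, "Noncardiac disease": 96.0},
    "cTnI 0.4-2.5 ng/mL": {
        "Non-Q-wave MI": 40.0, "Unstable angina": 55.0, "Noncardiac disease": 3.9},
    "cTnI > 2.5 ng/mL": {
        "Non-Q-wave MI": 35.0, "Unstable angina": 5.0, "Noncardiac disease": 0.1},
}


def posterior(priors, likelihoods):
    """Column D = prior x conditional; column E = each product / total of D."""
    products = {dx: priors[dx] * likelihoods[dx] for dx in priors}
    total = sum(products.values())
    return {dx: 100 * product / total for dx, product in products.items()}


for result, likelihoods in conditional.items():
    print(result)
    for dx, pct in posterior(priors, likelihoods).items():
        print(f"  {dx}: {pct:.2f}%")
```

Because the posterior probabilities are normalized by the column total, they sum to 100% for each result, which requires that the listed diagnoses be mutually exclusive and exhaustive.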
Last full review/revision November 2005
Content last modified November 2005