Feature Articles

Published: January 1, 2000
Dealing with discrepancy analysis, Part 1: The problem of bias

By: Cheryl L. Hayden and Michael L. Feldstein

IVD Technology Magazine

For IVD product developers, accurately demonstrating the performance of a new test can involve important scientific, statistical, and methodological issues.

Product developers and clinical researchers who are evaluating the performance of a new IVD test usually do so by comparing results obtained with the new method to results obtained with an existing method. While this is a logical approach to determining the performance and utility of an IVD—and sometimes the only feasible approach—there are some methodological problems inherent in such a practice.

Companies conduct comparative IVD product testing for a number of reasons. Such testing can be used to generate the data required for an FDA product approval submission. It can also be used to demonstrate the advantages of the new test over its competitors, including greater clinical sensitivity, specificity, or detection limits for the analyte of interest. Often, the information derived from such testing is critical to helping a company determine whether its new assay for a particular analyte has reasonable potential for favorable FDA review and eventual commercial success.

When product testing is performed to compare the accuracy of a new test with that of an existing test, any discrepancies between the results of the two tests will naturally raise a number of questions. The manufacturer of the product under evaluation needs to know the cause of such discrepant results so that they can be accounted for in any representation of the new test's performance. Specifically, the researchers will want to know whether the results from the new method are correct (true positives) and those from the existing method incorrect (false negatives), or the results from the new method incorrect (false positives) and those from the existing method correct (true negatives). Additionally, if the new test is in fact better than the existing method, the manufacturer will want an opportunity to show this, and a way to gather more information to demonstrate such superiority clearly.

The researchers evaluating the test will typically resolve such questions by focusing their attention on the discrepant results, employing one of several approaches. They may reassay the group of patient samples that produced discordant results using both of the original test methods; they may reassay the discrepant samples using a third test method; or they may use additional clinical information about the patients who provided the discrepant samples to determine which of the test results is correct. The new test results gathered in these scenarios will be used in place of, or in addition to, the initial test data from the original test methods. Whichever of these approaches is used, this method of resolving discrepancies has the potential to introduce statistical bias into the researchers' analysis of the test data. This can occur because each of these approaches employs a selection process that provides additional information only from those samples that originally produced the discrepant results.

Despite such a possibility, the manufacturer of the investigational test may feel that there is scientific justification for using one or more of these discrepancy resolution approaches. This is commonly the case when data gathered during technology discovery and development have demonstrated that the technology of the new method is inherently more sensitive or specific than current methods for testing the analyte, as in the case of emerging DNA and RNA amplification tests. In this situation, the true performance of the test cannot be demonstrated without introducing some approach to evaluating discrepant samples further. To be successful, such an approach should avoid introducing bias while at the same time offering a technical or scientific basis for gathering more information about discrepant results, understanding which test method is correct, and fully describing the performance of the new test.

In this article, the authors describe the problem of comparative accuracy using hypothetical examples, and suggest some statistically and scientifically valid ways of approaching a resolution.

Sources of Discrepant Test Results

In the past, clinicians seeking to diagnose a viral infection relied on the method of growing the virus in tissue culture inoculated with a sample obtained from the patient. This method is considered to be 100% specific; that is, if viral growth is seen in the tissue culture, it is considered unequivocal evidence of a viral infection. However, this method of growing viruses in tissue culture is also notoriously insensitive. Depending on how difficult it is to grow a particular virus in vitro, the method can yield a high proportion of false negative results.

Within the past decade, the emergence of molecular diagnostic technologies has created a culture-free alternative method for diagnosing viral infections by making it possible to detect viral DNA and RNA in tissue or blood samples. One approach to performing such molecular testing involves hybridization of a DNA or RNA strand tagged with a fluorescent marker to the target viral DNA or RNA.

Molecular amplification methods are capable of producing many copies of the DNA or RNA hybrid generated in the test and are therefore exquisitely sensitive, yielding a positive result even in the presence of a very small number of viral particles. At the same time, however, such methods can suffer from a potential for yielding false positive results. Such false positives can be caused by environmental contamination or by cross-contamination occurring during performance of the assay.

When a patient sample is assayed using both the culture and the molecular methods, it is not uncommon for the test results to disagree with one another. In such cases, the tissue culture method may yield a negative result (no viral growth), while the hybridization method yields a positive result.

For the test manufacturer, such a disagreement creates a dilemma. Which test results are correct? Did the tissue culture method yield a false negative, or did the hybridization method yield a false positive? When this situation arises, researchers usually perform discrepancy analysis using one of the following approaches.

  • Reassay the original patient samples using both the culture and the molecular methods.
  • Obtain a new patient sample to be assayed using both methods.
  • Employ a third method to evaluate the presence of virus in the original sample.
  • Obtain information about symptoms or response to treatment from the patient's medical record.

There are drawbacks to each of these approaches. If it is decided to reassay the original patient sample (if any remains available), the researcher must deal with the probability that the sample no longer contains any viable organism. If a new sample is obtained, it will reflect the present condition of the subject, who may have undergone treatment in the interim. Moreover, using a new sample may provide new information, but it does not contribute to an understanding of the original discrepant results, which remain unresolved. A third method for determining the presence of the organism may not be available or, if it is, it may be subject to similar problems of sensitivity or specificity. Finally, clinical information may not be definitive for establishing a diagnosis for that particular infection or disease.

Regardless of the method used to resolve discrepant results, reanalysis of the data based solely on a corrected classification of the discordant samples can result in a biased outcome. Several recent articles have quantified the bias that can occur when using discrepancy analysis.1–5 For instance, Hadgu has pointed out that when a discrepancy occurs such as that commonly encountered with viral DNA hybridization methods (that is, when the increased sensitivity of the new assay causes it to yield seemingly false positive results), subsequent discrepancy analysis will always be biased in favor of the newer, more-sensitive method.1

When the new test has high sensitivity, the potential for bias may be very small simply because it is difficult to make a test that is already 98% sensitive even more sensitive. However, such a test may not be very specific, and the bias introduced by traditional discrepancy analysis may affect specificity as well.
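Hadgu's point can be made concrete with a small expected-count sketch in Python. The parameters below are purely illustrative assumptions (they do not come from this article): a new test with 98% sensitivity compared against a 70%-sensitive reference, with a perfect resolver applied only to discordant samples. Samples that both tests miss are concordant negatives, are never re-examined, and are silently counted as correct, so the new test's apparent sensitivity exceeds its true sensitivity:

```python
# Illustrative expected-count sketch (parameters are assumptions, not from
# the article): discrepant analysis with even a perfect resolver is biased,
# because samples that BOTH tests miss are concordant and never re-examined.
diseased = 25_000        # truly infected samples in a hypothetical study
new_sens = 0.98          # sensitivity of the new (molecular) test
ref_sens = 0.70          # sensitivity of the reference (culture) method

# Errors on infected samples, assuming the two tests err independently:
missed_by_new = diseased * (1 - new_sens)        # new-test false negatives
missed_by_both = missed_by_new * (1 - ref_sens)  # concordant double misses

# Only discordant samples are resolved, so the double misses are accepted
# as true negatives; the new test's apparent false negatives shrink:
apparent_fn = missed_by_new - missed_by_both
true_pos = diseased * new_sens
apparent_sensitivity = true_pos / (true_pos + apparent_fn)

print(f"true sensitivity:     {new_sens:.4f}")
print(f"apparent sensitivity: {apparent_sensitivity:.4f}")  # inflated
```

The same mechanism inflates apparent specificity when the two tests share false positives, and the bias grows larger still when the resolving assay is imperfect or correlated with the new test.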

DNA Hybridization    Tissue Culture Positive    Tissue Culture Negative    Total
Positive                       200                        100               300
Negative                        10                        490               500
Total                          210                        590               800

Accuracy: [(200 + 490)/800] x 100 = 86.3%

Table I. Hypothetical evaluation to determine the accuracy of a viral DNA hybridization assay when compared with a tissue culture "gold standard," showing discrepant results for 110 samples.



 

Hypothetically Speaking

As an example, consider the results of a hypothetical evaluation using 800 patient samples to determine the accuracy of a viral DNA hybridization assay when compared with a tissue culture "gold standard." Accuracy is defined as the number of apparent positives (both assays positive in agreement) plus the number of apparent negatives (both assays negative in agreement), divided by the total number of results, and multiplied by 100 to express the results as a percentage. In this example, all 800 patient samples were subjected to initial testing by both molecular and culture methods, yielding discrepant results for 110 samples (see Table I). Using these figures, the accuracy of the viral DNA hybridization assay compared with tissue culture can be calculated as follows:

[(200 + 490)/800] x 100 = 86.3%.
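As a quick arithmetic check, this percent-agreement calculation can be sketched in a few lines of Python, using the counts from Table I:

```python
# Counts from the hypothetical Table I comparison
# (viral DNA hybridization vs. tissue culture).
both_positive = 200   # hybridization positive, culture positive
both_negative = 490   # hybridization negative, culture negative
total_samples = 800   # all samples tested by both methods

# "Accuracy" here is simply overall percent agreement between the methods.
accuracy = (both_positive + both_negative) * 100 / total_samples
print(accuracy)  # 86.25, reported as 86.3%
```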

If a third method of testing is available, the 110 patient samples that yielded discrepant results can be subjected to further testing. Hypothetical results for such an analysis are shown in Table II.

Previous Results                                       Third Assay Positive    Third Assay Negative    Total
Tissue culture negative, DNA hybridization positive             75                      25              100
Tissue culture positive, DNA hybridization negative             10                       0               10

Table II. Hypothetical results of retesting the 110 patient samples that yielded discrepant results (see Table I) using a third assay method.



 

The results of all three test methods can then be combined to form an adjusted evaluation of the accuracy of the viral DNA hybridization test. In this example, viral infection can be considered confirmed when the result of the tissue culture test was positive, or when a sample with a negative tissue culture result yielded a positive result from the third method of testing. Using these figures, the adjusted accuracy of the DNA hybridization method, when compared with confirmed viral infection, is 95.6% (see Table III).

DNA Hybridization    Viral Infection Positive    Viral Infection Negative    Total
Positive                        275                         25                300
Negative                         10                        490                500
Total                           285                        515                800

Accuracy: [(275 + 490)/800] x 100 = 95.6%

Table III. Hypothetical results of three assays conducted to evaluate the accuracy of a viral DNA hybridization test.
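The reclassification behind the adjusted accuracy figure can be sketched in Python using the hypothetical counts from Tables I and II. Note that the 490 concordant negatives are accepted as correct without any re-examination; as discussed below, this is where bias can enter:

```python
# Counts from Tables I and II. "Confirmed infection" is defined as:
# tissue culture positive, OR culture negative but third assay positive.
concordant_pos = 200   # both original methods positive
concordant_neg = 490   # both original methods negative
resolved_pos = 75      # hybridization +, culture -, third assay + (reclassified)
total = 800

# After discrepancy resolution, the hybridization test's "correct" calls:
true_pos = concordant_pos + resolved_pos   # 275
true_neg = concordant_neg                  # 490, never re-examined
adjusted_accuracy = (true_pos + true_neg) * 100 / total
print(adjusted_accuracy)  # 95.625, reported as 95.6%
```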



 

Although researchers commonly employ this approach to resolving discrepant results, there are several problems inherent in it. The first problem arises when results from the test under investigation are compared with those from another method. Using this approach, the results of the test under investigation are considered to be correct if they are the same as those from the comparison procedure, the so-called gold standard. In essence, this procedure defines the accuracy of the test under investigation according to its own results—an approach that has questionable scientific validity.

In the hypothetical example above, this approach to discrepancy analysis ignores the known insensitivity of the tissue culture method and makes potentially erroneous assumptions about the sensitivity of the molecular method, which is actually the parameter that the study is seeking to determine. For instance, it is certainly possible that some of the 590 samples that did not grow virus in the culture test were actually positive for the presence of virus. It is also possible that some of the 500 samples that gave a negative result by DNA hybridization were actually positive for the presence of virus. A change in either of these values would significantly alter the calculated accuracy of the molecular method.

The second problem with this approach to discrepancy resolution occurs when the results of the first two sets of assays (in the example, the culture method and the molecular method) are used to determine what samples should be reassayed using a third method. Since there can be no certainty about which of the earlier test results are actually correct, any selection of discrepant results for retesting must also be suspect. In short, it is not scientifically valid to perform a third, confirmatory assay on only those samples that yielded discordant results by the two original tests.

With these problems in mind, the only way to determine the real sensitivity of a test under investigation is to use the actual truth about a viral infection as the gold standard. Unfortunately, such actual truth is not always ascertainable, and often the best that can be done is to avoid introducing bias into the study.

Investigators can prevent the introduction of bias into such discrepancy resolution studies in one of two ways. First, they can allow the original test data to stand uncorrected. Second, they can perform the third assay on all of the samples, using the results of that assay to define the actual truth about viral infection.

Previous Results                                       Third Assay Positive    Third Assay Negative    Total
Tissue culture positive, DNA hybridization positive            200                       0              200
Tissue culture negative, DNA hybridization negative             90                     400              490
Tissue culture negative, DNA hybridization positive             75                      25              100
Tissue culture positive, DNA hybridization negative             10                       0               10

Table IV. Hypothetical results of retesting conducted on the 800 patient samples previously examined by culture and molecular methods (see Table I), using a third testing method. Results of this retesting are assumed to represent the actual truth about the viral infection of the samples.



 

Continuing the hypothetical example described above, an example of this procedure is shown in Table IV. Here, all 800 original patient samples have been reassayed using a third test method, the results of which are assumed to represent the actual truth about the viral infection of the samples. In other words, a positive result using this method is presumed to represent a clinically important infection, while a negative result represents no infection or an infection of subclinical importance.

From the results shown in Table IV, two 2 x 2 tables can be constructed, as shown in Tables V(a) and V(b). Table V(a) compares the results of the culture method with the third assay, yielding an accuracy of 79%. Table V(b) compares the results of the molecular method with the third assay, yielding an accuracy of 84%. Using this approach, the researchers would have demonstrated, in an unbiased statistical comparison, that the DNA hybridization method is more accurate than the tissue culture method.

Tissue Culture    Third Assay Positive    Third Assay Negative    Total
Positive                   210                      0              210
Negative                   165                    425              590
Total                      375                    425              800

Accuracy: [(210 + 425)/800] x 100 = 79.4%

Table V(a). Results from third-method retesting (see Table IV), comparing the results of the culture method with the third assay, yielding accuracy of 79%.




 

DNA Hybridization    Third Assay Positive    Third Assay Negative    Total
Positive                      275                     25              300
Negative                      100                    400              500
Total                         375                    425              800

Accuracy: [(275 + 400)/800] x 100 = 84.4%

Table V(b). Results from third-method retesting (see Table IV), comparing the results of the molecular method with the third assay, yielding accuracy of 84%.
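Both unbiased accuracy figures can be computed directly from the full-retest counts in Table IV; a short Python sketch:

```python
# Table IV counts: third-assay results for all 800 samples, keyed by
# (tissue culture result, DNA hybridization result).
counts = {
    ("+", "+"): (200, 0),    # values are (third assay +, third assay -)
    ("-", "-"): (90, 400),
    ("-", "+"): (75, 25),
    ("+", "-"): (10, 0),
}
total = sum(pos + neg for pos, neg in counts.values())  # 800

def percent_agreement(method_index):
    """Agreement of one method (0 = culture, 1 = hybridization) with the
    third assay, which is taken here to represent the actual truth."""
    correct = 0
    for results, (third_pos, third_neg) in counts.items():
        if results[method_index] == "+":
            correct += third_pos   # method + agrees with third-assay +
        else:
            correct += third_neg   # method - agrees with third-assay -
    return correct * 100 / total

print(percent_agreement(0))  # tissue culture:    79.375, reported as 79.4%
print(percent_agreement(1))  # DNA hybridization: 84.375, reported as 84.4%
```

Because every sample is retested, the same selection rule is applied to concordant and discordant results alike, which is what keeps this comparison unbiased.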



 

This procedure for avoiding the introduction of bias suffers from one significant limitation: it can be employed only when an accurate assay already exists that can be used as a confirmatory test. Such a test must have high sensitivity and specificity, must be very reproducible, and should be recognized as a standard by clinical laboratorians and test manufacturers.

Conclusion

Researchers can avoid introducing bias into their statistical analyses by making use of several scientifically and statistically valid approaches to testing. In the next installment of this article, the authors look at the techniques available to researchers, with special reference to their adherence to emerging FDA guidances.

Continue to Part 2 of this article.

References

1. A Hadgu, "The Discrepancy in Discrepant Analysis," Lancet 348 (1996): 592–593.

2. A Hadgu, "Bias in the Evaluation of DNA-Amplification Tests for Detecting Chlamydia trachomatis," Statistics in Medicine 16 (1997): 1391–1399.

3. HB Lipman and JR Astles, "Quantifying the Bias Associated with Use of Discrepant Analysis," Clinical Chemistry 44 (1998): 108–115.

4. TA Green, CM Black, and RE Johnson, "Evaluation of Bias in Diagnostic-Test Sensitivity and Specificity Estimates Computed by Discrepant Analysis," Journal of Clinical Microbiology 36 (1998): 375–381.

5. WC Miller, "Bias in Discrepant Analysis: When Two Wrongs Don't Make a Right," Journal of Clinical Epidemiology 51 (1998): 219–231.

6. Method Comparison and Bias Estimation Using Patient Samples, approved guideline, NCCLS document EP9-A (Wayne, PA: National Committee for Clinical Laboratory Standards, 1995).

7. "Guidance on Labeling for Laboratory Tests," (Rockville, MD: Division of Clinical Laboratory Devices, Office of Device Evaluation, Center for Devices and Radiological Health, FDA, 1999), p. 4.

Cheryl L. Hayden is an independent consultant in clinical trial design and management and Michael L. Feldstein is director of clinical services with Medical Device Consultants Inc. (Attleboro, MA).

Copyright ©2000 Medical Product Manufacturing News