Implementing proper statistical methods ensures the development of safe and effective assays.
Figure 1. A sample calibration curve showing calibration using a linear fit to a standard curve. Using the regression equation obtained from the linear regression, one can predict the concentration of an analyte in a corresponding unknown sample.
The key statistical objective in IVD device evaluation is establishing performance criteria while minimizing bias and maximizing precision. While the statistical considerations and methodologies required for such evaluation contribute to this objective, they vary throughout the IVD product development process.
Although clearly separate, the assay development and validation phases are often erroneously folded into one. An assay's development phase requires continuous evaluation that should be clearly distinct from the validation phase. The regulatory and statistical requirements for assay development and validation are different. Understanding the differences between them is crucial for discussing the statistical considerations in each phase.
Utilizing good statistical practices early in an assay's development will ensure not only that the regulatory requirements are met but also that a solid body of knowledge exists regarding the device's performance. Furthermore, confidence in the performance profile of an assay can potentially affect its clinical evaluation.
Assay Development versus Assay Validation
What Is Assay Development. During the assay development, or assay optimization, phase, an analytical process or idea is defined and optimized into a robust and reproducible device that delivers results as intended. The majority of assays can be categorized into one of the following three types: qualitative, semiquantitative (qualitative assays based on a quantitative determination in which a clinically meaningful gradation of results exists), and fully quantitative. One optimization strategy for fully quantitative and semiquantitative assays is using a calibration curve. Another common optimization technique for semiquantitative assays involves using receiver operating characteristic (ROC) curves.
An assay's evolution begins with a clearly desired objective. Whether for basic research or clinical purposes, an assay's intended use becomes the anchor to which all optimization and validation activities are set. Optimizing an assay involves choosing its optimal format. With the intended use in mind, a new assay's appropriate performance characteristics are then defined. Although a number of performance characteristics, such as stability, accuracy, and precision, should be reviewed in all assays, other individual characteristics such as robustness and reproducibility may not be as important, depending on the device's intended use.1,2
Figure 2. Graphical patterns (residual plots) of four regression fits: a) normal pattern, b) nonlinear kinetics, c) regression outlier, and d) heterogeneity of variance. The panels show the residual pattern associated with a proper regression fit and with three commonly occurring improper fits. The y-axis shows the residuals, that is, the difference between the observed value (Yi) and the predicted value E(Yi) from the regression line. The x-axis shows the predicted value.
The maturation, or optimization, phase is a continuous cycle that begins with defining these initial performance characteristics, and continues until the performance metrics are established and there is confidence in the results that are obtained from the assay. As a device prototype is being finalized, the final stages of assay development focus on the initial feasibility of manufacturing and marketing the device. Once a final optimized and feasible prototype design is completed, it proceeds to assay validation. The culmination of any assay development involves drafting a development report.
What Is Assay Validation. After successfully completing the development phase and prior to implementation, an assay must undergo a validation period. Validation can begin only after the assay design is set and the test parameters have been established. Based on the data obtained during the development phase and with the help of sound judgment, a validation protocol is prepared. Such a protocol should include experiments that confirm those assay parameters deemed important during the design period and test whether the device meets the performance criteria for its intended use. Some test parameters that could be included in the validation protocol are accuracy, precision, linearity, and specificity. The protocol must also include predefined acceptance criteria for each of the assay parameters.
If an assay successfully passes all criteria in the validation protocol, a validation report should be prepared at the conclusion of this phase. Such a report should outline the experiments performed, any deviations (with justifications) from the protocol, and the results of the evaluations.
Delineating Assay Development and Validation. An assay cannot fail in the development or optimization phase. If an assay does not meet the criteria during development, it either gets reoptimized until it can achieve acceptable performance standards or is rejected for its intended use. However, an assay can fail in the validation phase. If an assay does not meet the predefined acceptance criteria during validation, further development is required. After determining and resolving the cause of the failure, an assay should be reoptimized. Once satisfactory performance is achieved, an assay will then be tested under a new validation protocol. The validation results certify that an assay is fit for use.
Statistical Considerations in Assay Development
Figure 3. Examples of three ROC curves showing excellent, good, and worthless assays plotted on the same graph. The accuracy of an assay depends on its ability to separate the group being tested into those with and those without the condition of interest. The accuracy can be measured by the area under the ROC curve (1 = perfect test, 0.5 = worthless test).
Statistics play a crucial role in understanding assay results and developing experimental strategies for optimizing the device. The statistical methods employed during assay optimization are generally simple and understandable. However, assay developers should pay careful attention to ensure that these methods are implemented correctly and the results are appropriately interpreted.
Analytical Method Calibration. Many quantitative or semiquantitative assays may require comparing the results to a standard curve, or in other words, they require calibration. Although the same statistical principles apply to all assays, those devices in which quantitation depends on a standard curve present unique challenges.
In order to quantify the amount and activity of an analyte in a sample, calibration to a standard may be required. A series of samples with known amounts of an analyte are run on an assay. The assay results are then plotted against the reference standards, and a curve is statistically fitted to the data. In some cases, a calibration curve with a simple linear fit may be generated (see Figure 1).
Although the example in Figure 1 shows a linear fit, most biological assays do not exhibit linearity across their complete range. The strategy in such cases should be to assess an assay's linear range and establish a standard curve within this range. The linear range can be determined graphically using a plot similar to the one in Figure 1. A number of linear fit techniques are available, although the least-squares approach is often sufficient. A simple transformation of the data (such as a log, log-log, or square root transformation) may also be required to obtain a linear fit. However, if an assay does not perform in a linear fashion throughout its analytical range (i.e., typical transformation methods are not adequate) and graphical plots show a sigmoidal relationship, the standard curve can often be modeled with the four-parameter logistic regression equation.3
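As a rough illustration of the four-parameter logistic model mentioned above, the following Python sketch uses one common parameterization of the curve and its algebraic inverse. The parameter names (a, b, c, d) and the numerical values are assumptions chosen for illustration; they are not taken from any particular assay, and estimating the parameters from standards would require a nonlinear least-squares routine that is not shown here.

```python
def four_pl(x, a, b, c, d):
    """Response at concentration x under a four-parameter logistic model.

    a: response at zero concentration
    d: response at infinite concentration
    c: concentration at the inflection point (response midway between a and d)
    b: slope factor
    """
    return d + (a - d) / (1.0 + (x / c) ** b)


def inverse_four_pl(y, a, b, c, d):
    """Concentration that gives response y (y must lie strictly between a and d)."""
    return c * ((a - d) / (y - d) - 1.0) ** (1.0 / b)


# Hypothetical fitted parameters, for illustration only
params = dict(a=0.05, b=1.2, c=40.0, d=2.0)

signal = four_pl(25.0, **params)               # response of a 25-unit standard
recovered = inverse_four_pl(signal, **params)  # back-calculated concentration
```

Back-calculating the standards in this way and comparing them with their nominal concentrations is one simple check on how well a fitted curve describes the data.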
In order to verify the goodness of fit of a linear equation, common regression diagnostics should be applied. Although no standard exists, a regression line's fit is commonly measured using a threshold value of the coefficient of determination (r²) associated with the regression fit. An assay that does not meet this criterion will be deemed invalid in the validation phase and should undergo further development. When relying on r² as a goodness-of-fit measure, assay developers should take care to ensure that the standard curve does exhibit a linear response, that no regression outliers exist, and that the data have no systematic bias or heterogeneous variability.4 Such features can be revealed using common graphical diagnostic techniques such as residual plots (see Figure 2). While quantitative statistical tests of curvature, outliers, and patterned residuals are also available, they should be understood prior to use.5 Once the validity of the regression line has been established, predicted concentrations can be determined using inverse regression techniques.
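The calibration steps described above (fit the line, check r² and the residuals, then predict an unknown by inverse regression) can be sketched in a few lines of Python. The standard concentrations and signals below are hypothetical, and the function names are illustrative rather than part of any established package.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least-squares fit of y on x.

    Returns slope, intercept, coefficient of determination, and residuals.
    """
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    ss_res = sum(r ** 2 for r in residuals)
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return slope, intercept, 1.0 - ss_res / ss_tot, residuals


def predict_concentration(signal, slope, intercept):
    """Inverse prediction: solve signal = intercept + slope * conc for conc."""
    return (signal - intercept) / slope


# Hypothetical calibration standards (concentration, measured signal)
conc = [0.0, 5.0, 10.0, 20.0, 40.0]
signal = [0.02, 0.26, 0.51, 1.03, 1.99]

slope, intercept, r_squared, residuals = fit_line(conc, signal)
unknown = predict_concentration(0.75, slope, intercept)
```

Plotting the residuals against the fitted values, rather than looking at r² alone, is what exposes the curvature, outlier, and unequal-variance patterns shown in Figure 2.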
Figure 4. A sample Pareto plot from a 2³ factorial experiment showing the absolute effects of factors X, Y, and Z as presented in Table I on the activity level of the analyte as well as the effect of 2-factor and 3-factor interactions on the activity level of the analyte.
Receiver Operating Characteristic (ROC) Curves. When developing semiquantitative assays that can be compared with a gold standard in which true disease states are known, ROC curves should be used.6 An ROC curve is a plot of the true-positive rates against the false-positive rates for the different possible cut points of a test (see Figure 3). An ROC curve shows the trade-off between the sensitivity and specificity of an assay.6 The closer a curve follows the y-axis and then the top border of the plot, the more accurate the test. Put another way, the area under an ROC curve is a measure of assay accuracy. The Clinical and Laboratory Standards Institute (CLSI; Wayne, PA) has established guidelines (in document GP10) that provide further guidance on the use and utility of ROC curves.7
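To make the construction concrete, the sketch below computes the points of an ROC curve and the area under it from assay results paired with gold-standard status. The data are hypothetical, the function names are illustrative, and the convention that a result at or above the cut point counts as positive is an assumption.

```python
def roc_points(scores, labels):
    """Return (false-positive rate, true-positive rate) pairs, one per cut point.

    labels: 1 = condition present according to the gold standard, 0 = absent.
    A result is called positive when its score is at or above the cut point.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for cut in sorted(set(scores), reverse=True):
        tp = sum(1 for s, lab in zip(scores, labels) if s >= cut and lab == 1)
        fp = sum(1 for s, lab in zip(scores, labels) if s >= cut and lab == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def area_under_curve(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical assay signals and gold-standard disease status
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 1, 0, 0, 0]

curve = roc_points(scores, labels)
auc = area_under_curve(curve)
```

For these eight samples the area works out to 0.875, which equals the proportion of diseased and nondiseased pairs in which the diseased sample has the higher signal.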
Assay Optimization. The goal of assay optimization is to establish a plan for evaluating the factors that may affect a device's performance. Such factors include temperature, humidity, sample contaminants, the matrix tested, and reagent composition. Assay developers should study these factors alone and in combination to assess their effects on the device's accuracy, precision, repeatability, and cross-reactivity.
The most commonly chosen experimental design for assay optimization is a factorial design (see Table I). Depending on the number of factors to be tested, such an experiment will employ either a full factorial or fractional factorial design. A full description of factorial experiments can be found in many statistical data analysis texts.8 Depending on the experimental scenario, other methods such as a completely randomized design or a randomized block design may be employed.8
Figure 5. An example of a boxplot (also known as a box-and-whisker plot) showing outliers in the dataset, which are identified using Tukey's rule.
A factorial design allows for simultaneous evaluation of multiple factors that might influence an assay's performance. (As a rule of thumb, experiments with fewer than five factors use a full factorial design, and those with more than five factors use a fractional factorial design.) When using a factorial design in an experiment, the runs should be performed in a random order. Fractional factorial designs are used when the number of runs in a full factorial design becomes impractical or when resources are not available to complete a full factorial design. Since deciding which runs to choose and which to leave out can be complex, researchers should consult the relevant references to ensure that the runs are appropriately selected.7
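As an illustration, a two-level full factorial design and a randomized run order can be generated with the Python standard library alone. The factor names X, Y, and Z follow Table I; the coded levels (-1 for low, +1 for high) and the seed value are assumptions made for the example.

```python
import itertools
import random


def full_factorial(factors):
    """All combinations of coded levels (-1 = low, +1 = high) for the named factors."""
    return [dict(zip(factors, levels))
            for levels in itertools.product((-1, 1), repeat=len(factors))]


def randomized_run_order(design, seed=None):
    """Return a shuffled copy of the design; a seed only makes the order reproducible."""
    runs = list(design)
    random.Random(seed).shuffle(runs)
    return runs


design = full_factorial(["X", "Y", "Z"])              # 2**3 = 8 runs
run_sheet = randomized_run_order(design, seed=2005)   # order in which to perform them
```

The number of runs doubles with each added factor, which is why fractional designs become attractive as the factor count grows.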
Once an appropriate factorial design is chosen, the experiment can be conducted. The results of the experiment can be graphically summarized using a Pareto plot (see Figure 4). A Pareto plot shows the effects of any given factor compared with all other factors, including possible factor interactions, in an ordered fashion.
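The ordering behind a Pareto plot can be sketched as follows: each main effect or interaction is estimated as the mean response at the high level of its contrast minus the mean response at the low level, and the effects are then ranked by absolute size. The responses here are hypothetical values generated from a known formula so that the expected effects are easy to verify; the function name is illustrative.

```python
import itertools
from math import prod


def factorial_effects(factors, design, responses):
    """Estimate main and interaction effects from a two-level full factorial.

    Effect = mean response where the contrast is +1 minus mean where it is -1.
    """
    effects = {}
    for size in range(1, len(factors) + 1):
        for combo in itertools.combinations(factors, size):
            contrast = [prod(run[f] for f in combo) for run in design]
            total = sum(c * y for c, y in zip(contrast, responses))
            effects["*".join(combo)] = total / (len(design) / 2)
    return effects


factors = ["X", "Y", "Z"]
design = [dict(zip(factors, lv)) for lv in itertools.product((-1, 1), repeat=3)]

# Hypothetical activity levels generated as 50 + 5*X - 2*Y + 1.5*X*Z
responses = [50 + 5 * r["X"] - 2 * r["Y"] + 1.5 * r["X"] * r["Z"] for r in design]

effects = factorial_effects(factors, design, responses)
pareto_order = sorted(effects, key=lambda name: abs(effects[name]), reverse=True)
```

With these made-up responses the ranking begins with X, then Y, then the X*Z interaction, and the remaining effects are zero.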
Statistical Considerations in Assay Validation
Table I. An example of a 2³ factorial experimental design studying the effect of three hypothetical factors (X, Y, Z) on the activity level of the analyte.
Unlike the statistical and experimental methods for assay development, which are not well defined, the statistical methods for assay validation and data analysis have been established and widely published.
Assay Performance Metrics. As part of its Harmonized Tripartite Guideline, the International Conference on Harmonization (ICH; Geneva) released the following two documents: “Text on Validation of Analytical Procedures” and “Validation of Analytical Procedures: Methodology.”1,2 These documents provide detailed definitions of appropriate validation parameters and suitable evaluation methods. These documents also provide the metrics of assay performance and their definitions (see Table II).
When reviewing the statistical considerations for assay validation, the standard validation parameters should be grouped into two areas: the measurements themselves and the variability in those measurements (see Table II). The validation metrics related to an analytical assay's measurements are accuracy, linearity, and specificity. The validation metrics related to the variability of an analytical assay are precision, robustness, range, limit of quantitation, and limit of detection.
Table II. A list of accepted assay performance metrics and their definitions as well as their relation to assay measurement or assay variability as presented in the ICH Tripartite Guideline and the NCCLS guidelines.
The CLSI guidelines provide a solid methodology for validating an assay against the chosen parameters.7 Although specialized software is available for analyzing the data from assay validation experiments in accordance with CLSI methods, such data analysis can also be performed using the standard functions in Excel or any statistical software package.7 The most common statistical methods used in assay validation are precision analysis, linearity analysis, and methods comparison.
Every precision analysis must begin by setting a precision goal or precision acceptance criterion. The experiment involves repeated measuring of a known amount of a sample analyte. The larger the number of replicates measured during the experiment, the greater the accuracy of the precision estimate. The results of such analysis should be reviewed for the presence of outliers. Since outliers cannot be defined arbitrarily, they should be assessed using acceptable methods such as Tukey's rule.9
Tukey's rule proposes that observations lying at least 1.5 times the interquartile range (the difference between the first and third quartiles) beyond one of the quartiles can be removed from an analysis. Such observations can be discerned by summarizing data using a boxplot (see Figure 5). However, during the assay development stages, removing such outliers is not advisable unless a well-founded reason is identified. At the same time, outliers resulting from operator error, contamination, or mechanical failure should be removed.
Table III. A summary of guidelines and references for early assay development.
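A minimal sketch of Tukey's rule in Python is shown below. It only flags candidate outliers and leaves the decision to remove them to the analyst. The replicate values are hypothetical, and because quartile conventions differ between software packages, the "inclusive" method chosen here is an assumption that can shift the fences slightly.

```python
from statistics import quantiles


def tukey_outliers(values, k=1.5):
    """Flag values lying more than k * IQR below Q1 or above Q3."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical replicate measurements of one sample
replicates = [10.1, 9.8, 10.0, 10.3, 9.9, 10.2, 10.0, 13.5]
flagged = tukey_outliers(replicates)
```

Here only the value 13.5 falls outside the fences; whether it is excluded should still depend on finding a documented cause.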
Precision is determined by calculating the mean, the standard deviation with a 95% confidence interval, and the coefficient of variation (CV) of the data. Although the CV has been widely used, it is not useful in all cases. For example, for negative samples in which the signal approaches zero, the CV (the standard deviation divided by the mean) grows without bound and does not correctly reflect the assay's precision. Presenting the standard deviation and its corresponding 95% confidence interval is preferred. The pass-fail criteria for this type of precision analysis involve assessing whether or not the standard deviation of the estimate exceeds the precision goal.
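For reference, the two quantities discussed above can be written as follows, where the sample mean is denoted by x-bar, s is the sample standard deviation of n replicates, and the chi-square terms are quantiles with n - 1 degrees of freedom. The interval assumes approximately normally distributed replicates, which is an assumption of this sketch rather than a statement from the guidelines.

```latex
\mathrm{CV} = \frac{s}{\bar{x}} \times 100\%,
\qquad
\sqrt{\frac{(n-1)\,s^{2}}{\chi^{2}_{0.975,\,n-1}}}
\;\le\; \sigma \;\le\;
\sqrt{\frac{(n-1)\,s^{2}}{\chi^{2}_{0.025,\,n-1}}}
```

The first expression makes the problem with near-zero signals visible: as the mean in the denominator approaches zero, the CV increases without limit even when s is small.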
In addition, other more-complex parameters for within-run precision and total-run precision are required for assay validation. The experimental process required to obtain data for such parameters is defined in the CLSI guidelines in document EP5.7 To assess data outliers using this method, guidelines necessitating a preliminary run should be followed. Once the data are collected over the course of the experiment, they should be assessed using variance component analysis and analysis of variance to determine within-run and total-run precision, as well as the 95% confidence intervals. The results are then scrutinized against the predetermined precision acceptance criteria.
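The idea of separating within-run from total variability can be illustrated with a one-way random-effects analysis of variance on a balanced set of runs. This is a simplified sketch, not the EP5 protocol itself, which uses a more elaborate design and also reports confidence intervals; the data and the function name are hypothetical.

```python
from math import sqrt
from statistics import mean


def precision_components(runs):
    """Within-run and total standard deviations from a balanced one-way design.

    runs: list of runs, each a list holding the same number of replicate results.
    """
    k = len(runs)       # number of runs
    n = len(runs[0])    # replicates per run
    grand = mean(v for run in runs for v in run)
    run_means = [mean(run) for run in runs]
    ms_within = sum((v - m) ** 2
                    for run, m in zip(runs, run_means) for v in run) / (k * (n - 1))
    ms_between = n * sum((m - grand) ** 2 for m in run_means) / (k - 1)
    # A negative between-run estimate is conventionally set to zero
    var_between = max(0.0, (ms_between - ms_within) / n)
    return sqrt(ms_within), sqrt(ms_within + var_between)


# Hypothetical results: three runs with two replicates each
runs = [[10.0, 10.2], [10.4, 10.6], [9.8, 10.0]]
within_sd, total_sd = precision_components(runs)
```

Each of the two standard deviations would then be compared with its own predefined precision acceptance criterion.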
Linearity analysis involves using an assay to test approximately 6–11 samples, preferably with multiple replicates of each specimen. This type of analysis assesses three assay performance metrics: accuracy, linearity, and reportable range. The acceptance criteria for these parameters should be defined before beginning an experiment. For example, the acceptable parameters for accuracy are defined in terms of total allowable error, which is composed of two components: systematic error and random error. While these error components can be individually predefined, doing so is not necessarily required. For the assay's reportable range, the acceptable parameters should reflect the device's analytical range. The degree of linearity of the assay results can be predefined by an acceptable goodness-of-fit statistical test range. However, as indicated in the CLSI guidelines in document EP6, using the rule of thumb, “if it looks like a straight line,” is also adequate.7 In general, the pass-fail criteria for the assay performance metrics involve assessing whether the results are sufficiently close to the predefined acceptance level.
Traditional methods comparison begins with setting either appropriate acceptance criteria for methods correlation or standard criteria for the slope and intercept. Methods comparison involves selecting two comparative methods, defining their standard deviations, and analyzing 20–50 specimens using both methods. The key point in traditional methods comparison is ensuring that the sample measures cover the full reportable range. Replicates are not usually performed in traditional methods comparison, and assay developers should take care in analyzing results in which replicates were taken because the assumption of independence will be violated.
Any outliers can be dealt with through one of two methods. Although not preferable, the first removes the outliers by using a method, such as Grubbs' test, to identify them.10 The second uses regression methods, such as Passing-Bablok regression, that are not distorted by outliers.11
With any statistical software package, assay developers can compute the slope, the intercept, and their associated 95% confidence intervals. Two methods are considered equivalent if the 95% confidence interval (CI) for the slope includes 1.00 and the 95% CI for the intercept includes 0. Methods correlation must also meet or exceed the prespecified level.
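The slope and intercept check can be sketched with ordinary least squares, as below. Several assumptions apply: the paired results are hypothetical and use only 10 specimens for brevity, the Student's t critical value is supplied by the caller from a table (2.306 for 95% confidence with 8 degrees of freedom), and ordinary least squares treats the comparative method as error-free, which is one reason alternatives such as Passing-Bablok regression are used in practice.

```python
from math import sqrt
from statistics import mean


def slope_intercept_ci(x, y, t_crit):
    """Least-squares slope and intercept confidence intervals.

    t_crit: two-sided Student's t critical value for len(x) - 2 degrees of
    freedom, looked up by the caller. Returns (slope_ci, intercept_ci).
    """
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    ss_res = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    s = sqrt(ss_res / (n - 2))                     # residual standard deviation
    se_slope = s / sqrt(sxx)
    se_intercept = s * sqrt(1.0 / n + x_bar ** 2 / sxx)
    return ((slope - t_crit * se_slope, slope + t_crit * se_slope),
            (intercept - t_crit * se_intercept, intercept + t_crit * se_intercept))


# Hypothetical paired results from a comparative and a candidate method
comparative = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
candidate = [1.1, 1.9, 3.1, 3.9, 5.1, 5.9, 7.1, 7.9, 9.1, 9.9]

slope_ci, intercept_ci = slope_intercept_ci(comparative, candidate, t_crit=2.306)
agree = slope_ci[0] <= 1.0 <= slope_ci[1] and intercept_ci[0] <= 0.0 <= intercept_ci[1]
```

In this example both intervals contain their target values, so the two hypothetical methods would pass the slope and intercept criteria.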
In addition to traditional methods comparison, two special cases exist for which specific CLSI guidelines have been written: document EP9 for comparing two methods with a 1:1 relationship, and document EP12 for comparing two qualitative methods.9
The statistical considerations for assay development and assay validation are important to keep in mind during product development. Whether used in basic research or clinical diagnosis, assays require vigilant development processes to ensure that the devices will consistently perform to specifications and that they will be deemed safe and effective for their intended use after validation. Just as good manufacturing practices (GMP) are required during the development of new IVDs, manufacturers and researchers alike should also conform to good statistical practices.
The methods presented in this article are not meant to represent an exhaustive review of the available techniques, but rather a general guide to governing principles in early assay development. Consulting a statistician may offer IVD companies better and more-tailored approaches to their particular assay development.
Regardless of the approach, following good statistical practices will ensure that an assay is validated with the appropriate methodologies. In addition, using sound statistical methodologies during assay development may lead to reduced development time and reduced overall cost. A number of Web sites provide useful guidelines and references for early assay development (see Table III).
1. “Text on Validation of Analytical Procedures (Q2A),” the International Conference on Harmonization of Technical Requirements for Registration of Pharmaceuticals for Human Use Web site (Geneva: 1994 [accessed 23 November 2004]); available from Internet: www.ich.org/MediaServer.jser?@_ID=417&@_MODE=GLB.
2. “Validation of Analytical Procedures: Methodology (Q2B),” the International Conference on Harmonization of Technical Requirements for Registration of Pharmaceuticals for Human Use Web site (Geneva: 1996 [accessed 23 November 2004]); available from Internet: www.ich.org/MediaServer.jser?@_ID=418&@_MODE=GLB.
3. A DeLean, P Munson, and D Rodbard, “Simultaneous Analysis of Families of Sigmoidal Curves: Application to Bioassay, Radioligand Assay, and Physiological Dose-Response Curves,” American Journal of Physiology 235 (1978): 97–102.
4. F Anscombe, “Graphs in Statistical Analysis,” The American Statistician 27 (1973): 17–21.
5. V Barnett and T Lewis, Outliers in Statistical Data (New York: Wiley, 1978), 252–256.
6. “Statistical Guidance on Reporting Results from Studies Evaluating Diagnostics Tests; Draft Guidance for Industry and FDA Reviewers,” the FDA Web site (Rockville, MD: 2003 [accessed 12 January 2005]); available from Internet: www.fda.gov/cdrh/osb/guidance/1428.html.
7. “CLSI Evaluation Protocols (EP05 through EP21),” the Clinical and Laboratory Standards Institute Web site (Wayne, PA: 2004 [accessed 2 February 2005]); available from Internet: www.nccls.org.
8. L Ott, An Introduction to Statistical Methods and Data Analysis, 4th ed. (Florence, KY: Wadsworth Publishing, 1988), 870–891.
9. “Tukey's Outlier Filter,” the RoyaltyStat Web site ([accessed 12 January 2005]); available from Internet: www.royaltystat.com/tukeysoutlier.cfm.
10. GraphPad QuickCalcs, “Grubb's Test for Detecting Outliers,” the GraphPad QuickCalcs Web site (2004 [accessed 23 November 2004]); available from Internet: www.graphpad.com/quickcalcs/GrubbsHowTo.cfm.
11. MedCalc Manual, “Method Comparison: Passing & Bablok Regression,” the MedCalc Web site (Mariakerke, Belgium: 2004 [accessed 23 November 2004]); available from Internet: www.medcalc.be/manual/mpage06-12b.php.
Anastasia N. Derzko is a biostatistician at Spectral Diagnostics Inc. (Toronto). She can be reached at email@example.com.
Copyright ©2005 IVD Technology