IVD applications of FDA’s guidance document discussing clinical investigations to evaluate the safety and effectiveness of a medical device.
FDA issued draft guidance on August 15, 2011, that addresses design of pivotal clinical trials for medical devices. The comment period expired November 14, 2011. There are no surprises in this guidance. The principles of trial design that are discussed are standard techniques that have been used to evaluate clinical utility of medical devices for many years. The guidance does make clear that FDA expects device manufacturers to be as rigorous in design of clinical trials as drug and biologic manufacturers.
FDA defines a pivotal trial as “a definitive study in which evidence is gathered to support the safety and effectiveness evaluation of the medical device for its intended use.” The guidance document addresses trial design for therapeutic and aesthetic devices, as well as diagnostic devices. This article will consider only those designs appropriate for diagnostic devices. Trials to establish clinical validity of companion diagnostic devices are specifically not covered in this guidance, nor are trials of products that are regulated by CBER, such as blood donor screening tests.
A device or diagnostic clinical trial conducted in the United States must comply with 21 CFR 812. A trial conducted outside the United States under an investigational device exemption (IDE) granted by FDA also must comply with 21 CFR 812, if the data are to be submitted to FDA to support a marketing application. The Federal Food, Drug, and Cosmetic Act grants FDA regulatory authority under which it may require valid evidence of the safety and effectiveness of a device. Types of evidence accepted by FDA as valid are listed in 21 CFR 860.7(c)(2) as “evidence from well-controlled investigations, partially controlled studies, studies and objective trials without matched controls, well-documented case histories conducted by qualified experts, and reports of significant human experience with a marketed device.” Valid scientific evidence does not include “isolated case reports, random experience, reports lacking sufficient details to permit scientific evaluation, and unsubstantiated opinion.”
In determining safety and effectiveness of a device, FDA considers the risk-benefit assessment, which is defined in 21 CFR 860.7(b)(3) as weighing “the probable benefit to health from the use of the device…against any probable injury or illness from such use.” Factors that are considered in the risk-benefit analysis include diversity of the population in which the device may be used, variability in performance of the device depending on user expertise, and availability of alternative devices and their relative performance.
Prior to commencing a trial for a significant risk device, the sponsor must obtain an IDE. Non-significant risk devices are considered to have an approved IDE unless FDA has notified the sponsor that an IDE application is required. Determination of significant vs. non-significant risk is the responsibility of the sponsor, as described in 21 CFR 812.2(b). Institutional Review Board (IRB) approval for the study must be obtained for each institution where the study will be performed, regardless of the risk determination. In many cases, IVDs may be considered non-significant risk devices, pursuant to 21 CFR 812.2(c)(3). Regardless of risk determination, the scientific rigor of the trial and the robustness of the evidence collected must be adequate. It is suggested that all sponsors, whether the device is significant risk or non-significant risk, begin a dialog with FDA before the protocol is finalized, to ensure that reviewers agree that the trial design addresses the appropriate scientific questions and would support a premarket submission. [Authors’ note: When a sponsor has determined that a device meets the criteria for non-significant risk, the Institutional Review Board will review that determination.]
Two statutory provisions stipulate that FDA must promulgate regulations allowing for the least burdensome requirements while still reflecting rigorous scientific standards [Sections 513(a)(3)(D)(ii) and 513(i)(1)(D) of the FD&C Act]. A guidance document was issued in 2002 (“Least Burdensome Provisions”) that interprets least burdensome to mean the most appropriate investment of time, effort, and resources by both industry and FDA to bring a product to market.
In this guidance, a diagnostic device is defined as any device that provides results to assess a subject’s health condition of interest (target condition), whether used alone or with other information. Included in this definition are devices used to collect, prepare, or examine specimens taken from the human body (in vitro diagnostics, IVDs). Devices that may assess conditions other than disease, such as pregnancy, specific immunity, or genotyping information, are also included. A device may have more than one intended use, which may require more than one clinical trial to support the claims of multiple intended uses.
Features unique to IVDs should be addressed in the trial design, including how and why the device works, user skill level and training, user learning curve, and human factors. (See guidance document, Medical Device Use-Safety: Incorporating Human Factors Engineering into Risk Management; July 18, 2000.)
Exploratory studies, such as bench or analytical studies, provide important information for the design of the pivotal trial. Because device development is often an ongoing, iterative process, the decision of when to move from the exploratory to the pivotal stage is not always obvious. The exploratory stage should bring the device close to the form that will be marketed prior to starting the pivotal trial. This decreases the likelihood that midstream alterations of the trial will be needed, with the consequent increases in time, cost, and sample size (i.e., number of patients enrolled). Major changes in the device while a trial is ongoing may invalidate the trial or necessitate abandoning it.
For IVDs, the exploratory stage should include analytical validation to establish performance characteristics, including analytical specificity, precision, and limit of detection. If an algorithm is required for use of the device, it should be developed during the exploratory stage. The threshold for clinical decisions (cut-off) should also be determined. However, stability testing may continue through the pivotal stage. Finally, the exploratory stage should yield information to finalize the device design, and establish appropriate endpoints for the pivotal trial. This is stated in 21 CFR 860.7(f)(2): “…a well-controlled investigation shall involve the use of a test device that is standardized in its composition or design and performance.” FDA expects that the device used in the pivotal clinical trial is equivalent in design and function to the device that is marketed.
The choice of trial design must be appropriate to support the determination of safety and effectiveness of the device. For IVDs, this generally means that the trial design will be a clinical performance trial. In this type of trial, test results are obtained and compared to some endpoint (e.g., presence of disease, extent or severity of disease, comparator test result), but not used for patient management. A clinical outcome trial is appropriate for an IVD in the case where disease management is based on the test result, so that the patient’s subsequent course of treatment or management is changed.
The trial should be designed so that bias is minimized. Bias is defined as the introduction of systematic error into the trial results, and can occur in subject selection, trial design, trial conduct, and data analysis. Another consideration in trial design is variability in device performance. The width of the confidence limits around an estimate of device performance is governed by sample size (number of patients enrolled): a larger sample gives a narrower interval. However, a very large sample may result in statistical significance for a clinically insignificant outcome. Statistical analysis of the results may be formal hypothesis testing or point estimation with confidence intervals.
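As a rough illustration of how sample size drives the precision of a point estimate, the sketch below computes a two-sided Wilson score interval for an observed proportion (for example, the fraction of diseased subjects the device calls positive). The Wilson method, the function name `wilson_interval`, and the example counts are our own assumptions for illustration; the guidance does not prescribe a particular interval method.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Two-sided Wilson score interval for a binomial proportion.

    z = 1.96 corresponds to an approximate 95% confidence level.
    The function name and defaults are illustrative, not from the guidance.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 <= successes <= n:
        raise ValueError("successes must be between 0 and n")
    p_hat = successes / n
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


# Hypothetical example: the same observed proportion (90%) at three sample sizes.
for n in (50, 200, 800):
    lower, upper = wilson_interval(round(0.9 * n), n)
    print(f"n={n:4d}  interval=({lower:.3f}, {upper:.3f})  width={upper - lower:.3f}")
```

The interval narrows as n grows while the point estimate stays the same, which is why a sample size justification usually starts from the precision (or power) the sponsor needs rather than from convenience.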
To avoid bias in subject selection, the trial population should reflect the target population for the device’s intended use, based on specific enrollment criteria. This means that the trial population should have similar demographics to the target population with respect to age, race, sex, and ethnicity, and should also be similar in prevalence of the target condition and in diagnosis and intervention patterns for the condition for which the device is intended. There are several possible subject selection methods, including random selection, consecutive or systematic selection, and convenience selection. Random selection is the most likely to provide an unbiased sample, but is not always appropriate for IVD pivotal trials. Consecutive selection (selecting every subject who meets the enrollment criteria in the order they present at the clinical site) or systematic selection (selecting, for example, every fourth subject who meets the enrollment criteria) will also provide unbiased samples as long as there are no confounding variables introduced over the trial period (e.g., change in patient referral patterns). A convenience selection consists of subjects enrolled because they are conveniently accessible. There is a good chance that this selection method will not provide a generalizable result.
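The three protocol-driven selection methods described above can be sketched in a few lines. This is only an illustration: the function names, the subject identifiers, and the fixed seed are hypothetical, and a real trial would draw eligible subjects from its enrollment log as they present at the site.

```python
import random


def consecutive_selection(eligible: list, n: int) -> list:
    """Take the first n eligible subjects in the order they presented."""
    return eligible[:n]


def systematic_selection(eligible: list, k: int) -> list:
    """Take every k-th eligible subject (k=4 gives the 4th, 8th, 12th, ...)."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return eligible[k - 1::k]


def random_selection(eligible: list, n: int, seed: int) -> list:
    """Draw n eligible subjects at random without replacement.

    A recorded seed keeps the draw reproducible for audit purposes.
    """
    rng = random.Random(seed)
    return rng.sample(eligible, n)


# Hypothetical enrollment log: subject IDs in order of presentation.
eligible_subjects = [f"S{i:03d}" for i in range(1, 13)]

print("consecutive:", consecutive_selection(eligible_subjects, 3))
print("systematic: ", systematic_selection(eligible_subjects, 4))
print("random:     ", random_selection(eligible_subjects, 3, seed=2011))
```

Convenience selection is deliberately omitted: it has no rule that can be written down in advance, which is the reason it is unlikely to generalize.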
Obtaining a representative trial population may require a multicenter trial. Additionally, it is necessary to demonstrate that the device yields reproducible results under different circumstances, such as when different instruments are used or different operators perform the assay. [Authors’ note: Reproducibility studies should follow recommendations found in CLSI document EP5-A2, Evaluation of Precision Performance of Quantitative Measurement Methods.] It is important to consider the epidemiologic distribution of the patients seen at the trial sites. For example, tertiary care centers often see a different spectrum of a disease than primary care physicians. If the trial population has more severe disease than the intended use population for the device, the trial may yield a biased estimate of device performance.
A comparative trial is one that compares the performance of two or more diagnostic tests. This is most often done in a paired design, where all tests are performed on the subject or sample at the same time. It is also possible to use a parallel group design, where only one test is performed on each subject or sample. Comparisons are subsequently made between the subject groups. For this type of design, randomization of subjects to the trial groups is recommended. A cross-over design also may be used, in which all subjects undergo all tests, but not at the same time. This design is appropriate when there is no carry-over effect among the tests; randomization of the test order is recommended.
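For a cross-over design, randomizing the order in which each subject undergoes the tests can be as simple as a seeded shuffle per subject. The sketch below is a minimal example under our own assumptions: the function name `randomize_test_order`, the subject IDs, and the test labels are hypothetical, and a real trial might use blocked or balanced randomization specified in the protocol instead.

```python
import random


def randomize_test_order(subject_ids: list[str], tests: list[str], seed: int) -> dict[str, list[str]]:
    """Assign each subject an independently shuffled order of the tests.

    The seed should be documented so the schedule can be reproduced.
    """
    rng = random.Random(seed)
    schedule = {}
    for subject_id in subject_ids:
        order = list(tests)  # copy so the caller's list is not modified
        rng.shuffle(order)
        schedule[subject_id] = order
    return schedule


# Hypothetical example: two tests, four subjects.
schedule = randomize_test_order(
    ["S001", "S002", "S003", "S004"],
    ["investigational assay", "comparator assay"],
    seed=812,
)
for subject_id, order in schedule.items():
    print(subject_id, "->", " then ".join(order))
```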
Trial objective. The trial objective must be related to the intended use of the device and indications for use in the labeling. Wording of the trial objective is important; the outcome measure of the trial is determined by the trial objective and should be something that can be determined definitively. There can be multiple trial objectives, but the trial design and analysis plan will be determined by the primary objective.
Trial population and subject selection. The trial population should be representative of the target population. The method for choosing subjects for the trial should be specified in the protocol and be uniform across all clinical sites. It is important to ensure that trial subjects represent the entire spectrum of the target disease or condition. If the target condition is a rare event (e.g., patients with a very rare form of cancer) it may be appropriate to overrepresent the target condition in the patient population. In that case, the statistical analysis plan must consider the potential for bias in the estimate of test performance.
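One concrete way enrichment can bias results: predictive values depend on prevalence, so positive and negative predictive values observed in an enriched trial population do not carry over directly to the intended-use population. The sketch below applies Bayes' theorem to recompute them at a stated prevalence. The function name and all numbers are hypothetical, and the calculation assumes that sensitivity and specificity themselves are unchanged by enrichment, which may not hold if the disease spectrum differs.

```python
def predictive_values(sensitivity: float, specificity: float, prevalence: float) -> tuple[float, float]:
    """Return (PPV, NPV) at a given prevalence using Bayes' theorem.

    Assumes sensitivity and specificity are the same in the trial and
    intended-use populations (an assumption, not a given).
    """
    if not 0 < prevalence < 1:
        raise ValueError("prevalence must be strictly between 0 and 1")
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Hypothetical device: 90% sensitivity, 95% specificity.
for label, prevalence in (("enriched trial (50%)", 0.50), ("intended use (1%)", 0.01)):
    ppv, npv = predictive_values(0.90, 0.95, prevalence)
    print(f"{label:22s} PPV={ppv:.3f}  NPV={npv:.3f}")
```

The same device looks far more convincing on a positive result in the enriched population than it would in routine use, which is why the analysis plan needs to state the prevalence at which any predictive values are reported.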
Site selection. The clinical sites chosen for the trial should be representative of the types of sites for which the device is intended for use. For example, if a device is intended to be used in point-of-care settings, then the trial should be conducted in point-of-care clinics and not in hospital central laboratories.
Masking (blinding). It is important to ensure that the user of the device is unaware of the subject’s status relative to the outcome measure, such as the presence of the target condition or the result of a comparator device (in the case of IVDs that would be the person actually performing the assay and anyone who is interpreting the results of the assay). Inadequate masking of the outcome measure may result in bias in the results. Masking requires that all samples be made to look equivalent; if archived samples are used, they must not be distinguishable from prospectively acquired samples.
True status of the target condition. Ascertainment of the true status of the target condition is often the outcome measure. It is important that this be well defined in the protocol and determined in the same manner across all clinical sites. For some target diseases the true status may be difficult to ascertain. In that case, a method comparison study may be performed. If the comparator method reflects the true status of the target condition (i.e., is considered a gold standard), then presence or absence of disease is considered to be definitively determined, and sensitivity and specificity of the investigational IVD may be reported. If a gold standard is not available and a comparison study is performed using a comparator method, then the trial is said to be an agreement trial. In such a trial, the absolute accuracy relative to the true status of the target condition cannot be estimated, and positive and negative percent agreement relative to the comparator method are reported.
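The arithmetic is the same in both cases; what changes is what the numbers are allowed to be called. The sketch below, with hypothetical class and function names and made-up counts, computes the two fractions from a 2x2 table and labels them as sensitivity and specificity only when the comparator is treated as a gold standard, and as positive and negative percent agreement otherwise.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts from a paired comparison (field names are illustrative)."""
    new_pos_ref_pos: int  # investigational test positive, comparator positive
    new_pos_ref_neg: int  # investigational test positive, comparator negative
    new_neg_ref_pos: int  # investigational test negative, comparator positive
    new_neg_ref_neg: int  # investigational test negative, comparator negative


def summarize(table: TwoByTwo, comparator_is_gold_standard: bool) -> dict[str, float]:
    """Return the two performance fractions under the appropriate names."""
    ref_pos = table.new_pos_ref_pos + table.new_neg_ref_pos
    ref_neg = table.new_pos_ref_neg + table.new_neg_ref_neg
    if ref_pos == 0 or ref_neg == 0:
        raise ValueError("need at least one comparator-positive and one comparator-negative result")
    positive_fraction = table.new_pos_ref_pos / ref_pos
    negative_fraction = table.new_neg_ref_neg / ref_neg
    if comparator_is_gold_standard:
        return {"sensitivity": positive_fraction, "specificity": negative_fraction}
    return {
        "positive percent agreement": positive_fraction,
        "negative percent agreement": negative_fraction,
    }


# Hypothetical counts from 200 paired results.
table = TwoByTwo(new_pos_ref_pos=90, new_pos_ref_neg=5, new_neg_ref_pos=10, new_neg_ref_neg=95)
print(summarize(table, comparator_is_gold_standard=True))
print(summarize(table, comparator_is_gold_standard=False))
```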
Specimen collection. Specimens may be collected prospectively or banked samples may be used in a retrospective study. When the specimens and clinical data are collected following a protocol, in which only those subjects who meet the trial criteria are included, the specimens are considered to be collected prospectively. If specimens and clinical data are used that were collected under a different protocol or no protocol, they are banked samples that are studied retrospectively. It is important to consider possible sources of bias when using banked samples in a retrospective study; they may not be representative of the target population. For IVD trials possible sources of bias may arise from differences in specimen collection, storage, and handling. These must be fully described in the protocol and it is critical to ensure that all clinical sites follow the same procedures.
Total test concept. The use of diagnostic devices is dependent on the skill and training of the person operating the device. Differences in performance level among personnel within a site and across sites may affect device performance. The trial design and the analysis plan should account for variability among device users. It may be necessary to perform additional studies to determine the variability of device performance across the spectrum of users.
Other issues that may affect the integrity of the trial include trial conduct, data management, data analysis, and changes to the protocol or the device while the trial is in progress. Trial conduct must be monitored carefully at all the clinical sites to ensure that the protocol is strictly followed, randomization code and procedure (if used in the trial) are carefully preserved, and masking of the outcome to device users is strictly maintained. A log of protocol violations and any unmasking events should be kept.
Data management should be specified prior to initiation of subject enrollment, following the principles of good clinical data management practices. This requires that data be collected in a consistent format across all clinical sites, and that monitoring of the data collection is vigilant to ensure that it is accurate. Monitoring procedures and a quality assurance program should be in place before the trial commences to ensure the trial is conducted properly.
A statistical analysis plan (SAP) should be designed prior to sample and data collection. Inappropriate or post hoc data analyses can jeopardize the usefulness of the data to support safety and effectiveness of the device. The SAP should specifically address the objectives of the trial. [Authors’ note: If a presubmission (pre-IDE) meeting is held, sample size and SAP should be a topic for discussion.]
Changes to the protocol or the device that occur after trial initiation can endanger the scientific validity of the trial. However, if interim modifications to the trial design, such as a change in sample size, are preplanned and incorporated into the original protocol, then the statistical integrity can be maintained. Changes that are not preplanned may severely weaken the trial.
This draft guidance may be viewed as an introduction to clinical trial design. It briefly discusses the essential elements of good trial design for medical devices, including in vitro diagnostic devices. If a device company were to follow the recommendations in this guidance when sponsoring a clinical trial, the results of that trial would most likely be scientifically valid and acceptable to FDA reviewers as supporting evidence for the safety and effectiveness of the device.
Gail Radcliffe, PhD is President and Cheryl Hayden, PhD is a Consulting Partner at Radcliffe Consulting, Inc. Radcliffe has more than 20 years’ experience helping medical device and diagnostic companies with technical assessment, market research and clinical/regulatory affairs. Hayden has more than 30 years’ experience with clinical devices, specializing in clinical and regulatory affairs and medical writing.