General Performance Metrics for Imaging Markers

Based on the considerations outlined above, a fundamental set of performance criteria can be defined by which the utility of different imaging markers can be compared in a variety of clinical and research applications. These metrics are: (1) validity, (2) responsiveness (rate of change) to disease and therapy, (3) measurement precision, (4) convenience, and (5) cost.

As stated previously, the most important measure of a marker's utility in clinical trials or clinical practice is its link to the "true" clinical outcome of interest. Very few biomarkers thus far have been widely accepted as true surrogate endpoints for clinical outcomes. However, the degree of validity required depends on the specific objectives of the study. Therapeutic confirmatory studies, upon which regulatory approval will be based, demand the most stringent validation and regulatory acceptance of the primary endpoints used. Secondary endpoints, or endpoints used in studies aimed primarily at internal decision making, can take advantage of more novel techniques and markers that may not yet be formally ratified but that carry significant advantages in terms of predictive power and sensitivity to change.

In addition to the pathophysiological validity of the disease feature itself, it is important to consider the technical, or assay, validity of the instrument used to measure it. This relates to the image acquisition technique and the reading/measurement system used, and is typically expressed in terms of sensitivity, specificity and area under the receiver operating characteristic (ROC) curve, provided a gold standard exists. In the absence of hard criterion validity, softer, indirect validation measures, such as expert opinion or group consensus, must be relied on.
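As a rough illustration of how these assay-validity measures are computed against a gold standard, the following sketch uses made-up data. The function names, the marker scores, and the 0.5 reading threshold are all hypothetical and are not taken from any study.

```python
def sensitivity_specificity(truth, calls):
    """Sensitivity and specificity of binary calls against a gold standard."""
    pairs = list(zip(truth, calls))
    tp = sum(1 for t, c in pairs if t and c)          # diseased, called positive
    fn = sum(1 for t, c in pairs if t and not c)      # diseased, missed
    tn = sum(1 for t, c in pairs if not t and not c)  # healthy, called negative
    fp = sum(1 for t, c in pairs if not t and c)      # healthy, false alarm
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(truth, scores):
    """Area under the ROC curve, computed as the probability that a randomly
    chosen diseased case scores higher than a randomly chosen non-diseased
    case (ties count as one half)."""
    pos = [s for t, s in zip(truth, scores) if t]
    neg = [s for t, s in zip(truth, scores) if not t]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical gold-standard labels (1 = disease present) and marker scores.
truth = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]
calls = [s >= 0.5 for s in scores]  # assumed reading threshold

sens, spec = sensitivity_specificity(truth, calls)
auc = roc_auc(truth, scores)
print(sens, spec, auc)
```

Sensitivity and specificity depend on the threshold chosen, whereas the area under the ROC curve summarizes performance across all thresholds.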

It is also important that the dynamic range of the surrogate marker capture as much of the clinically relevant range of changes in the "true" outcome as possible (Fig. 15). Insensitivity of a surrogate marker to early changes in the true outcome is sometimes called a "floor effect," while failure to register severe but still relevant changes in the true outcome is known as a "ceiling effect." Radiographic minimum joint-space width, which is an indirect marker of articular cartilage thickness in diarthrodial joints, shows a ceiling effect when additional cartilage loss is still possible in a joint compartment after the space has been completely obliterated in one particular spot. Dynamic range characteristics can influence study results in complex ways. Consider the following scenario (Fig. 16). Hypothetical techniques X and Y both measure the same disease feature (e.g., bone erosion). Technique X shows greater sensitivity than technique Y for small changes but equivalent sensitivity for large changes. Paradoxically, the floor effect of technique Y could make it appear to be more responsive to change than technique X under certain circumstances. It is important, therefore, to consider dynamic range in the design and interpretation of studies containing longitudinal image data.
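The paradox in the X-versus-Y scenario can be reproduced with a few lines of arithmetic. The sketch below assumes a simple model in which a technique registers nothing below its floor and the true score at or above it. The function name and the floor constants are hypothetical.

```python
def observed_score(true_score, floor):
    """Score registered by a technique: zero below its floor,
    the true score at or above it (assumed simple model)."""
    return true_score if true_score >= floor else 0


FLOOR_X = 1  # technique X registers erosions of Score 1 and above
FLOOR_Y = 2  # technique Y registers only erosions of Score 2 and above

baseline_true, month6_true = 1, 2  # erosion grows from Score 1 to Score 2

change_x = (observed_score(month6_true, FLOOR_X)
            - observed_score(baseline_true, FLOOR_X))
change_y = (observed_score(month6_true, FLOOR_Y)
            - observed_score(baseline_true, FLOOR_Y))

print("Change registered by X:", change_x)
print("Change registered by Y:", change_y)
```

Technique Y misses the baseline erosion entirely, so the whole Score 2 lesion at 6 months is counted as new progression, whereas technique X registers only the true one-step increase.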

Figure 15 Dynamic range. Dynamic range refers to the proportion of change in the "true" outcome of interest that is captured by changes in the surrogate endpoint. Inability of the surrogate endpoint to detect small changes in the true outcome is sometimes referred to as a "floor effect." Failure to register severe changes is called a "ceiling effect." Ideally, any such floors or ceilings lie outside of the range of changes in the true outcome that are relevant to the question under study.

Figure 16 Dynamic-range effects. (A) Technical dynamic ranges of two hypothetical techniques for imaging bone erosions. Technique X is insensitive to erosions smaller than Score 1 (hypothetical scoring system) and technique Y is insensitive to erosions smaller than Score 2, but both show equivalent performance for erosions greater than or equal to Score 2 (i.e., the techniques show nonlinear sensitivity profiles). If a patient had a Score 1 erosion at baseline, as in (B), it would be registered by technique X but not technique Y. If the erosion grew to Score 2 by 6 months, both techniques would register the erosion at this time point. However, in a plot of the change in erosion score over baseline (C), technique Y, by virtue of its floor effect, would paradoxically register greater progression between baseline and 6 months than would the more sensitive technique X. This hypothetical example illustrates the importance of considering dynamic range in the design and interpretation of longitudinal imaging studies.

How quickly the marker changes in response to disease or therapy is another important metric of performance. Marker responsiveness determines the minimum follow-up interval theoretically possible for demonstrating disease progression or improvement. Highly responsive markers are important in clinical practice for identifying patients who are failing therapy and may require dose adjustments or a change to a different, hopefully more effective treatment. Marker responsiveness is also important during the early clinical testing of a new drug, when the safety profile has not yet been fully established. Typically, such trials need to be less than 3 months in duration, and therefore require markers that can demonstrate change within that time frame. Additionally, there is enormous financial incentive to accelerate the drug development process and enter the marketplace sooner. This includes first-mover advantages for novel agents, but also longer market exclusivity during the finite life span of a drug patent. The patent for a new investigational drug must be filed as soon as human testing of the drug begins, and therefore a portion of the period of market exclusivity provided by the patent will be consumed by formal clinical testing and regulatory due diligence. When measured in terms of lost revenues, this can amount to $25 million for every month that the market entry of a $300 million/year drug is delayed.

In addition to the rate at which a marker changes, how precisely that change can be resolved is an important parameter. Measurement precision determines the magnitude of change that can be resolved with confidence, and therefore the marker's sensitivity to change (change-to-error ratio). Sources of precision error include interindividual variation, variability of the method used to acquire the images, and errors stemming from the actual measurement method used. Measurement precision in large multicenter trials is maximized by careful patient selection and use of homogeneous study populations. It is also important to use image acquisition methods that are widely available, stable over time and across different equipment platforms, easy to perform, and well tolerated by patients.

Alongside expertise and experience in designing and implementing specialized imaging protocols for multicenter trials, specialized IAAs can help improve image quality and consistency in clinical trials. These IAAs include positioning devices designed to optimize patient positioning, minimize patient movement, and maximize reproducibility of serial examinations (Fig. 17). Other IAAs, known as "phantoms," are external standard references used to test and correct the stability of imaging equipment throughout the duration of a trial, to cross-calibrate different equipment platforms used in the trial, and occasionally to correct errors or limitations in the raw image data acquired at different sites (Fig. 18).

Figure 17 Image acquisition aid for radiography of the knee in clinical trials of osteoarthritis. This Plexiglas frame (SynaFlex, Synarc) was designed to position the feet and knees properly and reproducibly for fixed-flexion radiography of the knees in clinical trials. Down the center of the frame is a phantom (arrows) that allows verification of the x-ray beam angle used and quantification of changes in magnification factor that can result from changes in equipment or human error. (Courtesy of Synarc, Inc. with permission.)

Finally, measurement precision error is minimized by centralized data management and image analysis under maximally controlled conditions, with highly trained readers and the most powerful image-processing and analysis tools. As discussed earlier, centralized reading can support more complex and demanding scoring methods and quantitative analyses than would be feasible in clinical practice. Clinical practice typically demands rapid turnaround and therefore on-site readings or efficient teleradiology services. Readings for clinical trials, in contrast, are usually not needed until all of the patients have completed the study, and therefore readings can be done in batches by a remote central facility. Increased measurement precision can be traded for shorter study duration and for fewer patients and sites required to test the hypothesis. In addition to the financial upside of early market entry for commercial products, reducing a clinical trial by 200 patients can save $1-2 million or more in direct costs.

Figure 18 Importance of calibration phantoms in DXA. Longitudinal quality-control data from a DXA scanner showing an abrupt shift in calibration associated with failure of a system component. With use of a standardized bone mineral density test object or "phantom," the calibration of each densitometer in a clinical trial can be measured each day a subject is scanned or at a minimum of three times per week. By comparing these daily measurements against control limits, the DXA technologist can identify scanner problems and request service from the DXA manufacturer. By collecting, reviewing, and analyzing these data, the central radiology service can further evaluate scanner calibration history.

Both the responsiveness of a marker and the precision error associated with measuring its rate of change affect longitudinal sensitivity. For a given technique, the smallest change detectable with 95% confidence is 2.8 times the precision error (e.g., root-mean-square standard deviation for replicate measurements) [20]. To reach 80% confidence (two-tailed, or 90% one-tailed confidence), a change of only 1.8 times the precision error is needed [21]. This less stringent change criterion has been referred to as the trend assessment margin. The time interval required to reach either threshold is determined by dividing the change criterion by the responsiveness of the marker (median rate of change per year). The shorter this follow-up interval, the greater the longitudinal sensitivity of the marker. Differences in longitudinal sensitivity among different markers can also be expressed in terms of precision error if corrected for differences in responsiveness. Thus, the standardized precision error for technique A can be expressed as its raw precision error divided by the response ratio of technique A relative to technique B (response rate A/response rate B) [21]. Response ratios are less cohort dependent than response rates because part of the cohort bias cancels out.
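The change criteria and follow-up intervals described above reduce to simple arithmetic. The following is a minimal sketch under stated assumptions: the function names, the replicate measurements, and the 0.02-per-year rate of change are hypothetical, and each subject is assumed to contribute the same number of replicates so that the variances can be averaged directly.

```python
import math

LSC_MULTIPLIER = 2.8  # 95% confidence; close to 1.96 * sqrt(2)
TAM_MULTIPLIER = 1.8  # 80% two-tailed confidence; close to 1.28 * sqrt(2)


def rms_sd(replicates):
    """Root-mean-square standard deviation across subjects, each with the
    same number of replicate measurements (assumption)."""
    variances = []
    for values in replicates:
        mean = sum(values) / len(values)
        variances.append(sum((v - mean) ** 2 for v in values)
                         / (len(values) - 1))
    return math.sqrt(sum(variances) / len(variances))


def change_criterion(precision_error, multiplier):
    """Smallest change that can be resolved at the chosen confidence level."""
    return multiplier * precision_error


def follow_up_interval_years(precision_error, rate_per_year, multiplier):
    """Change criterion divided by the marker's responsiveness."""
    return change_criterion(precision_error, multiplier) / abs(rate_per_year)


# Hypothetical duplicate measurements for three subjects (arbitrary units).
replicates = [[1.00, 1.02], [0.98, 1.00], [1.01, 1.03]]
precision_error = rms_sd(replicates)
rate = 0.02  # hypothetical median change per year, same units

years_lsc = follow_up_interval_years(precision_error, rate, LSC_MULTIPLIER)
years_tam = follow_up_interval_years(precision_error, rate, TAM_MULTIPLIER)
print(f"precision error: {precision_error:.4f}")
print(f"follow-up for 95% confidence: {years_lsc:.2f} years")
print(f"follow-up for trend assessment margin: {years_tam:.2f} years")
```

Halving the precision error or doubling the rate of change each halves the follow-up interval, which is why both factors enter longitudinal sensitivity.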

Accordingly, a marker that shows twice the precision error but four times the responsiveness will still show half the standardized precision error.
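This relationship can be checked with a short calculation. The sketch below is illustrative only: the function name and the unit values for technique B are hypothetical.

```python
def standardized_precision_error(raw_error_a, response_rate_a, response_rate_b):
    """Raw precision error of technique A divided by the response ratio
    of A relative to B (response rate A / response rate B)."""
    response_ratio = response_rate_a / response_rate_b
    return raw_error_a / response_ratio


# Hypothetical reference technique B, in arbitrary units.
error_b, rate_b = 1.0, 1.0
# Technique A: twice the precision error, four times the responsiveness.
error_a, rate_a = 2.0 * error_b, 4.0 * rate_b

std_error_a = standardized_precision_error(error_a, rate_a, rate_b)
print(std_error_a / error_b)  # ratio of A's standardized error to B's error
```

Because only the ratio of response rates enters the calculation, any cohort effect that scales both rates equally drops out.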

Convenience and cost are factors that track with availability, examination time, patient tolerance, and ease of image data transfer, storage, and processing. It is important to optimize the cost-benefit ratio of each element of a study in the context of the overall development program, as some techniques with higher unit costs may actually help contain cost in other areas and/or yield greater benefit, not to mention save time, effort, and frustration.
