Background
The
promise of Alzheimer’s disease (AD) biomarkers has led to their
incorporation in new diagnostic criteria and in therapeutic trials;
however, significant barriers exist to widespread use. Chief among these
is the lack of internationally accepted standards for quantitative
metrics. Hippocampal volumetry is the most widely studied quantitative
magnetic resonance imaging (MRI) measure in AD and thus represents the
most rational target for an initial effort at standardization.
Methods and Results
The
authors of this position paper propose a path toward this goal. The
steps include: 1) Establish and empower an oversight board to manage and
assess the effort, 2) Adopt the standardized definition of anatomic
hippocampal boundaries on MRI arising from the EADC-ADNI hippocampal
harmonization effort as a Reference Standard, 3) Establish a
scientifically appropriate, publicly available Reference Standard
Dataset based on manual delineation of the hippocampus in an appropriate
sample of subjects (ADNI), and 4) Define minimum technical and
prognostic performance metrics for validation of new measurement
techniques using the Reference Standard Dataset as a benchmark.
Conclusions
Although
manual delineation of the hippocampus is the best available reference
standard, practical application of hippocampal volumetry will require
automated methods. Our intent is to establish a mechanism for
credentialing automated software applications to achieve internationally
recognized accuracy and prognostic performance standards that lead to
the systematic evaluation and then widespread acceptance and use of
hippocampal volumetry. The standardization and assay validation process
outlined for hippocampal volumetry is envisioned as a template that
could be applied to other imaging biomarkers.
Future steps:
- Although a position paper is a first step, the objective of standardizing hippocampal volumetry as an AD biomarker will require active participation by stakeholders in academia and industry. The authors’ objective is to see hippocampal volumetry evolve from its current state, a measure that is valid only in specific studies or a single institution, to a universally accepted biomarker with standardized units of measure. In some cases, this could simply involve having developers of automated measurement tools directly import the EADC-ADNI anatomic definition of the hippocampal boundaries into the atlas of the automated application.
- Standardizing single time point hippocampal volume as an AD biomarker is the most logical and readily achievable initial goal; however, the authors recognize that other more complex topographic structural MRI measures might be more specific, or ultimately more powerful. The major difficulty here is identifying an appropriate reference standard if an anatomically based classifier does not conform to the boundaries of a classically defined anatomic structure as the hippocampus does.
- Longitudinal change measures on structural MRI should be standardized using the approach outlined above as a template. This could include an extension of the EADC-ADNI effort to include expert manual tracing of serial hippocampi to create a longitudinal reference standard dataset using the same model as the single time point dataset proposed in this position paper.
- FDG PET, amyloid PET imaging, and possibly other MRI modalities (e.g., resting state functional connectivity, diffusion tensor imaging, and arterial spin labeled perfusion imaging) are also important imaging biomarkers for AD. Pursuing standardized quantitative metrics for these imaging modalities is a high priority. The efforts to standardize, validate and evaluate quantitative measures in these modalities could roughly follow the same approach outlined above for hippocampal volume.
- For all imaging biomarkers, future efforts will need to focus on developing a quantitative score that allows individual imaging biomarker measures to be assessed against well-developed norms incorporating other appropriate covariates, as age, sex, and head size do for the hippocampus (91, 92).
- To optimize the use of biomarkers in new AD diagnostic criteria, future efforts will need to focus on establishing diagnostic cut points in the continuous range of quantitative values to identify normal, abnormal, and indeterminate levels in individual subjects. For use in clinical practice, quantitative metrics will need to be developed and then tested in clinically typical and representative populations. Diagnostic biomarkers in AD should function analogously to those in other diseases where, for example, cut points in the continuous range of blood pressure and fasting serum glucose are universally recognized as useful in aiding the diagnosis of hypertension and diabetes, and standardized treatment protocols are based on these biomarker cut points. For the purposes of diagnosis in typical clinical settings, cut points should be derived from carefully characterized groups of subjects chosen in such a way that the results can be generalized to the overall population. For example, ADNI subjects were selected to represent a typical AD clinical trial, with specific inclusion/exclusion criteria. Thus the results from ADNI are not generalizable to the overall population and are not optimal to generate normative data for general diagnostic purposes. Selecting meaningful diagnostic cut points is complicated by the fact that many cognitively normal elderly subjects harbor significant AD pathology. Thus the definition of normal is not straightforward. Consensus guidelines have been established for evaluating and reporting the clinical utility of diagnostic biomarkers and should be followed in studies using the results of the assay validation steps described here. In clinical settings, the sensitivity of detecting AD should exceed 80% and specificity for distinguishing AD from other similar dementias also should exceed 80% (94).
Standardized reporting of results should follow the STARD criteria (95); for clinical settings, additional reporting criteria to demonstrate pragmatic utility are needed (96).
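The cut-point logic described above can be sketched in a few lines. The following is a minimal, hypothetical illustration (all volumes are invented) of choosing a diagnostic threshold by maximizing Youden's J statistic; it is not the authors' method, and real cut points would be derived from large, properly characterized normative samples.

```python
# Hypothetical illustration: choosing a diagnostic cut point for a
# quantitative biomarker (e.g., normalized hippocampal volume) by
# maximizing Youden's J = sensitivity + specificity - 1.
# All values below are invented for illustration only.

def sensitivity_specificity(values_ad, values_nc, cut):
    """A volume smaller than the cut counts as 'abnormal' (atrophy)."""
    tp = sum(v < cut for v in values_ad)   # AD cases correctly flagged
    fn = len(values_ad) - tp
    tn = sum(v >= cut for v in values_nc)  # controls correctly cleared
    fp = len(values_nc) - tn
    return tp / (tp + fn), tn / (tn + fp)

def best_cut_point(values_ad, values_nc):
    """Scan candidate cut points; return the one maximizing Youden's J."""
    candidates = sorted(set(values_ad) | set(values_nc))
    best = None
    for c in candidates:
        sens, spec = sensitivity_specificity(values_ad, values_nc, c)
        j = sens + spec - 1
        if best is None or j > best[0]:
            best = (j, c, sens, spec)
    return best

# Invented volumes (cm3): AD patients tend to have smaller hippocampi.
ad = [2.1, 2.4, 2.6, 2.8, 3.0, 3.1]
nc = [3.0, 3.3, 3.5, 3.6, 3.8, 4.0]
j, cut, sens, spec = best_cut_point(ad, nc)
```

In practice the "indeterminate" band mentioned above would be handled with two cut points rather than one, and the threshold would be validated in an independent, clinically representative sample.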
1. Introduction
A
biomarker is a physiological, biochemical, or anatomic parameter that
can be objectively measured as an indicator of normal biologic
processes, pathological processes, or responses to a therapeutic
intervention (1).
Biomarkers used in the Alzheimer’s disease (AD) field include both
imaging measures and biofluid analytes. Biofluid analytes in this
context can refer to proteins in any biofluid; however, cerebrospinal
fluid (CSF) biomarkers are presently the best developed (2). The five most widely studied biomarkers in AD can be divided into two major categories: 1) Biomarkers of cerebral Aβ amyloid
accumulation - these are increased radiotracer retention on
amyloid-tracer based positron emission tomography (PET) imaging and low
CSF Aβ 1-42, and 2) Biomarkers of neuronal degeneration or injury
- these are elevated CSF tau (both total and phosphorylated tau);
decreased fluorodeoxyglucose (FDG) uptake on PET in the temporo-parietal
cortex; and brain atrophy in the medial, basal and lateral temporal
lobes and the medial and lateral parietal cortices determined from
structural magnetic resonance imaging (MRI) or computed tomography (CT) (3).
Three of these five major AD biomarkers are imaging measures and
imaging is the primary focus of this position paper. Biomarkers are
increasingly important in AD in two contexts: clinical
diagnosis/prognosis and therapeutic trials.
Criteria for the clinical diagnosis of AD were established in 1984 (4).
These criteria have been widely adopted, validated against
neuropathological examination in many studies, and are still used today.
A consensus now exists, however, that diagnostic criteria for AD should
be updated to reflect the scientific advances of the past quarter of a
century. One of the most important of these advances is the development of
biomarkers for AD. This recognition has inspired recent efforts on
several fronts to revise diagnostic criteria for AD. The two most
well-known such efforts are those of Dubois et al (5, 6) and the National Institute on Aging (NIA)-Alzheimer’s Association (AA) (7-10).
The NIA-AA commissioned three work groups to revise diagnostic
criteria. Each was assigned the task of defining or revising criteria
for one of three recognized phases of the disease: pre-clinical or
asymptomatic AD, symptomatic pre-dementia or mild cognitive impairment
(MCI), and the AD dementia phase (7-10).
Biomarkers providing evidence of in situ AD pathophysiology are
employed in the revised definitions of AD in all three phases of the
disease by the NIA-AA and are also included in the criteria of Dubois et
al (5, 6).
The
second major use for biomarkers of AD is in clinical trials, where
biomarkers can be employed for several distinct purposes. As an
indicator of AD pathophysiological processes, AD biomarkers may be used
for subject inclusion/exclusion – to ensure study subjects are
appropriate for targeting of the therapeutic mechanism of action or as
an enrichment strategy to improve efficiency of therapeutic trials (2, 11).
Biomarkers also provide a biologically-based measure of disease
severity. They can be used as a covariate in outcome analyses and as
safety measures. Finally, an important application of AD biomarkers in
clinical trials is as outcome measures, in which an effect on the
biomarker is sought as evidence of modification of the underlying
pathological AD process (12-21).
However, since AD pathophysiology is increasingly being recognized to
be very complex and multifaceted, effects of candidate drugs on some
individual pathophysiological aspects of AD may not necessarily be of
functional or cognitive relevance. Therefore, increasing effort is
being devoted to developing biomarkers that could serve as surrogate
endpoints in clinical trials, accurately predicting and reflecting
clinically significant outcomes (2, 22).
Biomarkers are more objective and reliable quantitative measures of AD
pathophysiological processes than traditional cognitive and functional
outcomes that are affected by subject motivation and extrinsic factors
such as alertness, environmental stresses, and informant mood and
distress.
The evaluation of the value of biomarkers is
different for therapeutic trials than for clinical diagnosis, but the
rationale and methods to standardize and validate the reliability of the
measures are very similar. Moreover, if an imaging biomarker is used as
an inclusion criterion for subjects participating in a clinical trial
of a compound that subsequently achieves regulatory approval, then it is
possible, some would say likely, that regulators will require that the
same biomarker be approved as a diagnostic to identify patients who are
suitable for treatment. This would then require that the biomarker, in
our case imaging, be easily implementable in clinical imaging facilities
world-wide. Therefore, although requirements in terms of precision and
sensitivity to pathology may vary, issues pertaining to standardization
of an imaging biomarker for use in clinical trials and for clinical
diagnostics are inextricably interwoven.
The
potential value of quantitative imaging biomarkers for both clinical
diagnosis and clinical trials is clear, but major barriers exist to
widespread acceptance and implementation. The most substantive barriers
have been the lack of standardized methods for 1) image acquisition, 2)
extraction of quantitative information from images, and 3) linking
quantitative metrics to internationally recognized performance criteria.
These in turn have impeded the establishment of cut points in the
continuous range of quantitative values that can be used in diagnosis
and evaluating change in clinical trials. Standardization of image
acquisition for structural MRI and PET scans has been a major focus of
the Alzheimer’s Disease Neuroimaging Initiative (ADNI) project (23, 24) and ADNI acquisition protocols have become the de facto
standard for clinical trials and could be applied clinically. On the
other hand, little progress has been made in the standardization of
techniques for quantitative image analysis, either in ADNI or in the
field in general. This is particularly true for MRI where the lack of
standardization has led to publication of values that are highly
disparate across the literature. For example, greater than two-fold
differences in hippocampal volume of cognitively normal elderly subjects
have been reported from different centers (25).
This is unlikely to have a basis in biology and is almost certainly due
to inter-center differences in the measurement tools and the anatomical
protocols for delineating the hippocampus. Likewise, a strong
methodological dependence is evident in published rates of hippocampal
atrophy. Three-fold differences in rates of hippocampal atrophy have
been reported in elderly controls as well as wide variations in
apparently similar cohorts of AD patients (26). For example, Du et al (27) reported annualized rates of hippocampal atrophy of 0.8%/yr in healthy elderly controls (mean age 77); Jack et al (28) reported 1.4%/yr in controls (mean age 78); and Wang et al (29) reported 2.3%/yr (mean age 73). This strong dependence upon the method used and
its specific implementation undermines the credibility of the results.
Both newly proposed diagnostic criteria explicitly point out that
extensive work on imaging biomarker standardization is needed prior to
widespread adoption for diagnostic purposes.
2. Why hippocampal volume?
Qualification
or general acceptance of the validity of a biomarker in clinical trials
must rest on a well-established body of evidence beginning with
widespread agreement that there is clinical significance to the result
of the biomarker and that it can be measured with appropriate accuracy
and reproducibility. Quantitative measurement of hippocampal volume
fulfills these basic criteria. The advantages of hippocampal volume as a
target for an initial standardization and assay validation exercise
are: 1) The hippocampus is an anatomically defined structure with
boundaries that are visually definable in a properly acquired MRI scan.
2) The hippocampus is involved early and progressively with neuronal
loss and neurofibrillary tangles, which is one of the primary hallmarks
of AD pathology (30).
3) A large imaging and pathology literature provides evidence that loss
of hippocampal volume is significant in AD. Numerous studies have shown
the association of hippocampal atrophy with neurodegenerative pathology
at autopsy (31-36), with clinical diagnoses of AD or MCI (37-43), and with the severity of cognitive disorders and episodic memory deficits due to AD pathophysiology (44, 45).
In addition, longitudinal measures of change in hippocampal volume both
predict the future cognitive decline and correlate with contemporary
indices of clinical decline (46, 47), and quantitative measures of the hippocampus predict progression from MCI to AD (48-63). 4) Fully automated software tools are now available that can measure hippocampal volume efficiently and reproducibly (21, 37, 58, 64-71). Visual rating (72-74),
while convenient and currently used in some diagnostic settings, does
not lend itself to detecting subtle size differences, lacks precision
relative to quantitative methods, and does not take advantage of the
power of current technology. Formal computer-aided manual tracing of the
entire hippocampus was introduced over two decades ago to aid in
seizure lateralization (75).
Although manual hippocampal tracing has been effective for research
studies in different diseases, and still serves as the best available
Reference Standard measure of the hippocampus on MRI (76),
it is time consuming and requires highly trained operators. Thus it is
not feasible in routine clinical practice and due to its expense it is
impractical in clinical trials. Fully-automated hippocampal volumetry
using standardized methods would be a practical alternative to manual
methods. Automated hippocampal volumetry has successfully enabled the
discovery of novel genes associated with hippocampal volume in over 7000
subjects scanned at multiple internationally distributed sites. This
result supports the assertion that such methods can be efficiently and
reproducibly applied on a worldwide scale (77). Furthermore, software methods that employ within-subject registration permit sensitive measures of volume change over time (51, 78).
5) While more complex MRI measures of disease-related atrophy
consisting of combinations of multiple regions of interest (ROI) might
have superior diagnostic properties compared to hippocampal volume (79-84),
the analysis of hippocampal volume is less complex than multi-ROI
approaches so a reference standard is easier to generate. Specifically,
the hippocampus can be delineated by hand, but the disease signatures of
more complex analytic methods are a result of training and machine
learning methods that would present a further challenge to validate, and
are likely to evolve over time.
Further supporting
hippocampal volumetry as a target for initial AD imaging biomarker
standardization and assay validation is the fact that clinical
guidelines in many countries (85, 86)
dictate that all patients investigated for cognitive impairment should
undergo structural brain imaging to exclude treatable causes such as
tumors and hematoma. An MRI acquisition sequence that would permit
quantitative analysis of hippocampal volume is easy to include in a
routine clinical MRI examination, only lengthens the exam by a few
minutes, and is currently considered to be an essential part of a
clinically diagnostic imaging protocol at some centers. Moreover, a
significant effort has already been expended to standardize acquisition
parameters for the high resolution 3D anatomical MR imaging sequence
needed for quantitative volume measures across MRI vendors in the ADNI
study (23).
The ADNI 3D T1 anatomical sequence used for volumetric measurements can
be performed in a standardized manner in an overwhelming majority of
imaging centers worldwide. Finally, there is an ongoing international
initiative led by one of the co-authors (GBF) to establish a Reference
Standard in hand-drawn hippocampal volumes, which is the European
Alzheimer’s Disease Centers (EADC) – ADNI Hippocampal Harmonization
Effort (87, 88).
The
issue of validating imaging biomarkers for AD has recently drawn the
attention of non-profit organizations, including the Radiological
Society of North America (RSNA) and the Coalition Against Major Disease
(CAMD). CAMD is part of the Critical Path Institute, a nonprofit
public-private partnership dedicated to more efficient drug development.
Qualification of hippocampal atrophy for use in clinical trial
enrichment is being pursued by CAMD with the US Food and Drug
Administration (FDA) and European Medicines Agency (EMA). At a meeting
of The Radiological Society of North America Quantitative Imaging
Biomarkers consortium in September 2010, a work group was convened to
address the issue of standardizing quantitative imaging of AD. Among the
candidate imaging modalities discussed, measures of hippocampal volume
on structural MRI were identified as the most widely used in the context
of multicenter clinical trials, and therefore were the most obvious
candidates for an initial (exemplar) effort to standardize quantitative
imaging biomarkers. This position paper follows from the recommendations
of this RSNA work group.
3. Biomarker development
In general terms, three separate steps are required for biomarker development: 1) Assay validation
(also called technical or analytical performance validity) to show
that, when following defined standardized procedures, the biomarker can
be measured precisely and accurately compared to a reference standard (89), 2) Clinical Validation to establish that the biomarker has value for a specific intended task and context of use, and 3) Qualification
of the biomarker with the appropriate regulatory agencies based upon
wide-spread consensus that the biomarker is “fit for purpose” for a
particular use. Each proposed task (e.g., diagnostic, prognostic,
outcome) needs to be considered separately. Qualification of a biomarker
for clinical trials may be a stepping stone to a qualification for its
use as a clinical diagnostic. However, the use of a biomarker in
clinical diagnosis is distinct from its use in therapeutic trials, and
development may focus on one or the other first. The use of a biomarker
in clinical trials is at the discretion of the trial sponsor, but
mechanisms have been introduced by which regulatory bodies (e.g., the US
Food and Drug Administration Center for Drug Evaluation and Research,
FDA CDER; or the European Medicines Agency EMA) qualify biomarkers for
use in clinical trials. The use of a biomarker for clinical diagnosis
requires regulatory approval in the relevant jurisdiction (e.g.,
approval by FDA Center for Devices and Radiological Health, CDRH, in the
USA; or CE marking in Europe), and may separately also require approval
from healthcare funders for reimbursement.
4. Steps to standardization and validation of hippocampal volumetry as a biomarker of AD
Below
we outline the steps of a proposed work plan that would lead to
standardization of quantitative (automated or manual) hippocampal
volumetry as a biomarker for AD in evaluative studies in the context of
clinical trials and for diagnosis.
- Establish an Oversight Board to manage the effort and empower this body with authority to make decisions necessary to assess the results as outlined below. The Oversight Board should have the following attributes: a) include all necessary areas of expertise, b) be unbiased, c) represent both academia as well as industry, and d) be international. All potential conflicts of interest must be fully disclosed. Our recommendation is that this oversight board be linked to the Alzheimer’s Association.
- Identify a standardized definition of anatomic hippocampal boundaries on MRI with the assistance of expert neuroanatomists for use as a Reference Standard. Anatomic boundary criteria should be acceptable to the international scientific community and consistent with use in all neuroscience disciplines. We recognize that for hippocampal volume measures to be widely used diagnostically in clinical practice and in clinical trials, automated techniques are essential. However, manual tracing of the hippocampus using a consensus-from-experts approach in accordance with a standardized definition provides the most effective Reference Standard to evaluate automated methods. Expert opinion is an accepted method to create a reference standard. This is preferable to the problematic alternative of arbitrarily picking one automated method and anointing it as the Reference Standard. Because most, if not all, automated techniques rely on some a priori anatomical notion of hippocampal boundaries, such an arbitrary approach would not reflect a consensus from the scientific community as a whole and would not result in a Reference Standard with broad-based support from all stakeholders. Since an international effort is currently in place with precisely this aim, leveraging the work of the EADC-ADNI Hippocampal Harmonization effort (87, 88) is the most logical and practical approach. The Reference Standard recommended by the authors of this position paper is therefore the manual hippocampal tracings of ADNI subjects that will be developed by the EADC-ADNI effort.
- Establish a Reference Standard Dataset based on manual delineation of the hippocampus in accordance with the standardized definition. The Reference Standard Dataset should have the following attributes:
- All subjects in the reference database must have given informed consent for public access under an ethics board-approved protocol. Compliance with the privacy legislation of the jurisdiction where the data were collected, and permission of a research ethics committee for use of the data, should be obtained. In the US, the relevant guidelines are those of the Health Insurance Portability and Accountability Act (HIPAA); however, other jurisdictions will have different regulations.
- Access to the database must be straightforward, open, and readily available.
- Appropriate subjects, in clinical characteristics and number, must be included in the reference database – in this case, elderly cognitively normal control, MCI and AD subjects diagnosed according to internationally recognized diagnostic criteria.
- MRI scans must have been acquired with a standardized protocol that is amenable to widespread use.
- Appropriate clinical meta-data must be linked to the MRI scans and readily available to users – i.e., demographics, clinical diagnosis, basic neuropsychology, and longitudinal clinical course. The subjects, 3D volume T1-weighted images, and clinical data of ADNI represent a data set that meets these criteria. The authors recommend that the EADC-ADNI harmonization traces or masks of the 1.5T ADNI MPRAGE data serve as the hippocampal volume Reference Standard Dataset.
- Extend the Reference Standard Dataset to enable a thorough evaluation of technical aspects of MR acquisition on measurement performance. This includes the effects of MR vendor, receiver coil type, accelerated acquisition methods, and field strength. Although the EADC-ADNI harmonization plan focuses on 1.5T data, a significant portion of neuroimaging in the future will be performed at 3T, with acquisition acceleration, and with increasingly complex coil arrays. The potential effects of these technical advances on measurement standardization should be investigated (90).
- Split the complete sample of traced hippocampi into balanced training and test data sets for assessing the technical performance characteristics of new analysis methods. This would enable automated methods to be trained on a portion of the reference data and then test performance against an independent subset of the reference data. Careful attention to the composition of these subsets is important so that age, gender or clinical variables are not inadvertently unbalanced.
- Develop standards for reporting measurement units, including a standardized approach for normalization of raw hippocampal volume measures. This will include defining correct measures of head size through standardization of intracranial volume measures. In addition to disease severity, hippocampal volume is affected by other easily ascertained variables such as age, sex, and head size (taller people tend to have larger brains and thus larger intracranial volume) (91). Experience indicates that normalization of raw hippocampal volumes for these confounding variables improves the performance of hippocampal volumetry in evaluation studies, and thus recommendations for standardized normalization procedures for adjusting raw hippocampal volumes (e.g., by head size, age, sex) in the reference data set will be necessary.
- Define minimum technical performance metrics as benchmarks to judge new analysis methods (89). At a minimum these metrics should include:
- Accuracy with respect to the manually traced Reference Standard Dataset. We note that automated techniques will likely not precisely match a manually traced Reference Standard. However, a straightforward mathematical transformation of the output of an accurate automated algorithm to match the reference standard should be possible. Criteria would need to be set as to how closely the automated method must match the manual tracing in order for it to be credentialed by the oversight board.
- Test/re-test precision. This would include not just numeric precision at the volume level, but also more exacting indices of area/pixel overlap such as Dice coefficients.
- Compliance with regulatory requirements (Good Clinical Practice (GCP), FDA 21 CFR part 11, EU GMP Annex 11 on Computerized Systems) for any computer systems running these algorithms.
- Define minimum prognostic performance metrics for new analysis methods based upon benchmarks established from the Reference Standard Dataset: We recommend metrics that predict conversion from MCI to AD within 24 months, progression of dementia severity at 24 months in patients with AD, and maintenance of normal cognition at 24 months in cognitively normal subjects (sensitivity, specificity, positive and negative predictive value, ROC analysis). This will serve as further assay validation for new analysis methods.
- Empower the oversight board to oversee credentialing of applications for analysis methods. While the Reference Standard Dataset can be used to credential new manual tracers, its primary use is envisioned as a means of validating and credentialing automated hippocampal quantification methods for use in therapeutic trials and for new clinical diagnostic criteria. The board could also make context of use recommendations based on limitations identified during the evaluation of a particular method. In order for a potential hippocampal volume measurement application to be credentialed by the oversight board it would have to meet established technical and prognostic performance benchmarks using the reference data set described above.
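Two of the benchmarks named in the work plan above, segmentation overlap against the manually traced Reference Standard and head-size normalization of raw volumes, can be sketched in code. This is a simplified illustration under stated assumptions (masks modeled as voxel-coordinate sets, the one-line proportion method for ICV normalization, and an arbitrary reference ICV of 1500 cm3), not a credentialed implementation.

```python
# Sketch of two technical performance ingredients from the work plan:
# (1) Dice overlap between an automated hippocampal mask and the manual
#     Reference Standard mask, and (2) head-size normalization of a raw
#     hippocampal volume by intracranial volume (ICV). Masks are modeled
#     as sets of (x, y, z) voxel coordinates; all values are hypothetical.

def dice_coefficient(mask_a, mask_b):
    """Dice = 2 * |A intersect B| / (|A| + |B|); 1.0 is perfect overlap."""
    if not mask_a and not mask_b:
        return 1.0
    return 2 * len(mask_a & mask_b) / (len(mask_a) + len(mask_b))

def normalize_by_icv(hippo_vol_cm3, icv_cm3, reference_icv_cm3=1500.0):
    """Proportion method: scale raw volume to a nominal reference head size."""
    return hippo_vol_cm3 * (reference_icv_cm3 / icv_cm3)

# Toy masks: the automated output misses one voxel and adds a spurious one.
manual = {(10, 12, 8), (10, 13, 8), (11, 12, 8), (11, 13, 8)}
auto = {(10, 12, 8), (10, 13, 8), (11, 12, 8), (12, 13, 8)}

overlap = dice_coefficient(manual, auto)          # 2*3 / (4+4) = 0.75
adjusted = normalize_by_icv(3.0, icv_cm3=1600.0)  # 3.0 * 1500/1600
```

In practice, normalization would more likely use regression-based adjustment for age, sex, and ICV, as the work plan notes; the proportion method is shown only because it fits in one line.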
Ideally,
the work plan would follow the sequence outlined above, where initial steps would
focus on establishing the reference standard of manual hippocampus
traces, generating a standardized approach to volume normalization and
benchmark performance metrics. Once the reference standard is
established, then the focus likely would be on evaluation studies and
qualifying the reference standard with the FDA and EMA for diagnostic,
prognostic and outcome use in clinical trials. Standardized acquisition
of MRI scans suitable for hippocampal volumetry is already widely
performed and support from the pharmaceutical industry is likely.
Subsequently, we expect evaluation studies will be conducted to show the
diagnostic value of hippocampal volumetry use outside the context of
clinical trials. We wish to emphasize that the intent of this position
paper is not to stifle existing alternative methods or innovative
development of new methods, but rather to facilitate the development of
widely available implementations of automated hippocampal volumetry
methods, and to serve as a template for an initial effort which can then
be used for other imaging biomarkers.
5. Illustration
As
an example illustrating the approach discussed above we identified 373
ADNI subjects diagnosed as MCI at baseline who qualified for an analysis
of time to progression to AD. Of the 397 ADNI subjects diagnosed as MCI
at baseline, 16 had no follow-up visits, and 8 failed quality control,
leaving 373 for this analysis (Table 1). A list of the ADNI subject ID numbers used in the example MCI analyses is included as a Supplement.
All subjects had hippocampal volume measured in three ways, labeled
Methods A, B and C here. In this exercise, we considered Method A to
represent the Reference Standard Dataset, and assessed Methods B and C
in two ways: technical performance accuracy relative to the Reference
Standard Dataset and prognostic performance in predicting conversion
from MCI to AD at 2 years post baseline. While the data presented below
are real, and not hypothetical, the specific methods are left undefined
because we do not wish to have this position paper misconstrued as
evidence that the authors endorse a particular method for credentialing.
Of
the 373 patients, 166 progressed from MCI to AD during follow-up and 8
progressed to non-AD dementia based upon clinical criteria. We also
examined a subset of 313 subjects that either progressed to AD at or
prior to the 24 month visit (n=135) or had available follow-up through
the 24 month visit without progressing to AD (n=178) to evaluate
differences in hippocampal volume for those that progressed at 24 months
vs. those that remained stable. Subjects who progressed to non-AD
dementia at or before 24 months were excluded from this analysis.
Method
B potentially meets two major criteria for credentialing – it is highly
accurate in the group-wise and individual measurement of hippocampal
volume relative to Method A as shown in the table and scatter plots, and
it also has essentially identical performance in predicting conversion
from MCI to AD (Fig. 1, Table 2).
Method C has a similar prognostic performance in predicting conversion
to AD as Method A as shown in the ROC analysis, but in its current form
might not meet technical accuracy criteria relative to the reference
standard dataset. This is how we would envision the credentialing
process would proceed for most automated applications, with the
EADC-ADNI harmonization data set of manually traced hippocampi serving
as the Reference Standard Dataset and the oversight committee setting
predetermined minimal benchmark criteria to judge the performance of
individual methods.
Figure: Scatterplots of hippocampal volume (cm3) by method; Spearman correlations and p-values are shown for each pair.
Fig. 1. ROC curves comparing the prognostic performance of Methods A, B, and C for progression from MCI to AD within two years.
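Using the subset described above (135 progressors and 178 stable subjects at 24 months), the prognostic benchmarks proposed in the work plan (sensitivity, specificity, positive and negative predictive values) reduce to simple functions of a 2x2 table. The cell counts below are invented for illustration and do not correspond to Methods A, B, or C; only the row totals match the text.

```python
# Hypothetical sketch: prognostic performance metrics from a 2x2 table of
# dichotomized biomarker result vs. observed 24-month MCI-to-AD progression.
# Row totals (135 progressors, 178 stable) follow the ADNI subset described
# in the text; the tp/fp/fn/tn split itself is invented.

def prognostic_metrics(tp, fp, fn, tn):
    """Standard diagnostic-accuracy summaries for a binary classifier."""
    return {
        "sensitivity": tp / (tp + fn),  # progressors correctly flagged
        "specificity": tn / (tn + fp),  # stable subjects correctly cleared
        "ppv": tp / (tp + fp),          # P(progression | positive result)
        "npv": tn / (tn + fn),          # P(stable | negative result)
    }

m = prognostic_metrics(tp=108, fp=36, fn=27, tn=142)
```

An ROC analysis like the one in Fig. 1 is obtained by sweeping the dichotomizing threshold across the range of volumes and plotting sensitivity against 1 - specificity at each cut.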
One
important feature of the process for critically evaluating automated
hippocampal segmentation algorithms is the failure rate. For a variety
of reasons, usually related to poor scan quality, automated algorithms
will fail to produce a plausible result in some proportion of cases in a
study. Taken to the extreme, imagine, for example, a method that
produced perfect predictive results in cases that underwent successful
hippocampal segmentation, but failed 99% of the time. The
method would score quite well on prognostic metrics, but would not be
practical. A fair and objective approach therefore is needed to penalize
automated segmentation algorithms that fail in an unacceptably high
proportion of cases.
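One simple way to formalize such a penalty, offered here as a hypothetical sketch rather than a recommendation of the oversight board, is to score methods on an intention-to-analyze basis, in which every scan the algorithm fails to segment enters the denominator as an incorrect prediction:

```python
# Hypothetical sketch: penalizing segmentation failures by scoring on an
# "intention-to-analyze" basis. Failed scans count in the denominator but
# can never count as correct, so a method that fails often is penalized
# even if it is nearly perfect on the scans it does segment.

def accuracy_pair(correct, incorrect, failures):
    """Return (accuracy among analyzed scans, intention-to-analyze accuracy)."""
    analyzed_only = correct / (correct + incorrect)
    intention_to_analyze = correct / (correct + incorrect + failures)
    return analyzed_only, intention_to_analyze

# The extreme example from the text: near-perfect on the 1% of scans that
# segment successfully, but failing on the remaining 99% of 10,000 scans.
per_analyzed, per_enrolled = accuracy_pair(correct=99, incorrect=1,
                                           failures=9900)
```

A credentialing benchmark could then require a minimum intention-to-analyze performance alongside a maximum acceptable failure rate.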