Reliability and validity of the list learning paradigm and implications for clinical memory assessment
Affiliation: Melbourne School of Psychological Sciences
Document Type: PhD thesis
Access Status: This item is embargoed and will be available on 2019-10-06.
© 2017 Dr. Sara Fratti
The free-recall multi-trial paradigm of verbal list learning is conventionally used to detect episodic memory deficits in clinical neuropsychology practice. The test remains in widespread use despite evidence of poor precision. The focus of the current thesis was to improve the reliability of the list learning test. The need for improved precision in memory measurement was also addressed in the context of novel methodologies such as computerized testing and diagnostic validity research. Three experimental studies and a brief systematic review with meta-analysis were conducted to achieve a better estimation of verbal memory assessment in clinical practice. In STUDY 1, the reliability of the conventional five-trial structure of the Rey Auditory Verbal Learning Test (RAVLT; Rey, 1964; Schmidt, 1996) was modelled. A quasi-Markov simplex, which is a structural equation auto-regressive model specific to the alternative formulation of the reliability coefficient, was applied to a normative dataset of 398 participants. Model fitting results indicated that long-term memory, corresponding to the learning efficiency trait (Gl) under the Cattell-Horn-Carroll (CHC) model of cognition, was mostly captured by the first two trials of the RAVLT, with subsequent trials adding little to reliability and Gl estimation. Results of STUDY 1 provided the basis for the development of STUDY 2, where the aim was to examine the psychometric quality of an experimental list learning test comprising multiple lists of only two trials each. Two samples of participants were prospectively selected. One sample of n = 119 clinical participants was recruited from the Victorian Comprehensive Epilepsy Program at St Vincent’s Hospital, Melbourne, and a second sample of n = 89 student participants was selected from a tertiary education population. The experimental list learning test was administered twice and compared against established and experimental neuropsychological tests.
Correlational analyses were performed to obtain reliability and construct validity estimates. Overall, the experimental administration of the list learning test showed better retest reliability coefficients (rxx = .61 to .79) than the RAVLT (e.g., rxx = .26 to .64; Carstairs, Myors & Shores, 2012), with moderate-to-large convergent validity correlations with standard tests of long-term memory ability and small-to-moderate discriminant validity correlations with tests of working memory. Despite promising data, the reliability of the experimental word list was still considered inadequate for the purpose of individual clinical assessment. A subsample of the seizure disorder participants (n = 80) and the full sample of student participants (n = 89) took part in an additional assessment (STUDY 3) involving the administration of two computerized memory tests from the CogState Computerized Cognitive Battery. Correlational analyses and calculation of Reliable Change Indices (RCIs) on individual performance were used to interpret results. As in STUDY 2, the CogState subtests showed retest reliability coefficients (rxx = .49 to .77) that were inadequate for clinical assessment purposes. Small-to-medium construct validity correlations with conventional tests of learning and working memory were also found. Discrepancies in RCI interpretations were observed when different test parameters and reliabilities were used. In particular, the Within-Subject Standard Deviation (WSD) index was found to substantially increase the rate of Type I error when test reliabilities were low. Finally, the diagnostic validity of the list learning test was addressed in STUDY 4. Results from a brief systematic review and meta-analysis of the California Verbal Learning Test (CVLT, CVLT-II and CVLT-C; Delis, Kramer, Kaplan & Ober, 1987, 2000) were reported.
The study reviewed CVLT sensitivity and specificity, and the reporting of confidence intervals, across various scores from 25 studies published between 2002 and 2016. A meta-analysis examining the diagnostic accuracy of the most consistently reported score, forced choice recognition, was also performed. The methodological quality of all 25 studies selected was rated according to the Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2; Whiting et al., 2011) criteria. Results from STUDY 4 indicated moderate-to-high specificity (75% or above) and moderate-to-low sensitivity (75% or below) across all CVLT scores; however, confidence intervals were not reported in any of the 25 studies reviewed. Calculation of confidence intervals would have altered interpretation of results in the sensitivities of 22 studies (88%) and in the specificities of five studies (20%) due to the considerable uncertainty associated with the wide confidence limits obtained (95% confidence intervals [CI]: 30% to 80%). Meta-analytic results of the CVLT forced choice recognition score indicated optimal specificity (Hedges’ g = 93%, 95% CI: 90% to 94%) but mediocre sensitivity (Hedges’ g = 45%, 95% CI: 38% to 52%). Although the forced choice recognition score may be more accurate for the detection of true positives, its extremely low sensitivity may produce a high rate of false negatives. In terms of methodological quality, lack of clarity with respect to blinding and the use of a well-validated reference standard appeared to be the major methodological limitations of most of the studies reviewed. In conclusion, the current thesis highlighted the need for better memory tests to accurately assess verbal episodic memory functions. Irrespective of the structural variation of the list learning test, its psychometric properties could not be improved to acceptable standards in a way that is feasible for clinical assessment purposes.
Use of word list tests in neuropsychology practice should therefore be discouraged to avoid misleading clinical interpretations. Similarly, use of the CogState computerized memory measures prior to rigorous independent validation of their psychometrics may lead to incorrect clinical inferences. With regard to the diagnostic validity of the CVLT, the studies reviewed neglected to report confidence intervals around sensitivity and specificity estimates, and the presence of methodological flaws undermined the true diagnostic utility of the CVLT scores. Overall, adequate knowledge of psychometric theory should become a priority for every clinician interested in selecting the best available testing resource for clinical memory assessment. The current thesis showed that new and more precise memory measures are needed in neuropsychology practice. Clinicians who choose to use tests with low reliabilities need to do so with an understanding of the resultant limitations in diagnostic precision.
Keywords: list learning; word list; reliability; validity; memory assessment; neuropsychology