TY - THES
AU - Bibby, Yan
Y2 - 2020/03/26
Y1 - 2019
UR - http://hdl.handle.net/11343/235798
AB - Abstract
A common purpose of large-scale assessments is to describe the performance of populations of students, with the aim of refining education policies and strategies and improving education systems. Large-scale assessments are often administered to a sample of students to reduce costs. They may also use rotated test designs to reduce the burden of a long test on individual students.
The plausible value imputation technique was developed in the 1980s to analyse the NAEP 1983-84 results and to overcome the problems of using individual student estimates to make inferences about population parameters. Since then, the plausible value technique has been widely used in large-scale assessments as a computational approach for estimating population parameters. When the technique was first implemented, five imputations were considered sufficient for all settings. In recent years, however, many researchers have recommended using well beyond five imputations. In this dissertation, two simulation studies and two empirical data studies were conducted to investigate the relationship between the number of imputations and the properties of population parameter estimates.
The results from the simulation studies show that there is a small bias in population parameter estimates, and that increasing the number of plausible values used from 1 to 20 did not reduce the magnitude of the bias for the simulated conditions. The RMSEs of population parameter estimates showed a small decreasing trend as the number of plausible values used increased. A similar decreasing trend was observed in the estimated standard errors of population parameter estimates. This decrease was found to be related to the adjustment factor for a finite number of plausible values in the standard error formula. More importantly, the decrease in the estimated standard error as the number of plausible values increases tends to move the estimated standard error further from its expected value. Therefore, the decrease may not reflect the true situation in practice.
The empirical data studies were carried out using the NAPLAN 2015 Numeracy data. The results showed no consistent pattern in the parameter estimates when the number of plausible values used was increased from 1 to 46. The decreasing trend in the estimated standard errors was not observed in the empirical data studies when the number of plausible values used increased from 3 to 46. This means that increasing the number of plausible values used, up to 46, improves neither the estimates of NAPLAN parameters nor the standard errors of those estimates. However, when sample sizes were small (N < 2,000), the parameter estimates using one or three plausible values differed substantially from the estimates using five or more plausible values. The simulation and empirical studies both show that test reliability and sample size have a much larger effect on optimising the population parameter estimates than does increasing the number of plausible values used.
KW - large scale assessment, plausible values, item response model, population parameters, simulation study, estimate standard errors
T1 - Plausible values: How many for plausible results?
L1 - /bitstream/handle/11343/235798/52f06ea5-ece6-e811-9495-0050568d7800_Thesis_March2020-2.pdf?sequence=5&isAllowed=y
ER -