Authors: Qingping He; Chris Wheadon
Addresses: Office of Qualifications and Examinations Regulation (Ofqual), Spring Place, Herald Avenue, Coventry Business Park, Coventry CV5 6UB, UK; Assessment and Qualifications Alliance (AQA), Stag Hill House, Guildford, Surrey GU2 7XJ, UK
Abstract: One of the major factors affecting the stability and accuracy of parameters in item response theory (IRT) and the Rasch measurement models is the size of samples used to calibrate the items. This study investigates the effect of sample size on the stability and accuracy of model parameters of the partial credit model (PCM) for a large dataset generated from a high-stakes mathematics achievement test which consists of a mixture of dichotomous and polytomous items. Results obtained indicate that the level of stability and accuracy of item parameters is affected by the sample size, the number of categories of the items and the distribution of category scores within the items. It was also found that the actual measurement errors associated with model parameters for polytomous items estimated from operational test data are generally substantially higher than the theoretical model standard errors exported from the Rasch analysis software used.
Keywords: item response theory; IRT; Rasch measurement models; partial credit model; PCM; item calibration; sample size effect; parameter stability; parameter accuracy; test equating.
International Journal of Quantitative Research in Education, 2013, Vol. 1, No. 3, pp. 297-315
Received: 07 Feb 2013
Accepted: 24 May 2013
Published online: 13 Nov 2013