Forthcoming and Online First Articles
International Journal of Quantitative Research in Education
Forthcoming articles have been peer-reviewed and accepted for publication but are pending final changes, are not yet published and may not appear here in their final order of publication until they are assigned to issues. Therefore, the content conforms to our standards but the presentation (e.g. typesetting and proof-reading) is not necessarily up to the Inderscience standard. Additionally, titles, authors, abstracts and keywords may change before publication. Articles will not be published until the final proofs are validated by their authors.
Forthcoming articles must be purchased for the purposes of research, teaching and private study only. These articles can be cited using the expression "in press". For example: Smith, J. (in press). Article Title. Journal Title.
Online First articles are published online here, before they appear in a journal issue. Online First articles are fully citeable, complete with a DOI. They can be cited, read, and downloaded. Online First articles are published as Open Access (OA) articles to make the latest research available as early as possible.
Articles marked with this Open Access icon are Online First articles. They are freely available and openly accessible to all without any restriction except the ones stated in their respective CC licenses.
International Journal of Quantitative Research in Education (5 papers in press)
The Impact of the Difficulty Level of Compromised Common Items on IRT Scaling and Equating under the Common-Item Equating Design by Moatasim A. Barri Abstract: The common-item equating design requires two forms of a test that have a set of items in common in order to control for differences in examinee ability. The common set is subject to compromise when it is used repeatedly, which is likely to become a serious threat to test fairness. If cheating occurs on common items, the equating process produces inaccurate results, which might vary with common-item difficulty. This simulation study was conducted to evaluate the impact of the difficulty level of compromised common items on the equating process. The recovery of scaling coefficients and equated scores was assessed using bias and RMSE under various cheating conditions. The results indicated that cheating on higher-difficulty common items produced the most overestimation in the scaling coefficients, which, in turn, caused the most inflation in equated true scores for all test takers, whether they engaged in cheating or not. Keywords: item response theory; linking; equating; common items; test compromise. DOI: 10.1504/IJQRE.2020.10030357
Gender Fairness in Immigration Language Testing: A Study of Differential Options Functioning on the CELPIP-G Reading Multiple-Choice Questions by Amery Wu, Minjeong Park, ShunFu Hu Abstract: The CELPIP-G test is used by the Canadian federal government to screen immigration eligibility for the skilled worker class. Differential options functioning (DOF) is a technique used to detect potential bias in the options of multiple-choice items. The purpose of this paper is to investigate DOF in a CELPIP-G reading test form by way of multinomial logistic regression. The results showed that 13.7% of options were flagged as gender DOF. Nonetheless, 11.2% were negligible or small DOF. In the case of uniform gender DOF, twice as many options were found to function against female immigration applicants as against their male counterparts. Female test-takers were more likely to be disadvantaged when tackling questions that asked them to make direct inferences based on factual but unfamiliar information. In contrast, male test-takers were more likely to be disadvantaged when tackling questions that asked them to develop their own interpretations over different views. Moreover, test questions that required an understanding of more sophisticated ideas in complex language structure and allowed personal interpretation tended to show more marked and non-uniform gender DOF. Keywords: CELPIP-General; differential options functioning; differential distractor analysis; measurement bias; immigration; test fairness; language testing; multinomial logistic regression; multiple-choice questions; reading comprehension; gender bias.
Comparing the Normalized and 2PL IRT Scoring Methods on Multiform Examinations by Aolin Xie, Serina Chiu, Keyu Chen, Gregory Camilli Abstract: This study compared candidates' scores based on the normalized model and the two-parameter Item Response Theory (2PL IRT) model using simulated multiform exam data. Candidates' calculated scores, rankings, qualification status and score ties from the two models were compared with their true values. The results suggest that the 2PL IRT model outperformed the normalized model when the candidate ability distributions varied across forms. It was found that candidate scores based on the 2PL model were more closely related to the true scores. The qualification status of candidates belonging to the top 10% group was more accurately classified by the 2PL model than by the normalized model when group abilities differed. Keywords: 2PL IRT model; normalized model; multiform exam; equating; candidate classification.
The suitability of similarity measures to the grading of short answers in examination by Okure U. Obot, Samuel S. Udoh, Kingsley F. Attai Abstract: Grading short answers in an examination is a tedious exercise that takes up much of examiners' time. Fatigue can set in, leading to errors, and sentiment sometimes comes into play. The result is variation in the marks awarded to candidates even when they express the same opinion. In this study, Jaccard, Cosine, Jaro and Dice similarity measures were used to grade the answers provided by candidates in examinations of 647 questions. The similarity measures were tested with the aim of ascertaining the measure that ranks closest to the average scores provided by three human examiners given the same examination answers and marking guides. Results showed that the Jaro similarity measure ranked closest to the mean score of the examiners, with a variance absolute error of 0.62%, and covaried strongly by 97% at a significance level of 0.001. Keywords: Jaro; Jaccard; Cosine; Dice; examination; short answers; similarity measures; semantics; lexical; natural language processing; NLP. DOI: 10.1504/IJQRE.2021.10038454
Detecting Gender-Biased Items in a High-Stakes Language Proficiency Test: Using Rasch Model Measurement by Soodeh Bordbar, Seyyed Mohammad Alavi Abstract: The consequential aspect of validity interprets the real and potential consequences of a test score, particularly when it comes to sources of invalidity related to the conceptions of fairness, bias, injustice, and inequity. Differential Item Functioning (DIF) analysis examines test items to evaluate the fairness and validity of educational tests. In addition, gender is mentioned as one of the elements that frequently act as a source of construct-irrelevant variance. If gender exerts a large influence on the test items, it brings about bias. The present study explores the validity of a high-stakes test and considers the role of gender as a source of bias in different subtests of language proficiency tests. To achieve this, the Rasch model was used to inspect biased items and to examine the construct-irrelevant factors. For the DIF analysis, the Rasch model was applied to 5000 participants who were selected randomly from a pool of examinees taking part in the National University Entrance Exam for Foreign Languages (NUEEFL) as a university entrance requirement for English language studies (i.e., English literature, Teaching, and Translation). The findings reveal that the test scores are not free from construct-irrelevant variance, and some misfitting items were modified as suggested by the fit statistics. By and large, the fairness of the NUEEFL was not confirmed. The results obtained from such psychometric assessment could be beneficial for test designers, stakeholders, administrators, as well as teachers. The study also recommends that standard, bias-free tests and instructional materials be administered in future. Keywords: Differential Item Functioning analysis; Bias; Dimensionality; Fairness; The Rasch Model.