Debra Sandilands – Final Ph.D. Defence (MERM)

Thursday, May 22, 2014 at 12:30 pm
Room 200, Graduate Student Center (6371 Crescent Road), UBC Point Grey Campus


Title:  “Accuracy of differential item functioning detection methods in structurally missing data due to booklet design”


Supervisor: Dr. Kadriye Ercikan (MERM)
Supervisory Committee: Dr. Sterett Mercer (School Psychology) & Dr. Bruno Zumbo (MERM)
University Examiners: Dr. Pam Ratner (Nursing) and Dr. Jennifer Shapka (HDLC)
External Examiner: Dr. Randall Penfield (University of North Carolina)


ABSTRACT

Differential item functioning (DIF) analyses are often conducted on data with structural missingness (SMD) arising from the balanced incomplete block (BIB) booklet designs commonly used in large-scale assessments (LSAs). Only one DIF method, the Mantel-Haenszel (MH) method, has previously been studied in this context. This study had two purposes: to investigate the power and Type I error rates of an additional DIF method, the IRT-based Lord's Wald test, and compare them with those of the MH method; and to extend the research on methods of forming the MH matching variable (MV) by proposing and testing a modification to the MH MV in the SMD context.
A simulation study investigated the effects of sample size, ratio of group sizes, test length, percentage of DIF items, and differences in group ability on the power and Type I error rates of four DIF methods: the IRT-based Lord's test and the MH method with a block-wise, a booklet-wise, and a modified MV. The study design was selected to reflect authentic situations in which DIF might be investigated in LSAs, which typically use BIB designs.
The three MH methods maintained better Type I error rates than the IRT-Lord's method, whose error rate was inflated when the group sample sizes were unequal. None of the four methods had high power to detect DIF at the smallest sample size (1,200). In the larger sample size conditions, the IRT-Lord's method had high power to detect DIF only when group sizes were equal. None of the MH methods had high power when the group mean ability levels differed or when the proportion of DIF items in the MV was high.
These results indicate that DIF may go undetected under many realistic SMD conditions, potentially undermining the validity of score comparisons across groups. To maximize DIF detection in SMD, it is recommended to use the MH method with a block-wise MV, to ensure a large overall sample size, and to over-sample small policy-relevant groups so that group sample sizes are more balanced. The results also indicate that other sources of validity evidence supporting score comparability should be provided, since DIF analyses cannot yet be relied upon alone for this purpose.