Arwa Alkhalaf – Final Ph.D. Defence (MERM)

Wednesday, March 8, 2017 at 12:30 p.m.
Room 203, Graduate Student Centre (6371 Crescent Road), UBC Point Grey Campus

 

Supervisor:  Dr. Bruno Zumbo
Supervisory Committee:  Dr. Amery Wu and Dr. Sterett Mercer
University Examiners:  Dr. Jeremy Biesanz (Psychology) and Dr. Chris Richardson (Population and Public Health)
External Examiner: Dr. Rachel Fouladi (Simon Fraser University)

 

Title:  “The impact of Predictor Variable(s) with Skewed Cell Probabilities on Wald Tests in Binary Logistic Regression”

 

ABSTRACT

What happens to the parameter estimates and test operating characteristics when the predictor variables in a logistic regression are skewed? The statistics literature provides relatively few answers to this question. This dissertation reports a series of simulation studies that investigated the impact of skewed predictor(s) on the Type I error rate and power of the Wald test in a logistic regression model. Five simulations were conducted for three different models: a simple logistic regression with a binary predictor, a simple logistic regression with a continuous predictor, and a multiple logistic regression with two dichotomous predictors. Three main factors were considered: sample size (13 levels), degree of skewness (18 levels), and effect size (3 levels). The dissertation confirms the importance of conducting a thorough diagnostic analysis of the variables in a logistic regression. The results show that the Type I error rate and power were affected by severe predictor skewness, but that the effect was moderated by sample size. In general, the Type I error rate was consistently deflated for all three models, and power improved as skewness decreased.

Although a deflated Type I error rate still yields a valid (conservative) hypothesis test, the hypothesis test for the skewed variable and the interpretation of the resulting analyses should not be trusted because of the related reduction in statistical power. With smaller sample sizes and a high degree of skewness, the simulated data were occasionally characterized by separation, as evidenced by the non-convergence of maximum likelihood estimation (MLE). In such cases, it is well established that an alternative estimator should be used. A detailed description of the impact of skewed predictor cell probabilities and sample size provides guidelines for practitioners as to where to expect the greatest problems. These findings highlight the importance of the effects of predictor characteristics on the statistical analysis of a logistic regression.
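
To make the simulation design concrete, the following is a minimal sketch of a single design cell for the simple model with one binary predictor. It is not the dissertation's code. The function name, parameter names, number of replications, and seed are hypothetical, and it uses only the Python standard library. It relies on the fact that, with a single binary predictor, the slope estimate is the log odds ratio of the 2x2 table and its Wald standard error is the square root of the summed reciprocal cell counts. Replications with an empty cell, where the slope has no finite or unique MLE, are counted separately and treated as non-rejections; that handling is an assumption of this sketch.

```python
import math
import random


def wald_rejection_rate(n, p_x, beta0=0.0, beta1=0.0, reps=2000, seed=1):
    """Estimate how often the two-sided Wald test (alpha = .05) rejects
    H0: beta1 = 0 when the binary predictor has P(x = 1) = p_x.

    Returns (rejection rate, share of replications with an empty cell).
    With beta1 = 0 the first value estimates the Type I error rate;
    with beta1 != 0 it estimates power.
    """
    rng = random.Random(seed)
    z_crit = 1.959963984540054  # two-sided 5% critical value of N(0, 1)
    rejections = 0
    no_mle = 0
    for _ in range(reps):
        cells = [[0, 0], [0, 0]]  # cells[x][y] holds the 2x2 table counts
        for _ in range(n):
            x = 1 if rng.random() < p_x else 0
            p_y = 1.0 / (1.0 + math.exp(-(beta0 + beta1 * x)))
            y = 1 if rng.random() < p_y else 0
            cells[x][y] += 1
        (n00, n01), (n10, n11) = cells
        if min(n00, n01, n10, n11) == 0:
            # Empty cell: no finite, unique slope estimate (assumption:
            # such replications are counted as non-rejections).
            no_mle += 1
            continue
        b1 = math.log((n11 * n00) / (n10 * n01))  # slope = log odds ratio
        se = math.sqrt(1 / n00 + 1 / n01 + 1 / n10 + 1 / n11)
        if abs(b1 / se) > z_crit:
            rejections += 1
    return rejections / reps, no_mle / reps


# Compare a balanced predictor with a severely skewed one under H0.
print(wald_rejection_rate(n=200, p_x=0.50))
print(wald_rejection_rate(n=30, p_x=0.02))
```

Varying `n`, `p_x`, and `beta1` over a grid would mimic the sample size, skewness, and effect size factors described in the abstract, although the specific levels used in the dissertation are not given here.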