Zero-one augmented beta and zero-inflated discrete models with heterogeneous dispersion for the analysis of student academic performance
The purpose of this work is to present suitable statistical methods to study the performance of undergraduate students based on the incidence/proportion of failed courses/subjects. Three approaches are considered: first, the proportion of failed subjects is modeled with a zero-one augmented beta distribution; second, discrete models with a logit link are used to model the probability of failing a subject; third, incidence is modeled using regression for count data with a log link and the logarithm of the total number of subjects as an offset. Zero-inflated versions are used to account for the excess of zeros in the data when appropriate, and a heterogeneous dispersion parameter is also considered when applicable. Overall, the zero-inflated negative binomial and zero-inflated beta-binomial models, with regression on both the mean and the dispersion parameters, fit the data well according to the goodness-of-fit measures considered. The database consists of records of Engineering major students who entered the State University of Campinas, Brazil, from 2000 to 2005. Entrance exam scores, demographic variables, and socio-economic status are considered as covariates in the models.
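As an illustration of the third approach, the following is a minimal sketch of a zero-inflated negative binomial probability function with a log link on the mean, the logarithm of the number of subjects as an offset, and a separate linear predictor for the dispersion. The parameterization (variance equal to mu + sigma * mu^2), the log link on the dispersion, the logit link on the zero-inflation probability, and all function and argument names are assumptions made for this example and are not taken from the paper.

```python
import math


def nb_logpmf(y, mu, sigma):
    """Log-pmf of a negative binomial with mean mu and dispersion sigma.

    Assumed parameterization: variance = mu + sigma * mu**2.
    """
    r = 1.0 / sigma  # "size" parameter implied by the dispersion
    return (
        math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
        + r * math.log(r / (r + mu))
        + y * math.log(mu / (r + mu))
    )


def zinb_pmf(y, n_subjects, eta_mu, eta_sigma, eta_nu):
    """P(Y = y) for a zero-inflated negative binomial count of failed subjects.

    eta_mu, eta_sigma, eta_nu are linear predictors (hypothetical names);
    in a regression each would be a covariate vector times coefficients.
    """
    # Log link with offset: log(mu) = eta_mu + log(n_subjects)
    mu = math.exp(eta_mu + math.log(n_subjects))
    # Heterogeneous dispersion: sigma has its own predictor (log link assumed)
    sigma = math.exp(eta_sigma)
    # Zero-inflation probability (logit link assumed)
    nu = 1.0 / (1.0 + math.exp(-eta_nu))
    # Extra point mass at zero on top of the negative binomial zeros
    zero_mass = nu if y == 0 else 0.0
    return zero_mass + (1.0 - nu) * math.exp(nb_logpmf(y, mu, sigma))
```

Under these assumptions the expected count is (1 - nu) * mu, so the offset makes the modeled quantity a failure rate per subject taken, while the extra mass at zero absorbs students with no failures beyond what the negative binomial alone would predict.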
Keywords: Academic performance; Heteroscedasticity; Quantile residuals; Residual analysis; Overdispersion; Zero-inflated discrete models; Zero-one augmented beta models
Mathematics Subject Classification: 62-07, 62J12
The authors are grateful to the two anonymous referees and the Associate Editor for their careful review, valuable suggestions, and comments, which improved this paper. We would also like to thank the Espaço da Escrita (Pró-Reitoria de Pesquisa, UNICAMP) for the language services provided, as well as the Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq) [grant 308583/2015-9 to H.P.P.] and the Fundação de Amparo à Pesquisa do Estado de São Paulo (Fapesp) [grants 11/15047-7 and 14/03043-5 to H.P.P.].