Abstract
This article shows how to conduct multiple imputation in big identifiable data for educational research purposes. The R packages used to handle missing data in this study were “BaylorEdPsych” and “mi”. First, we applied the function “LittleMCAR” to each dataset and confirmed that the null hypothesis of Missing Completely At Random (MCAR) was rejected in every case. Both simulated and real data were analysed. The results suggest that alternative methods need to be developed to improve the quality of imputation.
Acknowledgements
This work was partially funded by FCT - Fundação para a Ciência e a Tecnologia through project number CEMAPRE - UID/MULTI/00491/2019, and by FCT/MEC through national funds and, when applicable, co-funded by FEDER – PT2020 partnership agreement under the project UID/EEA/50008/2019.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Ferrão, M.E., Prata, P. (2019). Computing Topics on Multiple Imputation in Big Identifiable Data Using R: An Application to Educational Research. In: Misra, S., et al. Computational Science and Its Applications – ICCSA 2019. ICCSA 2019. Lecture Notes in Computer Science, vol 11621. Springer, Cham. https://doi.org/10.1007/978-3-030-24302-9_2
DOI: https://doi.org/10.1007/978-3-030-24302-9_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-24301-2
Online ISBN: 978-3-030-24302-9
eBook Packages: Computer Science, Computer Science (R0)