Abstract
Purpose
Item non-response (i.e., missing data) may mask the detection of differential item functioning (DIF) in patient-reported outcome measures or result in biased DIF estimates. Non-response can be challenging to address in ordinal data. We investigated an unsupervised machine-learning method for ordinal item-level imputation and compared it with commonly used item non-response methods when testing for DIF.
Methods
Computer simulation and real-world data were used to assess several item non-response methods using the item response theory likelihood ratio test for DIF. The methods included: (a) list-wise deletion (LD), (b) half-mean imputation (HMI), (c) full information maximum likelihood (FIML), and (d) non-negative matrix factorization (NNMF), which adopts a machine-learning approach to impute missing values. Control of Type I error rates was evaluated using a liberal robustness criterion for α = 0.05 (i.e., 0.025–0.075). Statistical power was assessed with and without adoption of an item non-response method; differences > 10% were considered substantial.
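To make the two imputation approaches concrete, the following is a minimal Python sketch, not the study's implementation (the reference list points to R packages, and the study's simulation code is in the supplementary material). It assumes that half-mean imputation fills a respondent's missing items with the mean of their observed items only when at least half of the items were answered, and it substitutes masked multiplicative updates for whatever solver the study's NNMF software uses. The function names, the rank, the iteration count, and the 1 to 5 response scale are all illustrative assumptions.

```python
import numpy as np


def half_mean_impute(X):
    """Half-mean imputation (assumed rule): fill a respondent's missing items
    with the mean of their observed items, but only if at least half of the
    items were answered. Rows below that threshold are left missing."""
    X = np.asarray(X, dtype=float)
    out = X.copy()
    observed = ~np.isnan(X)
    n_items = X.shape[1]
    for i in range(X.shape[0]):
        n_obs = observed[i].sum()
        if 2 * n_obs >= n_items and n_obs < n_items:
            out[i, ~observed[i]] = X[i, observed[i]].mean()
    return out


def nnmf_impute(X, rank=2, n_iter=500, min_cat=1, max_cat=5, seed=0, eps=1e-9):
    """Impute missing ordinal responses with a masked non-negative matrix
    factorization X ~ W @ H fitted on the observed cells only.

    Uses multiplicative updates on the masked squared error (an assumed,
    simple solver). Fitted values are rounded and clipped to the response
    scale [min_cat, max_cat], which is a hypothetical 1-5 scale here."""
    X = np.asarray(X, dtype=float)
    mask = ~np.isnan(X)              # True where a response was observed
    X0 = np.where(mask, X, 0.0)      # zero-fill so missing cells drop out
    rng = np.random.default_rng(seed)
    W = rng.uniform(0.1, 1.0, size=(X.shape[0], rank))
    H = rng.uniform(0.1, 1.0, size=(rank, X.shape[1]))
    for _ in range(n_iter):
        WH = mask * (W @ H)
        W *= (X0 @ H.T) / (WH @ H.T + eps)
        WH = mask * (W @ H)
        H *= (W.T @ X0) / (W.T @ WH + eps)
    fitted = np.clip(np.rint(W @ H), min_cat, max_cat)
    # Keep observed responses untouched; fill only the missing cells.
    return np.where(mask, X, fitted)
```

Rounding and clipping keeps the imputed values on the ordinal scale, whereas the half-mean sketch can return non-integer values; whether to round those is a separate analytic choice that this sketch does not make.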
Results
Type I error rates for detecting DIF using the LD, FIML and NNMF methods were controlled within the bounds of the robustness criterion for > 95% of simulation conditions, although the NNMF method occasionally resulted in inflated rates. The HMI method always resulted in inflated error rates with 50% missing data. Differences in power to detect moderate DIF effects for the LD, FIML and NNMF methods were substantial with 50% missing data and otherwise insubstantial.
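The two evaluation rules behind these results can be written down directly. The sketch below is illustrative only: it assumes the liberal criterion is the interval from 0.5α to 1.5α (which gives 0.025–0.075 at α = 0.05) and that a "substantial" power difference means more than 10 percentage points in absolute terms. The function names and inputs are hypothetical.

```python
def within_liberal_criterion(n_rejections, n_replications, alpha=0.05):
    """Return True if the empirical Type I error rate lies inside the
    liberal robustness interval [0.5 * alpha, 1.5 * alpha]."""
    rate = n_rejections / n_replications
    return 0.5 * alpha <= rate <= 1.5 * alpha


def power_difference_is_substantial(power_a, power_b, threshold=0.10):
    """Return True if two power estimates (as proportions) differ by more
    than the threshold; absolute percentage points are assumed."""
    return abs(power_a - power_b) > threshold
```

For example, 50 false rejections in 1000 null replications gives a rate of 0.05, which falls inside the interval, while 90 rejections gives 0.09, which would be flagged as inflated.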
Conclusion
The NNMF method demonstrated comparable performance to commonly used non-response methods. This computationally efficient method represents a promising approach to address item-level non-response when testing for DIF.
Data availability
Study data for the real-world analyses were secondary data. These data were provided under specific data sharing agreements only for approved use for this project. The original source data are not owned by the researchers and as such cannot be provided to a public repository. Where necessary and with appropriate approvals, the original source data for this project may be reviewed with the consent of the data providers and approval by the required privacy and ethical review bodies.
Code availability
Simulation codes are provided in the supplementary material.
Funding
Funding for this study was provided by the Canadian Institutes of Health Research (Grant # MOP-142404). OFA is supported by funding from the Visual and Automated Disease Analytics (VADA) Program at the University of Manitoba. LML is supported by a Tier 1 Canada Research Chair in Methods for Electronic Health Data Quality. MJJ acknowledges the research support of the Natural Sciences and Engineering Research Council of Canada (NSERC). RB acknowledges the research support of the Canadian Institutes of Health Research.
Author information
Contributions
All authors conceived the study and contributed to the design of the simulation study and analysis plan for the numeric example. OFA, MJJ and LML conducted the analysis and prepared the draft manuscript. All authors reviewed and approved the final version of the manuscript.
Ethics declarations
Conflict of interest
The authors declare that they have no competing interests.
Ethical approval
This study received ethical approval from the University of Manitoba Health Research Ethics Board.
Consent to participate
Informed written consent was obtained from all participants whose information was used in the analyses of data from the Winnipeg Regional Health Authority Joint Replacement Registry.
Consent for publication
Not applicable. There is no identifying information for participants contained in this manuscript.
About this article
Cite this article
Ayilara, O.F., Sajobi, T.T., Barclay, R. et al. A comparison of methods to address item non-response when testing for differential item functioning in multidimensional patient-reported outcome measures. Qual Life Res 31, 2837–2848 (2022). https://doi.org/10.1007/s11136-022-03129-8
DOI: https://doi.org/10.1007/s11136-022-03129-8