A comparison of methods to address item non-response when testing for differential item functioning in multidimensional patient-reported outcome measures

Ayilara, Olawale F.; Sajobi, Tolulope T.; Barclay, Ruth; Bohm, Eric; Jafari Jozani, Mohammad; Lix, Lisa M.

doi:10.1007/s11136-022-03129-8


Published: 07 April 2022

Volume 31, pages 2837–2848 (2022)

Quality of Life Research

Olawale F. Ayilara¹, Tolulope T. Sajobi², Ruth Barclay³, Eric Bohm⁴, Mohammad Jafari Jozani⁵ & Lisa M. Lix¹ (ORCID: orcid.org/0000-0001-8685-3212)


Abstract

Purpose

Item non-response (i.e., missing data) may mask the detection of differential item functioning (DIF) in patient-reported outcome measures or result in biased DIF estimates. Non-response can be challenging to address in ordinal data. We investigated an unsupervised machine-learning method for ordinal item-level imputation and compared it with commonly used methods for handling item non-response when testing for DIF.

Methods

Computer simulation and real-world data were used to assess several item non-response methods when testing for DIF with the item response theory likelihood ratio test. The methods were: (a) list-wise deletion (LD), (b) half-mean imputation (HMI), (c) full information maximum likelihood (FIML), and (d) non-negative matrix factorization (NNMF), a machine-learning approach to imputing missing values. Control of Type I error rates was evaluated using a liberal robustness criterion for α = 0.05 (i.e., 0.025–0.075). Statistical power was assessed with and without adoption of an item non-response method; differences > 10% were considered substantial.
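To make the two simplest strategies concrete, the Python sketch below applies list-wise deletion and half-mean imputation to the items of a single subscale. It assumes the common "half rule" reading of HMI: a respondent's missing items are filled with the mean of that respondent's observed items on the same subscale, but only when at least half of the items were answered. Rounding the filled value to an ordinal category, representing a missing response as `None`, and the function names `listwise_delete` and `half_mean_impute` are illustrative assumptions, not details reported in the abstract. In a multidimensional measure the function would be applied separately to each subscale.

```python
from statistics import mean
from typing import List, Optional

Row = List[Optional[int]]  # one respondent's item scores on one subscale; None = missing


def listwise_delete(rows: List[Row]) -> List[Row]:
    """Keep only respondents who answered every item."""
    return [list(row) for row in rows if all(x is not None for x in row)]


def half_mean_impute(rows: List[Row], round_to_category: bool = True) -> List[Row]:
    """Fill missing items with the respondent's mean over observed items,
    but only if at least half of the items were observed (assumed 'half rule')."""
    imputed = []
    for row in rows:
        observed = [x for x in row if x is not None]
        if observed and len(observed) >= len(row) / 2:
            fill = mean(observed)
            if round_to_category:
                # Hypothetical choice: snap the mean to an ordinal category.
                # Python's round() sends exact halves to the nearest even integer.
                fill = int(round(fill))
            imputed.append([fill if x is None else x for x in row])
        else:
            # Too few answers: the row is left incomplete (still missing).
            imputed.append(list(row))
    return imputed


rows = [[1, 2, None, 4], [None, None, None, 3], [2, 2, 3, 3]]
print(listwise_delete(rows))   # [[2, 2, 3, 3]]
print(half_mean_impute(rows))  # [[1, 2, 2, 4], [None, None, None, 3], [2, 2, 3, 3]]
```

The example shows the trade-off between the two approaches: LD discards two of the three respondents, while HMI recovers one of them and still leaves the respondent with three of four items missing unimputed.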

Results

Type I error rates for detecting DIF using the LD, FIML and NNMF methods were controlled within the bounds of the robustness criterion for > 95% of simulation conditions, although the NNMF method occasionally resulted in inflated rates. The HMI method always resulted in inflated error rates with 50% missing data. Differences in power to detect moderate DIF effects for the LD, FIML and NNMF methods were substantial with 50% missing data and not substantial otherwise.
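The decision rules behind these results can be written in a few lines. This sketch assumes the liberal criterion is the interval [0.5α, 1.5α], which reproduces the 0.025–0.075 bounds stated for α = 0.05, and reads "differences > 10%" in power as an absolute difference of more than 10 percentage points between the two power estimates being compared. That reading, the function names, and the example numbers are assumptions for illustration, not values from the study.

```python
from typing import Sequence


def within_liberal_bounds(rate: float, alpha: float = 0.05) -> bool:
    """True if an empirical Type I error rate lies in [0.5*alpha, 1.5*alpha]."""
    return 0.5 * alpha <= rate <= 1.5 * alpha


def share_within_bounds(rates: Sequence[float], alpha: float = 0.05) -> float:
    """Proportion of simulation conditions whose error rate meets the criterion."""
    return sum(within_liberal_bounds(r, alpha) for r in rates) / len(rates)


def power_difference_substantial(power_a: float, power_b: float,
                                 threshold: float = 0.10) -> bool:
    """Assumed rule: absolute power difference above 10 percentage points."""
    return abs(power_a - power_b) > threshold


# Made-up rates for four hypothetical conditions (not results from the study).
print(share_within_bounds([0.048, 0.051, 0.081, 0.030]))  # 0.75
print(power_difference_substantial(0.85, 0.70))           # True
```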

Conclusion

The NNMF method performed comparably to commonly used non-response methods. This computationally efficient method represents a promising approach to address item-level non-response when testing for DIF.
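A minimal sketch of how NNMF can fill in missing ordinal responses follows; it is not the authors' implementation. It assumes a respondents-by-items matrix with `NaN` marking missing cells, fits a low-rank factorization X ≈ WH using Lee and Seung style multiplicative updates restricted to the observed cells (a masked, or weighted, NMF), and then rounds and clips the reconstruction to the valid category range. The function name `nnmf_impute`, the rank, the iteration count, and the 1 to 5 category bounds are hypothetical parameters chosen for illustration.

```python
import numpy as np


def nnmf_impute(X, rank=2, n_iter=500, min_cat=1, max_cat=5, seed=0, eps=1e-9):
    """Impute NaN cells of a nonnegative respondents-by-items matrix X by fitting
    X ~ W @ H on observed cells only, then rounding to valid ordinal categories."""
    X = np.asarray(X, dtype=float)
    mask = ~np.isnan(X)              # True where a response was observed
    Xobs = np.where(mask, X, 0.0)    # missing cells contribute nothing to the fit
    rng = np.random.default_rng(seed)
    n, p = X.shape
    W = rng.uniform(0.1, 1.0, size=(n, rank))  # respondent loadings (nonnegative)
    H = rng.uniform(0.1, 1.0, size=(rank, p))  # item profiles (nonnegative)
    for _ in range(n_iter):
        # Masked multiplicative updates: only observed cells enter the ratios,
        # and the factors stay nonnegative because every term is nonnegative.
        WH = W @ H
        H *= (W.T @ Xobs) / (W.T @ (mask * WH) + eps)
        WH = W @ H
        W *= (Xobs @ H.T) / ((mask * WH) @ H.T + eps)
    recon = np.clip(np.rint(W @ H), min_cat, max_cat)
    return np.where(mask, X, recon)  # observed responses are never changed


X = [[2, np.nan, 4],
     [2, 3, 4],
     [2, 3, 4],
     [2, 3, 4]]
print(nnmf_impute(X, rank=1)[0])  # [2. 3. 4.]
```

In this sketch the rank plays the role of the number of latent factors. One option for choosing it would be to hide some observed responses and check how well they are recovered.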


Data availability

Study data for the real-world analyses were secondary data. These data were provided under specific data sharing agreements only for approved use for this project. The original source data are not owned by the researchers and as such cannot be provided to a public repository. Where necessary and with appropriate approvals, the original source data for this project may be reviewed with the consent of the data providers and approval by the required privacy and ethical review bodies.

Code availability

Simulation code is provided in the supplementary material.


Funding

Funding for this study was provided by the Canadian Institutes of Health Research (Grant # MOP-142404). OFA is supported by funding from the Visual and Automated Disease Analytics (VADA) Program at the University of Manitoba. LML is supported by a Tier 1 Canada Research Chair in Methods for Electronic Health Data Quality. MJJ acknowledges the research support of the Natural Sciences and Engineering Research Council of Canada (NSERC). RB acknowledges the research support of the Canadian Institutes of Health Research.

Author information

Authors and Affiliations

Department of Community Health Sciences, University of Manitoba, S113-750 Bannatyne Avenue, Winnipeg, MB, R3E 0W3, Canada
Olawale F. Ayilara & Lisa M. Lix
Department of Community Health Sciences, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
Tolulope T. Sajobi
Department of Physical Therapy, University of Manitoba, Winnipeg, MB, Canada
Ruth Barclay
Department of Surgery, University of Manitoba, Winnipeg, MB, Canada
Eric Bohm
Department of Statistics, University of Manitoba, Winnipeg, MB, Canada
Mohammad Jafari Jozani


Contributions

All authors conceived the study and contributed to the design of the simulation study and analysis plan for the numeric example. OFA, MJJ and LML conducted the analysis and prepared the draft manuscript. All authors reviewed and approved the final version of the manuscript.

Corresponding author

Correspondence to Lisa M. Lix.

Ethics declarations

Conflict of interest

The authors declare that they have no competing interests.

Ethical approval

This study received ethical approval from the University of Manitoba Health Research Ethics Board.

Consent to participate

Informed written consent was obtained from all participants whose information was used in the analyses of data from the Winnipeg Regional Health Authority Joint Replacement Registry.

Consent for publication

Not applicable. There is no identifying information for participants contained in this manuscript.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

The electronic supplementary material is provided as Supplementary file 1 (DOCX, 318 KB).


About this article

Cite this article

Ayilara, O.F., Sajobi, T.T., Barclay, R. et al. A comparison of methods to address item non-response when testing for differential item functioning in multidimensional patient-reported outcome measures. Qual Life Res 31, 2837–2848 (2022). https://doi.org/10.1007/s11136-022-03129-8


Accepted: 17 March 2022
Published: 07 April 2022
Issue Date: September 2022
DOI: https://doi.org/10.1007/s11136-022-03129-8

