
More flexible response functions for the PROMIS physical functioning item bank by application of a monotonic polynomial approach



In developing item banks for patient-reported outcomes (PROs), nonparametric techniques are often used to investigate empirical item response curves, whereas final banks usually rely on parsimonious parametric models. A flexible approach based on monotonic polynomials (MP) offers a compromise by modeling items with both complex and simpler response curves. This paper investigates the suitability of MPs for PRO data.


Using PROMIS Wave 1 data (N = 15,725) for Physical Function, we fitted an MP model and the graded response model (GRM). We compared both models in terms of overall model fit, latent trait estimates, and item/test information. We quantified possible GRM item misfit using approaches that compute discrepancies with the MP. Through simulations, we investigated whether the MP can outperform the GRM under identical data collection conditions.
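To make the two response-function families concrete, here is a minimal sketch, in Python rather than the R packages used in the paper's analyses, of GRM category probabilities alongside a toy monotonic-polynomial variant. All parameter values are made up, and the MP parameterization shown is a deliberate simplification of the published model:

```python
import math

def grm_probs(theta, a, b):
    """Graded response model: category probabilities P(X = k | theta)
    for slope a and ordered thresholds b (length K - 1)."""
    # Cumulative probabilities P(X >= k), padded with the boundary cases 1 and 0.
    p_star = [1.0] + [1.0 / (1.0 + math.exp(-a * (theta - bk))) for bk in b] + [0.0]
    # Adjacent differences give the K category probabilities, which sum to 1.
    return [p_star[k] - p_star[k + 1] for k in range(len(b) + 1)]

def mp_grm_probs(theta, b1, b3, c):
    """Toy monotonic-polynomial variant: the linear term a*theta is replaced
    by m(theta) = b1*theta + b3*theta**3, which is monotone increasing
    whenever b1, b3 >= 0. Illustrative only; the published MP-GRM uses a
    more general parameterization that guarantees monotonicity."""
    m = b1 * theta + b3 * theta ** 3
    # Intercepts c must be strictly decreasing so categories stay ordered.
    p_star = [1.0] + [1.0 / (1.0 + math.exp(-(m + ck))) for ck in c] + [0.0]
    return [p_star[k] - p_star[k + 1] for k in range(len(c) + 1)]

# Made-up parameters for a 4-category item, evaluated at theta = 0.5
p_grm = grm_probs(0.5, a=1.2, b=[-1.0, 0.0, 1.5])
p_mp = mp_grm_probs(0.5, b1=1.0, b3=0.2, c=[1.0, 0.0, -1.5])
```

With `b3 = 0`, the MP variant collapses back to a GRM-style curve; the cubic term is what lets the response function bend where the data demand it.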


A likelihood ratio test (p < 0.001) and AIC (but not BIC) indicated better fit for the MP. Latent trait estimates and expected test scores were comparable between models, but the MP yielded higher information in the lower range of physical functioning. Many items were flagged as possibly misfitting, and simulations supported the performance of the MP. Yet discrepancies between the MP and GRM were small.
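The model-comparison logic behind these results can be sketched in a few lines. The log-likelihoods and parameter counts below are entirely made up; they only illustrate how, at a sample size like N = 15,725, BIC's log(n) penalty can favor the simpler GRM even when the likelihood ratio test and AIC favor the MP:

```python
import math

def compare_fit(loglik_grm, k_grm, loglik_mp, k_mp, n):
    """Likelihood-ratio statistic, delta-AIC, and delta-BIC for a GRM
    nested within an MP model. Positive deltas favor the MP model."""
    aic = lambda ll, k: -2.0 * ll + 2.0 * k
    bic = lambda ll, k: -2.0 * ll + k * math.log(n)
    return {
        "LR": 2.0 * (loglik_mp - loglik_grm),  # chi-square, df = k_mp - k_grm
        "df": k_mp - k_grm,
        "delta_AIC": aic(loglik_grm, k_grm) - aic(loglik_mp, k_mp),
        "delta_BIC": bic(loglik_grm, k_grm) - bic(loglik_mp, k_mp),
    }

# Hypothetical fit values: the MP gains 100 log-likelihood units at the
# cost of 50 extra parameters, with n matching the PROMIS sample size.
res = compare_fit(-250000.0, 600, -249900.0, 650, 15725)
```

Here `delta_AIC` comes out positive (MP preferred) while `delta_BIC` is negative (GRM preferred), reproducing the AIC/BIC disagreement pattern reported above.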


The MP approach allows inclusion of items with complex response curves into PRO item banks. Information for the physical functioning item bank may be greater than originally thought for low levels of physical functioning. This may translate into small improvements if an MP approach is used.



Data availability

As described in the method section, data used in this manuscript are available in the public domain.

Code availability

Examples of estimation of the monotonic polynomial model are available in Supplementary Materials.


  1. For a recent discussion on the merits of collapsing categories, see Harel and Steele [25].

  2. Estimation options were changed slightly to increase computational speed and are described in Supplementary Materials.


  1. Lord, F. M., & Novick, M. R. (1968). Statistical theories of mental test scores. Addison-Wesley.

  2. Embretson, S. E., & Reise, S. P. (2000). Item response theory for psychologists. Lawrence Erlbaum Associates.

  3. Fries, J. F., Bruce, B., & Cella, D. (2005). The promise of PROMIS: Using item response theory to improve assessment of patient-reported outcomes. Clinical and Experimental Rheumatology, 23(5 Suppl 39), S53–S57.

  4. Choi, S. W., Schalet, B., Cook, K. F., & Cella, D. (2014). Establishing a common metric for depressive symptoms: Linking the BDI-II, CES-D, and PHQ-9 to PROMIS depression. Psychological Assessment, 26, 513–527.

  5. Samejima, F. (1969). Estimation of latent ability using a response pattern of graded scores. Psychometric Monographs.

  6. Samejima, F. (1972). A general model of free-response data. Psychometric Monographs No. 18. Psychometric Society.

  7. Samejima, F. (2010). The general graded response model. In M. Nering & R. Ostini (Eds.), Handbook of polytomous item response theory models: Developments and applications (pp. 77–107). Taylor & Francis.

  8. Rose, M., Bjorner, J. B., Gandek, B., Bruce, B., Fries, J. F., & Ware, J. E. (2014). The PROMIS physical function item bank was calibrated to a standardized metric and shown to improve measurement efficiency. Journal of Clinical Epidemiology, 67, 516–526.

  9. Meijer, R. R., & Baneke, J. J. (2004). Analyzing psychopathology items: A case for nonparametric item response theory modeling. Psychological Methods, 9, 354–368.

  10. Patient-Reported Outcomes Measurement Information System (2013). PROMIS instrument development and validation scientific standards version 2.0. Retrieved from

  11. Falk, C. F., & Cai, L. (2016). Semi-parametric item response functions in the context of guessing. Journal of Educational Measurement, 53, 229–247.

  12. Wells, C. S., & Bolt, D. M. (2008). Investigation of a nonparametric procedure for assessing goodness-of-fit in item response theory. Applied Measurement in Education, 21, 22–40.

  13. Falk, C. F. (2019). Model selection for monotonic polynomial item response models. Quantitative psychology: The 83rd Annual Meeting of the Psychometric Society, New York, NY, 2018 (pp. 75–85). Springer.

  14. Falk, C. F. (2020). The monotonic polynomial graded response model: Implementation and a comparative study. Applied Psychological Measurement, 44, 465–481.

  15. Falk, C. F., & Cai, L. (2016). Maximum marginal likelihood estimation of a monotonic polynomial generalized partial credit model with applications to multiple group analysis. Psychometrika, 81, 434–460.

  16. Liang, L., & Browne, M. W. (2015). A quasi-parametric method for fitting flexible item response functions. Journal of Educational and Behavioral Statistics, 40, 5–34.

  17. Cohen, J., Cohen, P., West, S. G., & Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences (3rd ed.). Lawrence Erlbaum Associates.

  18. Feuerstahler, L. M. (2016). Exploring alternate latent trait metrics with filtered monotonic polynomial IRT models (PhD thesis). Department of Psychology, University of Minnesota.

  19. Bock, R. D., & Aitkin, M. (1981). Marginal maximum likelihood estimation of item parameters: Application of an EM algorithm. Psychometrika, 46, 443–459.

  20. Mislevy, R. J. (1986). Bayes modal estimation in item response models. Psychometrika, 51, 177–195.

  21. Feuerstahler, L. M. (2019). Metric transformations and the filtered monotonic polynomial item response model. Psychometrika, 84, 105–123.

  22. Choi, S. W., Reise, S. P., Pilkonis, P., Hays, R. D., & Cella, D. (2010). Efficiency of static and computer adaptive short forms compared to full-length measures of depressive symptoms. Quality of Life Research, 19, 125–136.

  23. Cella, D. (2015). PROMIS 1 wave 1. Harvard Dataverse.

  24. Liu, H. H., Cella, D., Gershon, R., Shen, J., Morales, L. S., Riley, W., & Hays, R. D. (2010). Representativeness of the PROMIS internet panel. Journal of Clinical Epidemiology, 63, 1169–1178.

  25. Harel, D., & Steele, R. J. (2018). An information matrix test for the collapsing of categories under the partial credit model. Journal of Educational and Behavioral Statistics, 43, 721–750.

  26. Santor, D. A., Ramsay, J. O., & Zuroff, D. C. (1994). Nonparametric item analyses of the Beck depression inventory: Evaluating gender item bias and response option weights. Psychological Assessment, 6, 255–270.

  27. Rose, M., Bjorner, J. B., Becker, J., Fries, J. F., & Ware, J. E. (2008). Evaluation of a preliminary physical function item bank supported the expected advantages of the patient-reported outcomes measurement information system (PROMIS). Journal of Clinical Epidemiology, 61, 17–33.

  28. Sijtsma, K., & van der Ark, L. A. (2003). Investigation and treatment of missing item scores in test and questionnaire data. Multivariate Behavioral Research, 38, 505–528.

  29. van der Ark, L. A., & Sijtsma, K. (2005). The effect of missing data imputation on Mokken scale analysis. In L. A. van der Ark, M. A. Croon, & K. Sijtsma (Eds.), New developments in categorical data analysis for the social and behavioral sciences (pp. 147–166). Lawrence Erlbaum.

  30. van Ginkel, J. R., van der Ark, L. A., & Sijtsma, K. (2007). Multiple imputation of item scores in test and questionnaire data, and influence on psychometric results. Multivariate Behavioral Research, 42, 387–414.

  31. Wind, S. A., & Patil, Y. J. (2018). Exploring incomplete rating designs with Mokken scale analysis. Educational and Psychological Measurement, 78, 319–342.

  32. Neale, M. C., Hunter, M. D., Pritikin, J. N., Zahery, M., Brick, T. R., Kirkpatrick, R. M., Estabrook, R., Bates, T. C., Maes, H. H., & Boker, S. M. (2016). OpenMx 2.0: Extended structural equation and statistical modeling. Psychometrika, 81, 535–549.

  33. Pritikin, J. N., Hunter, M. D., & Boker, S. M. (2015). Modular open-source software for item factor analysis. Educational and Psychological Measurement, 75, 458–475.

  34. Pritikin, J. N. (2016). Rpf: Response probability functions. Retrieved from

  35. Bock, R. D., & Mislevy, R. J. (1982). Adaptive EAP estimation of ability in a microcomputer environment. Applied Psychological Measurement, 6, 431–444.

  36. Chalmers, R. P. (2018). Model-based measures for detecting and quantifying response bias. Psychometrika, 83, 696–732.

  37. Chalmers, R. P., Counsell, A., & Flora, D. B. (2016). It might not make a big DIF: Improved differential test functioning statistics that account for sampling variability. Educational and Psychological Measurement, 76, 114–140.

  38. Edelen, M. O., Stucky, B. D., & Chandra, A. (2015). Quantifying “problematic” DIF within an IRT framework: Application to a cancer stigma index. Quality of Life Research, 24, 95–103.

  39. Organization for Economic Cooperation and Development. (2017). PISA 2015 technical report. Organization for Economic Cooperation and Development.

  40. Waller, N. G., & Feuerstahler, L. (2017). Bayesian modal estimation of the four-parameter item response model in real, realistic, and idealized data sets. Multivariate Behavioral Research, 52, 350–370.

  41. Feuerstahler, L. M. (2018). Sources of error in IRT trait estimation. Applied Psychological Measurement, 42, 359–375.

  42. Bolt, D. M. (2002). A Monte Carlo comparison of parametric and nonparametric polytomous DIF detection methods. Applied Measurement in Education, 15, 113–141.

  43. Douglas, J., & Cohen, A. (2001). Nonparametric item response function estimation for assessing parametric model fit. Applied Psychological Measurement, 25, 234–243.

  44. Liang, T., & Wells, C. S. (2009). A model fit statistic for generalized partial credit model. Educational and Psychological Measurement, 69, 913–928.

  45. Liang, T., & Wells, C. S. (2015). A nonparametric approach for assessing goodness-of-fit of IRT models in a mixed format test. Applied Measurement in Education, 28, 115–129.

  46. Maydeu-Olivares, A. (2005). Further empirical results on parametric versus nonparametric IRT modeling of Likert-type personality data. Multivariate Behavioral Research, 40, 261–279.

  47. R Core Team. (2020). R: A language and environment for statistical computing. R Foundation for Statistical Computing.



We acknowledge the support of a research grant from the Fonds de recherche du Québec—Nature et technologies [2019-NC-255344] to the first author.

Author information



Corresponding author

Correspondence to Carl F. Falk.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Rights and permissions

Reprints and Permissions


Cite this article

Falk, C.F., Fischer, F. More flexible response functions for the PROMIS physical functioning item bank by application of a monotonic polynomial approach. Qual Life Res (2021).



  • Patient reported outcomes
  • Physical functioning
  • Item response theory
  • Nonparametric methods
  • Monotonic polynomial
  • Graded response model