Abstract
Purpose
Measurement development in hard-to-reach populations can pose methodological challenges. Item response theory (IRT) is a useful statistical tool, but often requires large samples. We describe the use of longitudinal IRT models as a pragmatic approach to instrument development when large samples are not feasible.
Methods
The statistical foundations and practical benefits of longitudinal IRT models are briefly described. Results from a simulation study are reported to demonstrate the model’s ability to recover the generating measurement structure and parameters across a range of sample sizes, numbers of items, and numbers of time points. An example using early-phase clinical trial data in a rare condition demonstrates these methods in practice.
Results
Simulation study results demonstrate that the longitudinal IRT model’s ability to recover the generating parameters rests largely on the interaction between sample size and the number of time points. Overall, the model performs well even in small samples, provided a sufficient number of time points is available. The clinical trial data example demonstrates that, by using conditional, longitudinal IRT models, researchers can obtain stable estimates of psychometric characteristics from samples typically considered too small for rigorous psychometric modeling.
Conclusion
By capitalizing on repeated measurements, researchers can estimate psychometric characteristics for an assessment even when the sample size is small. This allows researchers to optimize study designs and have increased confidence in subsequent comparisons using scores obtained from such models. While there are limitations and caveats to consider when using these models, longitudinal IRT modeling may be especially beneficial when developing measures for rare conditions and diseases in difficult-to-reach populations.
Notes
These models could be estimated in any program capable of fitting truly high-dimensional multidimensional IRT models (e.g., IRTPRO, the ‘mirt’ package in R, WinBUGS).
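To make the multidimensional framing concrete, the following is a minimal simulation sketch, not the authors' procedure. It assumes one correlated latent dimension per time point, graded-response items whose parameters are held equal across time, and an AR(1)-style correlation between time points; the function name, parameter ranges, and correlation value are all hypothetical choices made for illustration.

```python
import numpy as np


def simulate_longitudinal_grm(n_persons, n_items, n_times, n_cats=4, rho=0.7, seed=0):
    """Simulate graded-response data with one latent trait per time point.

    Assumptions (illustrative, not taken from the article): item parameters
    are identical at every time point, and latent traits follow a
    multivariate normal with correlation rho**|t - s| between times t and s.
    """
    rng = np.random.default_rng(seed)

    # Correlation matrix linking the time-specific latent traits.
    idx = np.arange(n_times)
    cov = rho ** np.abs(idx[:, None] - idx[None, :])
    theta = rng.multivariate_normal(np.zeros(n_times), cov, size=n_persons)  # (N, T)

    # Time-invariant item parameters: slopes a and ordered thresholds b.
    a = rng.uniform(1.0, 2.5, size=n_items)
    b = np.sort(rng.normal(0.0, 1.0, size=(n_items, n_cats - 1)), axis=1)

    # Cumulative probabilities P(X >= k | theta) for k = 1..n_cats-1.
    z = a[None, None, :, None] * (theta[:, :, None, None] - b[None, None, :, :])
    p_ge = 1.0 / (1.0 + np.exp(-z))  # shape (N, T, J, n_cats-1)

    # Inverse-CDF draw: the response is the number of thresholds passed.
    u = rng.uniform(size=(n_persons, n_times, n_items, 1))
    responses = (u < p_ge).sum(axis=-1)  # integers in 0..n_cats-1
    return responses, theta, a, b


# Example: 25 persons, 8 items, 4 time points.
responses, theta, a, b = simulate_longitudinal_grm(25, 8, 4)
stacked = responses.reshape(-1, responses.shape[-1])  # one row per person-by-time record
print(responses.shape, stacked.shape)
```

In this sketch, 25 persons observed at 4 time points contribute 100 person-by-time response records for each item. Those records are not independent, which is why the latent traits are modeled as correlated dimensions rather than treated as 100 separate respondents.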
Acknowledgements
The authors wish to express their gratitude to the patients, investigators, and research staff for their participation in the conduct of the GSK2330672 clinical trial and for the use of the data for these analyses.
Ethics declarations
Conflict of interest
Authors Carrie R. Houts, Michael C. Edwards, and R. J. Wirth are employees of Vector Psychometric Group, LLC, which received consulting fees from GlaxoSmithKline to conduct the clinical trial analyses. Steven I. Blum was an employee at GlaxoSmithKline during the project and is a shareholder of GlaxoSmithKline.
About this article
Cite this article
Houts, C.R., Morlock, R., Blum, S.I. et al. Scale development with small samples: a new application of longitudinal item response theory. Qual Life Res 27, 1721–1734 (2018). https://doi.org/10.1007/s11136-018-1801-z
DOI: https://doi.org/10.1007/s11136-018-1801-z