Scale development with small samples: a new application of longitudinal item response theory

Houts, Carrie R.; Morlock, Robert; Blum, Steven I.; Edwards, Michael C.; Wirth, R. J.

doi:10.1007/s11136-018-1801-z

Special Section: Test Construction (by invitation only)
Published: 08 February 2018

Volume 27, pages 1721–1734, (2018)

Quality of Life Research

Carrie R. Houts¹,
Robert Morlock²,
Steven I. Blum³,
Michael C. Edwards⁴ &
…
R. J. Wirth¹

Abstract

Purpose

Measurement development in hard-to-reach populations can pose methodological challenges. Item response theory (IRT) is a useful statistical tool, but often requires large samples. We describe the use of longitudinal IRT models as a pragmatic approach to instrument development when large samples are not feasible.

Methods

The statistical foundations and practical benefits of longitudinal IRT models are briefly described. Results from a simulation study are reported to demonstrate the model’s ability to recover the generating measurement structure and parameters using a range of sample sizes, number of items, and number of time points. An example using early-phase clinical trial data in a rare condition demonstrates these methods in practice.
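To make the simulation design concrete, the following is a minimal sketch of how longitudinal item response data of this kind might be generated. It is not the authors' simulation code. It assumes a dichotomous two-parameter logistic (2PL) model for simplicity, item parameters held equal across time points, and latent variables correlated across time with a hypothetical AR(1)-style structure. The function name, the parameter ranges, and the value of `rho` are illustrative assumptions, not values taken from the article.

```python
import numpy as np

def simulate_longitudinal_2pl(n_persons, n_items, n_times, rho=0.7, seed=0):
    """Simulate dichotomous responses under a longitudinal 2PL model.

    Assumptions (illustrative only): item parameters are equal across
    time points, each time point has its own latent variable, and the
    latent variables correlate across time as rho ** |s - t|.
    """
    rng = np.random.default_rng(seed)

    # Hypothetical generating item parameters (not from the article).
    a = rng.uniform(1.0, 2.5, size=n_items)  # discriminations
    b = rng.normal(0.0, 1.0, size=n_items)   # difficulties

    # Correlation matrix of the latent variable across time points.
    times = np.arange(n_times)
    lags = np.abs(np.subtract.outer(times, times))
    cov = rho ** lags

    # One latent score per person per time point.
    theta = rng.multivariate_normal(np.zeros(n_times), cov, size=n_persons)

    # Response probability for every person x time x item combination.
    logits = a[None, None, :] * (theta[:, :, None] - b[None, None, :])
    prob = 1.0 / (1.0 + np.exp(-logits))
    y = (rng.random(prob.shape) < prob).astype(int)
    return y, a, b, theta

# A small-sample design: 25 people, 8 items, 6 time points.
y, a, b, theta = simulate_longitudinal_2pl(n_persons=25, n_items=8, n_times=6)
print(y.shape)                      # persons x time points x items
print(y.shape[0] * y.shape[1])      # responses available per item
```

Under these assumptions, each item is answered once per person per time point, so the number of responses informing a single item's parameters grows with the product of sample size and number of time points, even though the number of independent respondents does not.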

Results

Simulation study results demonstrate that the longitudinal IRT model’s ability to recover the generating parameters rests largely on the interaction between sample size and the number of time points. Overall, the model performs well even in small samples provided a sufficient number of time points are available. The clinical trial data example demonstrates that by using conditional, longitudinal IRT models researchers can obtain stable estimates of psychometric characteristics from samples typically considered too small for rigorous psychometric modeling.

Conclusion

By capitalizing on repeated measurements, researchers can estimate psychometric characteristics for an assessment even when the sample size is small. This allows researchers to optimize study designs and have increased confidence in subsequent comparisons using scores obtained from such models. While there are limitations and caveats to consider when using these models, longitudinal IRT modeling may be especially beneficial when developing measures for rare conditions and diseases in difficult-to-reach populations.

Notes

These models could be estimated in any program capable of fitting truly high-dimensional multidimensional IRT models (e.g., IRTPRO, the ‘mirt’ package in R, WinBUGS).

Acknowledgements

The authors wish to express their gratitude to the patients, investigators, and research staff for their participation in the conduct of the GSK2330672 clinical trial and for the use of the data for these analyses.

Author information

Authors and Affiliations

Vector Psychometric Group, LLC, 847 Emily Lane, Chapel Hill, NC, 27516, USA
Carrie R. Houts & R. J. Wirth
YourCareChoice, Ann Arbor, MI, USA
Robert Morlock
GlaxoSmithKline, Collegeville, PA, USA
Steven I. Blum
Arizona State University, Tempe, AZ, USA
Michael C. Edwards

Corresponding author

Correspondence to Carrie R. Houts.

Ethics declarations

Conflict of interest

Authors Carrie R. Houts, Michael C. Edwards, and R. J. Wirth are employees of Vector Psychometric Group, LLC, which received consulting fees from GlaxoSmithKline to conduct the clinical trial analyses. Steven I. Blum was an employee of GlaxoSmithKline during the project and is a shareholder of GlaxoSmithKline.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (DOCX 83 KB)

About this article

Cite this article

Houts, C.R., Morlock, R., Blum, S.I. et al. Scale development with small samples: a new application of longitudinal item response theory. Qual Life Res 27, 1721–1734 (2018). https://doi.org/10.1007/s11136-018-1801-z

Accepted: 27 January 2018
Published: 08 February 2018
Issue Date: July 2018
DOI: https://doi.org/10.1007/s11136-018-1801-z
