Gulliksen, H. (1950). Theory of mental tests. New York: Wiley.
Lord, F. M., & Novick, M. R. (1968). Statistical theories of mental test scores. Reading, MA: Addison-Wesley.
McDonald, R. P. (1999). Test theory: A unified treatment. Mahwah, NJ: Lawrence Erlbaum.
Ravens-Sieberer, U., Herdman, M., Devine, J., Otto, C., Bullinger, M., Rose, M., et al. (2014). The European KIDSCREEN approach to measure quality of life and well-being in children: Development, current application, and future advances. Quality of Life Research, 23(3), 791–803. doi:10.1007/s11136-013-0428-3.
Jones, P. W. (1998). Testing health status (“quality of life”) questionnaires for asthma and COPD. European Respiratory Journal, 11(1), 5–6.
Pepe, M. S. (2003). The statistical evaluation of medical tests for classification and prediction. Oxford: Oxford University Press.
Food and Drug Administration. (2009). Patient-reported outcome measures: use in medical product development to support labeling claims. Guidance for industry, US Department of Health and Human Services.
Foster, C. B., Gorga, D., Padial, C., Feretti, A. M., Berenson, D., Kline, R., et al. (2004). The development and validation of a screening instrument to identify hospitalized medical patients in need of early functional rehabilitation assessment. Quality of Life Research, 13(6), 1099–1108. doi:10.1023/B:QURE.0000031346.27185.8f.
De Vet, H. C. W., Terwee, C. B., Mokkink, L. B., & Knol, D. L. (2011). Measurement in medicine: A practical guide. Cambridge: Cambridge University Press.
Fayers, P. M., & Machin, D. (2015). Quality of life: The assessment, analysis and reporting of patient-reported outcomes. New York: Wiley.
Johnson, C., Aaronson, N., Blazeby, J. M., Bottomley, A., Fayers, P., Koller, M., et al. (2011). Guidelines for developing questionnaire modules (4th ed.). Belgium: EORTC Quality of Life Group.
Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16(3), 297–334.
Kim, J.-O., & Mueller, C. W. (1978). Factor analysis: Statistical methods and practical issues. Beverly Hills, CA: SAGE Publications.
Embretson, S., & Reise, S. P. (2000). Item response theory for psychologists. Mahwah, NJ: Lawrence Erlbaum.
Reeve, B. B., Hays, R. D., Bjorner, J. B., Cook, K. F., Crane, P. K., Teresi, J. A., et al. (2007). Psychometric evaluation and calibration of health-related quality of life item banks: Plans for the patient-reported outcomes measurement information system (PROMIS). Medical Care, 45, S22–31.
Guttman, L. (1941). An outline of the statistical theory of prediction. In P. Horst et al. (Eds.), The prediction of personal adjustment (Supplementary study B-1). New York: Social Science Research Council.
Guttman, L. (1971). Measurement as structural theory. Psychometrika, 36(4), 329–347.
Finkelman, M. D., Smits, N., Kulich, R. J., Zacharoff, K. L., Magnuson, B. E., Chang, H., et al. (2016). Development of short-form versions of the screener and opioid assessment for patients with pain-revised (SOAPP-R): A proof-of-principle study. Pain Medicine, 18, 1292–1302. doi:10.1093/pm/pnw210.
Lin, A., Yung, A. R., Wigman, J. T. W., Killackey, E., Baksheev, G., & Wardenaar, K. J. (2014). Validation of a short adaptation of the mood and anxiety symptoms questionnaire (MASQ) in adolescents and young adults. Psychiatry Research, 215(3), 778–783. doi:10.1016/j.psychres.2013.12.018.
Crocker, L. M., & Algina, J. (1986). Introduction to classical and modern test theory. Orlando, FL: Holt, Rinehart and Winston.
Mellenbergh, G. J. (2011). A conceptual introduction to psychometrics: Development, analysis and application of psychological and educational tests. The Hague: Eleven International Publishing.
Loevinger, J. (1957). Objective tests as instruments of psychological theory. Psychological Reports, 3(3), 635–694.
Landsheer, J. A., & Boeije, H. R. (2008). In search of content validity: Facet analysis as a qualitative method to improve questionnaire design. Quality & Quantity, 44, 59.
Brod, M., Tesler, L. E., & Christensen, T. L. (2009). Qualitative research and content validity: Developing best practices based on science and experience. Quality of Life Research, 18, 1263–1278.
Hastie, T., Tibshirani, R., & Friedman, J. H. (2009). The elements of statistical learning: Data mining, inference and prediction (2nd ed.). New York: Springer.
Nunnally, J. C., & Bernstein, I. H. (1994). Psychometric theory. New York: McGraw-Hill.
Raykov, T. (2007a). Reliability if deleted, not ‘alpha if deleted’: Evaluation of scale reliability following component deletion. British Journal of Mathematical and Statistical Psychology, 60(2), 201–216.
Raykov, T. (2007b). Alpha if item deleted: A note on loss of criterion validity in scale development if maximizing coefficient alpha. British Journal of Mathematical and Statistical Psychology, 61, 275–285.
Oosterwijk, P. R., van der Ark, L. A., & Sijtsma, K. (2017). Using confidence intervals for assessing reliability of real tests. Assessment. Advance online publication. doi:10.1177/1073191117737375.
Swets, J. A. (1988). Measuring the accuracy of diagnostic systems. Science, 240(4857), 1285–1293.
Cronbach, L. J., & Gleser, G. C. (1965). Psychological tests and personnel decisions (2nd ed.). Urbana: University of Illinois Press.
Radloff, L. S. (1977). The CES-D scale: A self-report depression scale for research in the general population. Applied Psychological Measurement, 1, 385–401. doi:10.1177/014662167700100306.
Sheehan, D. V., Lecrubier, Y., Sheehan, K. H., Amorim, P., Janavs, J., Weiller, E., et al. (1998). The mini-international neuropsychiatric interview (MINI): The development and validation of a structured diagnostic psychiatric interview for DSM-IV and ICD-10. Journal of Clinical Psychiatry, 59(suppl 20), 22–57.
Smits, N., Cuijpers, P., & van Straten, A. (2011). Applying computerized adaptive testing to the CES-D scale: A simulation study. Psychiatry Research, 188, 147–155. doi:10.1016/j.psychres.2010.12.001.
Evers, A., Hagemeister, C., Høstmælingen, A., Lindley, P., Muñiz, J., & Sjöberg, A. (2013). EFPA review model for the description and evaluation of psychological and educational tests. Test review form and notes for reviewers, European Federation of Psychologists' Associations.
Ten Berge, J. M. F. (2005). Tau-equivalent and congeneric measurements. Wiley StatsRef: Statistics Reference Online.
Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4), 255–282.
Windle, C. (1954). Test-retest effect on personality questionnaires. Educational and Psychological Measurement, 14(4), 617–636.
Raykov, T., & Shrout, P. E. (2002). Reliability of scales with general structure: Point and interval estimation using a structural equation modeling approach. Structural Equation Modeling, 9(2), 195–212.
van der Ark, L. A., van der Palm, D. W., & Sijtsma, K. (2011). A latent class approach to estimating test-score reliability. Applied Psychological Measurement, 35(5), 380–392.
Cohen, R. J., Swerdlik, M. E., & Sturman, E. D. (2013). Psychological testing and assessment: An introduction to tests and measurement. New York: McGraw-Hill.
Perrine, K. J., Hermann, B. P., Meador, K. J., Vickrey, B. G., Cramer, J. A., Hays, R. D., et al. (1995). The relationship of neuropsychological functioning to quality of life in epilepsy. Archives of Neurology, 52(10), 997–1003.
Milanzi, E., Molenberghs, G., Alonso, A., Verbeke, G., & De Boeck, P. (2015). Reliability measures in item response theory: Manifest versus latent correlation functions. British Journal of Mathematical and Statistical Psychology, 68, 43–64.
Revicki, D. A., Chen, W.-H., & Tucker, C. (2015). Developing item banks for patient-reported health outcomes. In S. P. Reise & D. A. Revicki (Eds.), Handbook of item response theory modeling: Applications to typical performance assessment (pp. 334–363). New York, NY: Routledge.
Zijlmans, E. A. O., Tijmstra, J., van der Ark, L. A., & Sijtsma, K. (2017). Item-score reliability in empirical-data sets and its relationship with other item indices. Educational and Psychological Measurement. Advance online publication. doi:10.1177/0013164417728358.
Travers, R. M. W. (1951). Rational hypotheses in the construction of tests. Educational and Psychological Measurement, 11(1), 128–137.
Ware, J. E., & Sherbourne, C. D. (1992). The MOS 36-item short-form health survey (SF-36): I. Conceptual framework and item selection. Medical Care, 30(6), 473–483.
Hand, D. J. (1987). Screening vs prevalence estimation. Journal of the Royal Statistical Society. Series C (Applied Statistics), 36(1), 1–7.
Kroenke, K., & Spitzer, R. L. (2002). The PHQ-9: A new depression diagnostic and severity measure. Psychiatric Annals, 32(9), 509–515.
Spitzer, R. L., Kroenke, K., Williams, J. B. W., & Löwe, B. (2006). A brief measure for assessing generalized anxiety disorder: The GAD-7. Archives of Internal Medicine, 166(10), 1092–1097.
Krebs, E. E., Carey, T. S., & Weinberger, M. (2007). Accuracy of the pain numeric rating scale as a screening test in primary care. Journal of General Internal Medicine, 22(10), 1453–1458.
Reise, S. P., & Waller, N. G. (2009). Item response theory and clinical measurement. Annual Review of Clinical Psychology, 5, 27–48.
Cronbach, L. J., & Shavelson, R. J. (2004). My current thoughts on coefficient alpha and successor procedures. Educational and Psychological Measurement, 64(3), 391–418.
McCrae, R. R. (2015). A more nuanced view of reliability: Specificity in the trait hierarchy. Personality and Social Psychology Review, 19(2), 97–112.
Streiner, D. L. (2003). Being inconsistent about consistency: When coefficient alpha does and doesn’t matter. Journal of Personality Assessment, 80(3), 217–222.
Smith, G. T., McCarthy, D. M., & Anderson, K. G. (2000). On the sins of short-form development. Psychological Assessment, 12(1), 102–111.
Devine, J., Fliege, H., Kocalevent, R., Mierke, A., Klapp, B. F., & Rose, M. (2016). Evaluation of computerized adaptive tests (CATs) for longitudinal monitoring of depression, anxiety, and stress reactions. Journal of Affective Disorders, 190, 846–853.
Zheng, Y., Chang, C.-H., & Chang, H.-H. (2013). Content-balancing strategy in bifactor computerized adaptive patient-reported outcome measurement. Quality of Life Research, 22(3), 491–499. doi:10.1007/s11136-012-0179-6.