Abstract
Objective
We tested the item response theory (IRT) model assumptions of the original item bank and evaluated the practical and psychometric adequacy of a computerized adaptive test (CAT) for patients with foot or ankle impairments seeking rehabilitation in outpatient therapy clinics.
Methods
Data from 10,287 patients with foot or ankle impairments receiving outpatient physical therapy were analyzed. We first examined whether the CAT item bank met the IRT assumptions of unidimensionality, model fit, and invariance. We then evaluated the efficiency of CAT administration and the construct validity and sensitivity to change of the foot/ankle CAT measure of lower-extremity functional status (FS).
Results
Results supported unidimensionality, model fit, and invariance of item parameters and patient ability estimates. On average, the CAT used seven items to produce precise estimates of FS that adequately covered the content range with negligible floor and ceiling effects. Patients who were older, had more chronic symptoms, had more surgeries, had more comorbidities, or did not exercise prior to receiving rehabilitation reported worse discharge FS. Seventy-one percent of patients obtained statistically significant change at follow-up. A change of 8 FS units (scale 0–100) represented the minimal clinically important improvement.
Conclusions
We concluded that the foot/ankle item bank met IRT assumptions and that the CAT FS measure was precise, valid, and responsive, supporting its use in routine clinical application.
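The abstract does not say how "statistically significant change" was determined for each patient. As a rough illustration only, the sketch below uses the conventional formulas behind two terms from the abbreviation list, minimal detectable change (MDC) and the reliable change index (RCI), both derived from a standard error of measurement (SEM). The SEM value and the patient scores are hypothetical, and a single fixed SEM is an assumption made for simplicity; an IRT-based measure would typically use a score-specific conditional SEM instead.

```python
import math


def minimal_detectable_change(sem: float, z: float = 1.96) -> float:
    """MDC for the difference between two scores that share the same SEM.

    sqrt(2) * sem is the standard error of a difference of two
    independent measurements; z = 1.96 gives the 95% level.
    """
    return z * math.sqrt(2.0) * sem


def reliable_change_index(intake: float, discharge: float, sem: float) -> float:
    """Observed change divided by the standard error of the difference."""
    return (discharge - intake) / (math.sqrt(2.0) * sem)


# Hypothetical values in FS units (0-100 scale); not taken from the study.
HYPOTHETICAL_SEM = 4.0
mdc95 = minimal_detectable_change(HYPOTHETICAL_SEM)
rci = reliable_change_index(intake=45.0, discharge=60.0, sem=HYPOTHETICAL_SEM)

# A patient counts as reliably changed when |RCI| exceeds the z cutoff.
reliably_changed = abs(rci) > 1.96
print(f"MDC95 = {mdc95:.2f} FS units, RCI = {rci:.2f}, changed = {reliably_changed}")
```

Under these assumptions, a patient's change is flagged as statistically significant when it exceeds the MDC (equivalently, when the absolute RCI exceeds 1.96). This threshold is distinct from the minimal clinically important improvement, which is anchored to patient-perceived change rather than to measurement error.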
Abbreviations
- ANCOVA: Analyses of covariance
- AUC: Area under the ROC curve
- CAT: Computerized adaptive testing
- CI: Confidence interval
- CPT: Current procedural terminology
- CSEM: Conditional standard error of measurement
- df: Degrees of freedom
- DIF: Differential item functioning
- FCI: Functional comorbidity index
- FOTO: Focus on Therapeutic Outcomes, Inc.
- FS: Functional status
- GROC: Global rating of change
- HMO: Health maintenance organization
- ICD-9: International classification of disease, 9th revision
- IRT: Item response theory
- LEFS: Lower-Extremity Functional Scale
- Max: Maximum
- Min: Minimum
- MDC: Minimal detectable change
- MCII: Minimal clinically important improvement
- P: Probability
- PPO: Preferred provider organization
- PRO: Patient-reported outcome
- RCI: Reliable change index
- ROC: Receiver-operating-characteristic analysis
- SD: Standard deviation
- SE: Standard error
- SEM: Standard error of measurement
- t: t-Test
Acknowledgments
The authors would like to thank Karon F. Cook, PhD for her insightful comments regarding statistical analyses, results, and manuscript edits.
Cite this article
Hart, D.L., Wang, YC., Stratford, P.W. et al. Computerized adaptive test for patients with foot or ankle impairments produced valid and responsive measures of function. Qual Life Res 17, 1081–1091 (2008). https://doi.org/10.1007/s11136-008-9381-y
DOI: https://doi.org/10.1007/s11136-008-9381-y