Abstract
Surveys often include reverse-coded questions to detect insufficient effort responses (IERs) but often wrongly assume that all respondents answer every question with full effort. By contrast, this study extends the mixture model for IERs and uses a simulation in LatentGOLD to show the harmful consequences of ignoring IERs to positively and negatively worded questions: lower test reliability and biased, less accurate slope and intercept parameter estimates. We demonstrate the model's practical application with two public data sets: Machiavellianism (five-point scale) and self-reported depression (four-point scale).
Data Availability
Example 1 is available from https://openpsychometrics.org/_rawdata/. Example 2 is available from https://doi.org/10.5334/jopd.35. The LatentGOLD syntax for Example 1 is available from https://osf.io/rjc5p/.
Notes
Other models allow nonuniform distributions of random responses (Meade & Craig, 2012).
When simulees were generated with no IERs (i.e., πu = πv = [1, 0, 0]), EMMIER and GPCM are essentially equivalent.
Sign-reversed δ-parameter estimates in LatentGOLD can be comparable to their true values.
The data set is available from https://openpsychometrics.org/_rawdata/.
Only 19 questions were included in the analyses, because the slope parameter for Q17 (“P.T. Barnum was wrong when he said that there's a sucker born every minute”) from any analytical model was very close to zero.
References
Arias, V. B., Garrido, L. E., Jenaro, C., Martinez-Molina, A., & Arias, B. (2020). A little garbage in, lots of garbage out: Assessing the impact of careless responding in personality survey data. Behavior Research Methods, 52(6), 2489–2505. https://doi.org/10.3758/s13428-020-01401-8
Baumgartner, H., & Steenkamp, J.-B.E.M. (2001). Response styles in marketing research: A cross-national investigation. Journal of Marketing Research, 38(2), 143–156. https://doi.org/10.1509/jmkr.38.2.143.18840
Bijlsma, H. J. E., Glas, C. A. W., & Visscher, A. J. (2022). Factors related to differences in digitally measured student perceptions of teaching quality. School Effectiveness and School Improvement, 33(3), 360–380. https://doi.org/10.1080/09243453.2021.2023584
Böckenholt, U. (2012). Modeling multiple response processes in judgment and choice. Psychological Methods, 17(4), 665–678. https://doi.org/10.1037/a0028111
Bolt, D., Wang, Y. C., Meyer, R. H., & Pier, L. (2020). An IRT mixture model for rating scale confusion associated with negatively worded items in measures of social-emotional learning. Applied Measurement in Education, 33(4), 331–348. https://doi.org/10.1080/08957347.2020.1789140
Bowling, N. A., Huang, J. L., Bragg, C. B., Khazon, S., Liu, M., & Blackmore, C. E. (2016). Who cares and who is careless? Insufficient effort responding as a reflection of respondent personality. Journal of Personality and Social Psychology, 111(2), 218–229. https://doi.org/10.1037/pspp0000085
Bowling, N. A., Gibson, A. M., Houpt, J. W., & Brower, C. K. (2021). Will the questions ever end? Person-level increases in careless responding during questionnaire completion. Organizational Research Methods, 24(4), 718–738. https://doi.org/10.1177/1094428120947794
Bowling, N. A., Huang, J. L., Brower, C. K., & Bragg, C. B. (2023). The quick and the careless: The construct validity of page time as a measure of insufficient effort responding to surveys. Organizational Research Methods, 26(2), 323–352. https://doi.org/10.1177/10944281211056520
Chen, H.-F., & Jin, K.-Y. (2022). The impact of item feature and response preference in mixed-format design. Multivariate Behavioral Research, 57(2–3), 208–222. https://doi.org/10.1080/00273171.2020.1820308
Christie, R., & Geis, F. (1970). Studies in Machiavellianism. Academic Press.
Cole, K. L., Turner, R. C., & Gitchel, W. D. (2019). A study of polytomous IRT methods and item wording directionality effects on perceived stress items. Personality and Individual Differences, 147, 63–72. https://doi.org/10.1016/j.paid.2019.03.046
Conijn, J. M., Emons, W. H. M., & Sijtsma, K. (2014). Statistic lz-based person-fit methods for noncognitive multiscale measures. Applied Psychological Measurement, 38(2), 122–136. https://doi.org/10.1177/0146621613497568
DeSimone, J. A., Davison, H. K., Schoen, J. L., & Bing, M. N. (2020). Insufficient effort responding as a partial function of implicit aggression. Organizational Research Methods, 23(1), 154–180. https://doi.org/10.1177/1094428118799486
Drasgow, F., Levine, M. V., & Williams, E. A. (1985). Appropriateness measurement with polytomous item response models and standardized indices. British Journal of Mathematical and Statistical Psychology, 38(1), 67–86. https://doi.org/10.1111/j.2044-8317.1985.tb00817.x
Emons, W. H. M. (2008). Nonparametric person-fit analysis of polytomous item scores. Applied Psychological Measurement, 32(3), 224–247. https://doi.org/10.1177/0146621607302479
Ferrando, P. J., & Lorenzo-Seva, U. (2010). Acquiescence as a source of bias and model and person misfit: A theoretical and empirical analysis. British Journal of Mathematical and Statistical Psychology, 63(2), 427–448. https://doi.org/10.1348/000711009X470740
Gibson, A. M., & Bowling, N. A. (2020). The effects of questionnaire length and behavioral consequences on careless responding. European Journal of Psychological Assessment, 36(2), 410–420. https://doi.org/10.1027/1015-5759/a000526
Grau, I., Ebbeler, C., & Banse, R. (2019). Cultural differences in careless responding. Journal of Cross-Cultural Psychology, 50(3), 336–357. https://doi.org/10.1177/0022022119827379
Hong, M., Steedle, J. T., & Cheng, Y. (2020). Methods of detecting insufficient effort responding: Comparisons and practical recommendations. Educational and Psychological Measurement, 80(2), 312–345. https://doi.org/10.1177/0013164419865316
Huang, J. L., Curran, P. G., Keeney, J., Poposki, E. M., & DeShon, R. P. (2012). Detecting and deterring insufficient effort responding to surveys. Journal of Business and Psychology, 27(1), 99–114. https://doi.org/10.1007/s10869-011-9231-8
Jin, K.-Y., Chen, H.-F., & Wang, W.-C. (2018). Mixture item response models for inattentive responding behavior. Organizational Research Methods, 21(1), 197–225. https://doi.org/10.1177/1094428117725792
Jin, K.-Y., Wu, Y.-J., & Chen, H.-F. (2022). A new multi-process IRT model with ideal points for Likert-type items. Journal of Educational and Behavioral Statistics, 47(3), 297–321. https://doi.org/10.3102/10769986211057160
Kam, C. C. S., & Meyer, J. P. (2015). How careless responding and acquiescence response bias can influence construct dimensionality: The case of job satisfaction. Organizational Research Methods, 18(3), 512–541. https://doi.org/10.1177/1094428115571894
Koutsogiorgi, C. C., & Michaelides, M. P. (2022). Response tendencies due to item wording using eye-tracking methodology accounting for individual differences and item characteristics. Behavior Research Methods, 54(5), 2252–2270. https://doi.org/10.3758/s13428-021-01719-x
Meade, A. W., & Craig, S. B. (2012). Identifying careless responses in survey data. Psychological Methods, 17(3), 437–455. https://doi.org/10.1037/a0028085
Mokken, R. J. (1971). A theory and procedure of scale analysis. De Gruyter. https://doi.org/10.1515/9783110813203
Muraki, E. (1992). A generalized partial credit model: Application of an EM algorithm. Applied Psychological Measurement, 16(2), 159–176. https://doi.org/10.1177/014662169201600206
Niessen, A. S. M., Meijer, R. R., & Tendeiro, J. N. (2016). Detecting careless respondents in web-based questionnaires: Which method to use? Journal of Research in Personality, 63(1), 1–11. https://doi.org/10.1016/j.jrp.2016.04.010
Ou, X. (2022). Multidimensional structure or wording effect? Reexamination of the factor structure of the Chinese general self-efficacy scale. Journal of Personality Assessment, 104(1), 64–73. https://doi.org/10.1080/00223891.2021.1912059
Radloff, L. S. (1977). The CES-D scale: A self-report depression scale for research in the general population. Applied Psychological Measurement, 1(3), 385–401. https://doi.org/10.1177/014662167700100306
Schroeders, U., Schmidt, C., & Gnambs, T. (2022). Detecting careless responding in survey data using stochastic gradient boosting. Educational and Psychological Measurement, 82(1), 29–56. https://doi.org/10.1177/00131644211004708
Spiegelhalter, D. J., Thomas, A., Best, N., & Lunn, D. (2007). WinBUGS (Version 1.4.3) [Computer software]. MRC Biostatistics Unit, Institute of Public Health. https://www.mrc-bsu.cam.ac.uk/wp-content/uploads/manual14.pdf
Steinmann, I., Sánchez, D., van Laar, S., & Braeken, J. (2022). The impact of inconsistent responders to mixed-worded scales on inferences in international large-scale assessments. Assessment in Education: Principles, Policy & Practice, 29(1), 5–26. https://doi.org/10.1080/0969594X.2021.2005302
Sun, T., Zhang, B., Cao, M., & Drasgow, F. (2022). Faking detection improved: Adopting a Likert item response process tree model. Organizational Research Methods, 25(3), 490–512. https://doi.org/10.1177/10944281211002904
van Laar, S., & Braeken, J. (2022). Random responders in the TIMSS 2015 student questionnaire: A threat to validity? Journal of Educational Measurement, 59(4), 470–501. https://doi.org/10.1111/jedm.12317
Vermunt, J. K., & Magidson, J. (2016). Technical guide to Latent Gold 5.1: Basic, advanced, and syntax. Statistical Innovations.
Wang, C., & Xu, G. (2015). A mixture hierarchical model for response times and response accuracy. British Journal of Mathematical and Statistical Psychology, 68(3), 456–477. https://doi.org/10.1111/bmsp.12054
Wang, W.-C., Chen, H.-F., & Jin, K.-Y. (2015). Item response theory models for wording effects in mixed-format scales. Educational and Psychological Measurement, 75(1), 157–178. https://doi.org/10.1177/0013164414528209
Ward, M. K., Meade, A. W., Allred, C. M., Pappalardo, G., & Stoughton, J. W. (2017). Careless response and attrition as sources of bias in online survey assessments of personality traits and performance. Computers in Human Behavior, 76, 417–430. https://doi.org/10.1016/j.chb.2017.06.032
Wetzel, E., & Carstensen, C. H. (2014). Reversed thresholds in partial credit models: A reason for collapsing categories? Assessment, 21(6), 765–774. https://doi.org/10.1177/1073191114530775
Wind, S. A., & Wang, Y. (2022). Using Mokken scaling techniques to explore carelessness in survey research. Behavior Research Methods. Advance online publication. https://doi.org/10.3758/s13428-022-01960-y
Wise, S. L., & DeMars, C. E. (2006). An application of item response time: The effort-moderated IRT model. Journal of Educational Measurement, 43(1), 19–38. https://doi.org/10.1111/j.1745-3984.2006.00002.x
Woodworth, R. J., O’Brien-Malone, A., Diamond, M. R., & Schüz, B. (2018). Data from “Web-based positive psychology interventions: A reexamination of effectiveness.” Journal of Open Psychology Data, 6(1), 1. https://doi.org/10.5334/jopd.35
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Jin, K.-Y., & Chiu, M. M. (2024). Modeling insufficient effort responses in mixed-worded scales. Behavior Research Methods, 56, 2260–2272. https://doi.org/10.3758/s13428-023-02146-w
DOI: https://doi.org/10.3758/s13428-023-02146-w