Abstract
There is a wealth of literature on the statistical detection of bots in online survey data using only responses to Likert-type items. Two traditions exist in this literature: one requires labeled data, forgoing strong model assumptions; the other requires a measurement model, forgoing the collection of labeled data. In the present article, we consider the problem where neither is available, for an inventory whose items all have the same number of Likert-type categories. We propose a bot detection algorithm that is both model-agnostic and unsupervised. The proposed algorithm involves a permutation test with leave-one-out calculations of outlier statistics. For each respondent, it outputs a p value for the null hypothesis that the respondent is a bot. Such an algorithm offers nominal sensitivity calibration that is robust to the bot response distribution. In a simulation study, our proposed algorithm improved upon naive alternatives in terms of 95% sensitivity calibration and, in many scenarios, in terms of classification accuracy.
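The abstract's recipe, a leave-one-out outlier statistic combined with a per-respondent permutation test, can be sketched as follows. This is a hedged illustration, not the authors' implementation (their R code is available at the OSF repository): here we assume a leave-one-out Mahalanobis distance as the outlier statistic and permute the focal respondent's own responses across items, which is valid under a null in which a bot's responses are exchangeable across items. The function names and the choice of statistic are ours.

```python
import numpy as np

def loo_mahalanobis(X, i):
    """Leave-one-out outlier statistic: respondent i's Mahalanobis distance
    from the mean and covariance estimated on all *other* respondents."""
    others = np.delete(X, i, axis=0)
    mu = others.mean(axis=0)
    cov = np.cov(others, rowvar=False)
    diff = X[i] - mu
    return float(diff @ np.linalg.solve(cov, diff))

def permutation_p_value(X, i, n_perm=999, rng=None):
    """p value for the null that respondent i's responses are exchangeable
    across items (as for an item-indifferent bot). Only row i is permuted;
    the leave-one-out mean/covariance are unaffected by those permutations,
    though for simplicity this sketch recomputes them each time."""
    rng = np.random.default_rng(rng)
    observed = loo_mahalanobis(X, i)
    count = 1  # the observed arrangement counts as one permutation
    Xp = X.copy()
    for _ in range(n_perm):
        Xp[i] = rng.permutation(X[i])
        # Treat larger distances as "more extreme"; the tail direction
        # is an assumption of this sketch, not taken from the article.
        if loo_mahalanobis(Xp, i) >= observed:
            count += 1
    return count / (n_perm + 1)
```

Small p values are evidence against the bot null for that respondent. Note that no measurement model is fit and no labeled bots are needed, which is the sense in which such a procedure is model-agnostic and unsupervised.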
Open Practices Statement
Example R code for implementing the methods studied in the manuscript is available at an Open Science Framework repository (https://osf.io/e5v3s/). The simulation study was not preregistered.
Notes
Some studies blur the distinction between the two traditions. Person-fit statistics require measurement model assumptions but are often used (e.g., Beck et al., 2019) toward direct detection. Interestingly, the approach in Patton et al. (2019) iteratively re-estimates parameters of a measurement model by removing suspicious respondents directly detected. Regardless, our taxonomy remains instructive for the purposes of the present article.
Note then that i is ambiguous as an index—it may refer to a training set respondent or a test set respondent, depending on the presence of the superscript.
Schroeders et al. (2022) is an unusual case where the favored approach does not involve NRIs—the raw Likert-type item responses are themselves the features, a convenience afforded by their flexible (and complicated) classifier. Note that using raw Likert-type item responses as features admits no obvious ideal point, in contrast to our proposed approach.
The setup for unsupervised learning differs slightly from common textbook examples (e.g., James et al., 2013) where the goal is to extract features that can reproduce the data at hand. In such examples, there is still no labeled data and the existence of separate classes is not assumed. Furthermore, in both supervised and unsupervised paradigms, a user-defined training/test split of the data at hand is possible in order to assess model performance and prevent overfitting. If such a split is done, both training and test sets in the supervised paradigm contain labeled data, and neither set has labeled data in the unsupervised paradigm. In either paradigm, the chosen model can then be applied to yet additional future observations. We do not employ or consider this alternative use of training/test set terminology and instead reserve such terms to signal the presence/absence of known class exemplars when constructing the classifier.
References
Beck, M. F., Albano, A. D., & Smith, W. M. (2019). Person-fit as an index of inattentive responding: A comparison of methods using polytomous survey data. Applied Psychological Measurement, 43(5). https://doi.org/10.1177/0146621618798666
Bengtsson, H. (2021). A unifying framework for parallel and distributed processing in R using futures. Retrieved from https://journal.r-project.org/archive/2021/RJ-2021-048/index.html (R package version 1.21.0)
Bock, R. D., & Gibbons, R. D. (2021). Item response theory. Hoboken, NJ: Wiley.
Buchanan, E. M., & Scofield, J. E. (2018). Methods to detect low quality data and its implication for psychological research. Behavior Research Methods, 50, 2586–2596. https://doi.org/10.3758/s13428-018-1035-6
Chmielewski, M., & Kucker, S. C. (2020). An MTurk crisis? Shifts in data quality and the impact on study results. Social Psychological and Personality Science, 11(4), 464–473. https://doi.org/10.1177/1948550619875149
Credé, M. (2010). Random responding as a threat to the validity of effect size estimates in correlational research. Educational and Psychological Measurement, 70(4), 596–612. https://doi.org/10.1177/0013164410366686
Curran, P. G. (2016). Methods for the detection of carelessly invalid responses in survey data. Journal of Experimental Social Psychology, 66, 4–19. https://doi.org/10.1016/j.jesp.2015.07.006
DeSimone, J. A., & Harms, P. D. (2018). Dirty data: The effects of screening respondents who provide low-quality data in survey research. Journal of Business and Psychology, 33, 559–577. https://doi.org/10.1007/s10869-017-9514-9
Dupuis, M., Meier, E., & Cuneo, F. (2019). Detecting computer-generated random responding in questionnaire-based data: A comparison of seven indices. Behavior Research Methods, 51, 2228–2237. https://doi.org/10.3758/s13428-018-1103-y
Hong, M. R., & Cheng, Y. (2018). Robust maximum marginal likelihood (RMML) estimation for item response theory models. Behavior Research Methods, 51, 573–588. https://doi.org/10.3758/s13428-018-1150-4
Hong, M. R., Steedle, J. T., & Cheng, Y. (2020). Methods of detecting insufficient effort responding: Comparisons and practical recommendations. Educational and Psychological Measurement, 80(2), 312–345. https://doi.org/10.1177/0013164419865316
Huang, J. L., Curran, P. G., Keeney, J., Poposki, E. M., & DeShon, R. P. (2012). Detecting and deterring insufficient effort responding to surveys. Journal of Business and Psychology, 27(1), 99–114. https://doi.org/10.1007/s10869-011-9231-8
Huang, J. L., Liu, M., & Bowling, N. (2015). Insufficient effort responding: Examining an insidious confound in survey data. Journal of Applied Psychology, 100(3), 828–845. https://doi.org/10.1037/a0038510
Ilagan, M. J., & Falk, C. F. (2022). Supervised classes, unsupervised mixing proportions: Detection of bots in a Likert-type questionnaire. Educational and Psychological Measurement. https://doi.org/10.1177/00131644221104220
James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An Introduction to Statistical Learning: with Applications in R. New York: Springer.
Jin, K.-Y., Chen, H.-F., & Wang, W.-C. (2018). Mixture item response models for inattentive responding behavior. Organizational Research Methods, 21(1), 197–225. https://doi.org/10.1177/1094428117725792
Johnson, J. A. (2005). Ascertaining the validity of individual protocols from Web-based personality inventories. Journal of Research in Personality, 39(1), 103–129. https://doi.org/10.1016/j.jrp.2004.09.009
Kroc, E., & Olvera Astivia, O. L. (2022). The importance of thinking multivariately when setting subscale cutoff scores. Educational and Psychological Measurement, 82(3), 517–538. https://doi.org/10.1177/00131644211023569
Kuiper, N. A. (2016). Humor Styles Questionnaire. In V. Zeigler-Hill & T. Shackelford (Eds.), Encyclopedia of personality and individual differences. Springer. https://doi.org/10.1007/978-3-319-28099-8_39-1
Lasko, T. A., Bhagwat, J. G., Zou, K. H., & Ohno-Machado, L. (2005). The use of receiver operating characteristic curves in biomedical informatics. Journal of Biomedical Informatics, 38, 404–415. https://doi.org/10.1016/j.jbi.2005.02.008
Lohr, S. L. (2010). Sampling: Design and Analysis (2nd ed.). Chapman and Hall/CRC. https://doi.org/10.1201/9780429296284
Mahalanobis, P. C. (1936). On the generalized distance in statistics. Proceedings of the National Institute of Science of India, 12, 49–55.
Meade, A. W., & Craig, S. B. (2012). Identifying careless responses in survey data. Psychological Methods, 17(3), 437–455. https://doi.org/10.1037/a0028085
Niessen, A. S. M., Meijer, R. R., & Tendeiro, J. N. (2016). Detecting careless respondents in web-based questionnaires: Which method to use? Journal of Research in Personality, 63, 1–11. https://doi.org/10.1016/j.jrp.2016.04.010
Nyblom, J. (2015). Permutation tests in linear regression. In K. Nordhausen & S. Taskinen (Eds.), Modern Nonparametric, Robust and Multivariate Methods. Cham: Springer. https://doi.org/10.1007/978-3-319-22404-6_5
Ojala, M., & Garriga, G. C. (2010). Permutation tests for studying classifier performance. Journal of Machine Learning Research, 11, 1833–1863.
Open Psychometrics Project. (n.d.). Raw data from online personality tests. Retrieved from https://openpsychometrics.org
Osborne, J. W., & Blanchard, M. R. (2011). Random responding from participants is a threat to the validity of social science research results. Frontiers in Psychology, 1(220), 1–7. https://doi.org/10.3389/fpsyg.2010.00220
Öztürk, N. K., & Karabatsos, G. (2017). A Bayesian robust IRT outlier-detection model. Applied Psychological Measurement, 41(3), 195–208. https://doi.org/10.1177/014662
Patton, J. M., Cheng, Y., Hong, M., & Diao, Q. (2019). Detection and treatment of careless responses to improve item parameter estimation. Journal of Educational and Behavioral Statistics, 44(3), 309–341. https://doi.org/10.3102/1076998618825116
Perkel, J. M. (2020). Mischief-making bots attacked my scientific survey. Nature, 579, 461. https://doi.org/10.1038/d41586-020-00768-0
Roman, Z., Brandt, H., & Miller, J. M. (2022). Automated bot detection using Bayesian latent class models in online surveys. Frontiers in Psychology, 13(789223), 1–14. https://doi.org/10.3389/fpsyg.2022.789223
Schroeders, U., Schmidt, C., & Gnambs, T. (2022). Detecting careless responding in survey data using stochastic gradient boosting. Educational and Psychological Measurement, 82(1), 29–56. https://doi.org/10.1177/00131644211004708
Simone, M. (2019). How to Battle the Bots Wrecking Your Online Study. Behavioral Scientist. Retrieved June 27, 2022, from https://behavioralscientist.org/how-to-battle-the-bots-wrecking-your-online-study/
Storozuk, A., Ashley, M., Delage, V., & Maloney, E. A. (2020). Got bots? Practical recommendations to protect online survey data from bot attacks. The Quantitative Methods for Psychology, 16(5). https://doi.org/10.20982/tqmp.16.5.p472
Tendeiro, J. N., & Meijer, R. R. (2014). Detection of invalid test scores: The usefulness of simple nonparametric statistics. Journal of Educational Measurement, 51(3), 239–259. https://doi.org/10.1111/jedm.12046
Ulitzsch, E., Yildirim-Erbasli, S. N., Gorgun, G., & Bulut, O. (2022). An explanatory mixture IRT model for careless and insufficient effort responding in self-report measures. British Journal of Mathematical and Statistical Psychology. https://doi.org/10.1111/bmsp.12272
Vaughan, D., & Dancho, M. (2021). furrr: Apply mapping functions in parallel using futures [Computer software manual]. Retrieved from https://CRAN.R-project.org/package=furrr (R package version 0.2.2)
Zhang, Z., Zhu, S., Mink, J., Xiong, A., Song, L., & Wang, G. (2022). Beyond bot detection: Combating fraudulent online survey takers. In Proceedings of the ACM Web Conference 2022 (WWW ’22). Lyon, France. https://doi.org/10.1145/3485447.3512230
Zijlstra, W. P., van der Ark, L. A., & Sijtsma, K. (2011). Outliers in questionnaire data: Can they be detected and should they be removed? Journal of Educational and Behavioral Statistics, 36(2). https://doi.org/10.3102/1076998610366263
Acknowledgements
We acknowledge the support of the Natural Sciences and Engineering Research Council of Canada (NSERC) (funding reference numbers RGPIN-2018-05357 and DGECR-2018-00083) and the Fonds de recherche du Québec–Nature et technologies (2022-PR-298903). Cette recherche a été financée par le Conseil de recherches en sciences naturelles et en génie du Canada (CRSNG) [numéros de référence RGPIN-2018-05357 et DGECR-2018-00083] et les Fonds de recherche du Québec–Nature et technologies (2022-PR-298903).
Cite this article
Ilagan, M.J., Falk, C.F. Model-agnostic unsupervised detection of bots in a Likert-type questionnaire. Behav Res (2023). https://doi.org/10.3758/s13428-023-02246-7