Abstract
There is a wealth of literature on the statistical detection of bots in online survey data using only responses to Likert-type items. Two traditions exist in this literature: one requires labeled data, forgoing strong model assumptions; the other requires a measurement model, forgoing the collection of labeled data. In the present article, we consider the problem where neither is available, for an inventory whose items all have the same number of Likert-type categories. We propose a bot detection algorithm that is both model-agnostic and unsupervised. The proposed algorithm involves a permutation test with leave-one-out calculations of outlier statistics. For each respondent, it outputs a p value for the null hypothesis that the respondent is a bot. Such an algorithm offers nominal sensitivity calibration that is robust to the bot response distribution. In a simulation study, our proposed algorithm improved upon naive alternatives in terms of 95% sensitivity calibration and, in many scenarios, in terms of classification accuracy.
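The abstract's recipe, a leave-one-out outlier statistic combined with a per-respondent permutation test, can be sketched as follows. This is a hedged illustration, not the authors' implementation (their R code is available at the OSF repository): here we assume a leave-one-out Mahalanobis distance as the outlier statistic and permute the focal respondent's own responses across items, which is valid under a null in which a bot's responses are exchangeable across items. The function names and the choice of statistic are ours.

```python
import numpy as np

def loo_mahalanobis(X, i):
    """Leave-one-out outlier statistic: respondent i's Mahalanobis distance
    from the mean and covariance estimated on all *other* respondents."""
    others = np.delete(X, i, axis=0)
    mu = others.mean(axis=0)
    cov = np.cov(others, rowvar=False)
    diff = X[i] - mu
    return float(diff @ np.linalg.solve(cov, diff))

def permutation_p_value(X, i, n_perm=999, rng=None):
    """p value for the null that respondent i's responses are exchangeable
    across items (as for an item-indifferent bot). Only row i is permuted;
    the leave-one-out mean/covariance are unaffected by those permutations,
    though for simplicity this sketch recomputes them each time."""
    rng = np.random.default_rng(rng)
    observed = loo_mahalanobis(X, i)
    count = 1  # the observed arrangement counts as one permutation
    Xp = X.copy()
    for _ in range(n_perm):
        Xp[i] = rng.permutation(X[i])
        # Treat larger distances as "more extreme"; the tail direction
        # is an assumption of this sketch, not taken from the article.
        if loo_mahalanobis(Xp, i) >= observed:
            count += 1
    return count / (n_perm + 1)
```

Small p values are evidence against the bot null for that respondent. Note that no measurement model is fit and no labeled bots are needed, which is the sense in which such a procedure is model-agnostic and unsupervised.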
Open Practices Statement
Example R code for implementing the methods studied in the manuscript is available at an Open Science Framework repository (https://osf.io/e5v3s/). The simulation study was not preregistered.
Notes
Some studies blur the distinction between the two traditions. Person-fit statistics require measurement model assumptions but are often used (e.g., Beck et al., 2019) toward direct detection. Interestingly, the approach in Patton et al. (2019) iteratively re-estimates parameters of a measurement model by removing suspicious respondents directly detected. Regardless, our taxonomy remains instructive for the purposes of the present article.
Note then that i is ambiguous as an index—it may refer to a training set respondent or a test set respondent, depending on the presence of the superscript.
Schroeders et al. (2022) is an unusual case where the favored approach does not involve NRIs—the raw Likert-type item responses are themselves the features, a convenience afforded by their flexible (and complicated) classifier. Note that using raw Likert-type item responses as features admits no obvious ideal point, in contrast to our proposed approach.
The setup for unsupervised learning differs slightly from common textbook examples (e.g., James et al., 2013) where the goal is to extract features that can reproduce the data at hand. In such examples, there is still no labeled data and the existence of separate classes is not assumed. Furthermore, in both supervised and unsupervised paradigms, a user-defined training/test split of the data at hand is possible in order to assess model performance and prevent overfitting. If such a split is done, both training and test sets in the supervised paradigm contain labeled data, and neither set has labeled data in the unsupervised paradigm. In either paradigm, the chosen model can then be applied to yet additional future observations. We do not employ or consider this alternative use of training/test set terminology and instead reserve such terms to signal the presence/absence of known class exemplars when constructing the classifier.
References
Beck, M. F., Albano, A. D., & Smith, W. M. (2019). Person-fit as an index of inattentive responding: A comparison of methods using polytomous survey data. Applied Psychological Measurement, 43(5). https://doi.org/10.1177/0146621618798666
Bengtsson, H. (2021). A unifying framework for parallel and distributed processing in R using futures. Retrieved from https://journal.r-project.org/archive/2021/RJ-2021-048/index.html (R package version 1.21.0)
Bock, R. D., & Gibbons, R. D. (2021). Item response theory. Hoboken, NJ: Wiley.
Buchanan, E. M., & Scofield, J. E. (2018). Methods to detect low quality data and its implication for psychological research. Behavior Research Methods, 50, 2586–2596. https://doi.org/10.3758/s13428-018-1035-6
Chmielewski, M., & Kucker, S. C. (2020). An MTurk crisis? Shifts in data quality and the impact on study results. Social Psychological and Personality Science, 11(4), 464–473. https://doi.org/10.1177/1948550619875149
Credé, M. (2010). Random responding as a threat to the validity of effect size estimates in correlational research. Educational and Psychological Measurement, 70(4), 596–612. https://doi.org/10.1177/0013164410366686
Curran, P. G. (2016). Methods for the detection of carelessly invalid responses in survey data. Journal of Experimental Social Psychology, 66, 4–19. https://doi.org/10.1016/j.jesp.2015.07.006
DeSimone, J. A., & Harms, P. D. (2018). Dirty data: The effects of screening respondents who provide low-quality data in survey research. Journal of Business and Psychology, 33, 559–577. https://doi.org/10.1007/s10869-017-9514-9
Dupuis, M., Meier, E., & Cuneo, F. (2019). Detecting computer-generated random responding in questionnaire-based data: A comparison of seven indices. Behavior Research Methods, 51, 2228–2237. https://doi.org/10.3758/s13428-018-1103-y
Hong, M. R., & Cheng, Y. (2018). Robust maximum marginal likelihood (RMML) estimation for item response theory models. Behavior Research Methods, 51, 573–588. https://doi.org/10.3758/s13428-018-1150-4
Hong, M. R., Steedle, J. T., & Cheng, Y. (2020). Methods of detecting insufficient effort responding: Comparisons and practical recommendations. Educational and Psychological Measurement, 80(2), 312–345. https://doi.org/10.1177/0013164419865316
Huang, J. L., Curran, P. G., Keeney, J., Poposki, E. M., & DeShon, R. P. (2012). Detecting and deterring insufficient effort responding to surveys. Journal of Business and Psychology, 27(1), 99–114. https://doi.org/10.1007/s10869-011-9231-8
Huang, J. L., Liu, M., & Bowling, N. (2015). Insufficient effort responding: Examining an insidious confound in survey data. Journal of Applied Psychology, 100(3), 828–845. https://doi.org/10.1037/a0038510
Ilagan, M. J., & Falk, C. F. (2022). Supervised classes, unsupervised mixing proportions: Detection of bots in a Likert-type questionnaire. Educational and Psychological Measurement. https://doi.org/10.1177/00131644221104220
James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An Introduction to Statistical Learning: with Applications in R. New York: Springer.
Jin, K.-Y., Chen, H.-F., & Wang, W.-C. (2018). Mixture item response models for inattentive responding behavior. Organizational Research Methods, 21(1), 197–225. https://doi.org/10.1177/1094428117725792
Johnson, J. A. (2005). Ascertaining the validity of individual protocols from Web-based personality inventories. Journal of Research in Personality, 39(1), 103–129. https://doi.org/10.1016/j.jrp.2004.09.009
Kroc, E., & Olvera Astivia, O. L. (2022). The importance of thinking multivariately when setting subscale cutoff scores. Educational and Psychological Measurement, 82(3), 517–538. https://doi.org/10.1177/00131644211023569
Kuiper, N. A. (2016). Humor Styles Questionnaire. In V. Zeigler-Hill & T. Shackelford (Eds.), Encyclopedia of personality and individual differences. Springer. https://doi.org/10.1007/978-3-319-28099-8_39-1
Lasko, T. A., Bhagwat, J. G., Zou, K. H., & Ohno-Machado, L. (2005). The use of receiver operating characteristic curves in biomedical informatics. Journal of Biomedical Informatics, 38, 404–415. https://doi.org/10.1016/j.jbi.2005.02.008
Lohr, S. L. (2010). Sampling: Design and Analysis (2nd ed.). Chapman and Hall/CRC. https://doi.org/10.1201/9780429296284
Mahalanobis, P. C. (1936). On the generalized distance in statistics. Proceedings of the National Institute of Science of India, 12, 49–55.
Meade, A. W., & Craig, S. B. (2012). Identifying careless responses in survey data. Psychological Methods, 17(3), 437–455. https://doi.org/10.1037/a0028085
Niessen, A. S. M., Meijer, R. R., & Tendeiro, J. N. (2016). Detecting careless respondents in web-based questionnaires: Which method to use? Journal of Research in Personality, 63, 1–11. https://doi.org/10.1016/j.jrp.2016.04.010
Nyblom, J. (2015). Permutation tests in linear regression. In K. Nordhausen & S. Taskinen (Eds.), Modern Nonparametric, Robust and Multivariate Methods. Cham: Springer. https://doi.org/10.1007/978-3-319-22404-6_5
Ojala, M., & Garriga, G. C. (2010). Permutation tests for studying classifier performance. Journal of Machine Learning Research, 11, 1833–1863.
Open Psychometrics Project. (n.d.). Raw data from online personality tests. Retrieved from https://openpsychometrics.org
Osborne, J. W., & Blanchard, M. R. (2011). Random responding from participants is a threat to the validity of social science research results. Frontiers in Psychology, 1(220), 1–7. https://doi.org/10.3389/fpsyg.2010.00220
Öztürk, N. K., & Karabatsos, G. (2017). A Bayesian robust IRT outlier-detection model. Applied Psychological Measurement, 41(3), 195–208. https://doi.org/10.1177/014662
Patton, J. M., Cheng, Y., Hong, M., & Diao, Q. (2019). Detection and treatment of careless responses to improve item parameter estimation. Journal of Educational and Behavioral Statistics, 44(3), 309–341. https://doi.org/10.3102/1076998618825116
Perkel, J. M. (2020). Mischief-making bots attacked my scientific survey. Nature, 579, 461. https://doi.org/10.1038/d41586-020-00768-0
Roman, Z., Brandt, H., & Miller, J. M. (2022). Automated bot detection using Bayesian latent class models in online surveys. Frontiers in Psychology, 13(789223), 1–14. https://doi.org/10.3389/fpsyg.2022.789223
Schroeders, U., Schmidt, C., & Gnambs, T. (2022). Detecting careless responding in survey data using stochastic gradient boosting. Educational and Psychological Measurement, 82(1), 29–56. https://doi.org/10.1177/00131644211004708
Simone, M. (2019). How to Battle the Bots Wrecking Your Online Study. Behavioral Scientist. Retrieved June 27, 2022, from https://behavioralscientist.org/how-to-battle-the-bots-wrecking-your-online-study/
Storozuk, A., Ashley, M., Delage, V., & Maloney, E. A. (2020). Got bots? Practical recommendations to protect online survey data from bot attacks. The Quantitative Methods for Psychology, 16(5). https://doi.org/10.20982/tqmp.16.5.p472
Tendeiro, J. N., & Meijer, R. R. (2014). Detection of invalid test scores: The usefulness of simple nonparametric statistics. Journal of Educational Measurement, 51(3), 239–259. https://doi.org/10.1111/jedm.12046
Ulitzsch, E., Yildirim-Erbasli, S. N., Gorgun, G., & Bulut, O. (2022). An explanatory mixture IRT model for careless and insufficient effort responding in self-report measures. British Journal of Mathematical and Statistical Psychology. https://doi.org/10.1111/bmsp.12272
Vaughan, D., & Dancho, M. (2021). furrr: Apply mapping functions in parallel using futures [Computer software manual]. Retrieved from https://CRAN.R-project.org/package=furrr (R package version 0.2.2)
Zhang, Z., Zhu, S., Mink, J., Xiong, A., Song, L., & Wang, G. (2022). Beyond bot detection: Combating fraudulent online survey takers. In Proceedings of the ACM Web Conference 2022 (WWW ’22). Lyon, France. https://doi.org/10.1145/3485447.3512230
Zijlstra, W. P., van der Ark, L. A., & Sijtsma, K. (2011). Outliers in questionnaire data: Can they be detected and should they be removed? Journal of Educational and Behavioral Statistics, 36(2). https://doi.org/10.3102/1076998610366263
Acknowledgements
We acknowledge the support of the Natural Sciences and Engineering Research Council of Canada (NSERC) (funding reference numbers RGPIN-2018-05357 and DGECR-2018-00083) and the Fonds de recherche du Québec–Nature et technologies (2022-PR-298903). Cette recherche a été financée par le Conseil de recherches en sciences naturelles et en génie du Canada (CRSNG) [numéros de référence RGPIN-2018-05357 et DGECR-2018-00083] et les Fonds de recherche du Québec–Nature et technologies (2022-PR-298903).
Cite this article
Ilagan, M.J., Falk, C.F. Model-agnostic unsupervised detection of bots in a Likert-type questionnaire. Behav Res (2023). https://doi.org/10.3758/s13428-023-02246-7