Abstract
News on social media can significantly influence users and be exploited for political or economic manipulation. Adversarial manipulations of text are known to expose vulnerabilities in classifiers, and current research aims to find classifier models that are not susceptible to such manipulations. In this paper, we present a novel technique called ConTheModel, which slightly modifies social media news to confuse machine learning (ML)-based classifiers in a black-box setting. ConTheModel replaces a word in the original tweet with a synonym or antonym to generate tweets that confuse classifiers. We evaluate our technique on three different dataset scenarios and compare five well-known machine learning algorithms, namely Support Vector Machine (SVM), Naive Bayes (NB), Random Forest (RF), eXtreme Gradient Boosting (XGBoost), and Multilayer Perceptron (MLP), to measure classifier performance on the modifications produced by ConTheModel. Our results show that the classifiers are confused after modification, with a maximum accuracy drop of 16.36%. We additionally conducted a human study with 25 participants to validate the effectiveness of ConTheModel and found that the majority of participants (65%) found it challenging to classify the tweets correctly. We hope our work will help in building robust ML models against adversarial examples.
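The core perturbation described in the abstract, replacing one word of a tweet with a synonym or antonym, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the tiny `SYNONYMS` lexicon and the `perturb` helper are hypothetical stand-ins for a full thesaurus such as WordNet, and no classifier querying or candidate scoring is shown.

```python
# Hypothetical miniature lexicon; the real technique would draw
# substitutes from a full thesaurus (e.g., WordNet synsets).
SYNONYMS = {
    "huge": ["enormous", "massive"],
    "storm": ["tempest"],
}

def perturb(tweet, lexicon):
    """Generate variants of `tweet`, each with exactly one word
    replaced by a substitute from `lexicon` (synonym or antonym)."""
    tokens = tweet.split()
    variants = []
    for i, tok in enumerate(tokens):
        for repl in lexicon.get(tok.lower(), []):
            variants.append(" ".join(tokens[:i] + [repl] + tokens[i + 1:]))
    return variants

print(perturb("breaking huge storm hits city", SYNONYMS))
```

In a black-box attack, each variant would be submitted to the target classifier and the one that flips (or most degrades) the prediction would be kept; that selection loop is omitted here.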
Acknowledgments
This work was supported by the ICT R&D Programs (no. 2017-0-00545) and the ITRC Support Program (IITP-2019-2015-0-00403).
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Ram Vinay, A., Alawami, M.A., Kim, H. (2021). ConTheModel: Can We Modify Tweets to Confuse Classifier Models?. In: Park, Y., Jadav, D., Austin, T. (eds) Silicon Valley Cybersecurity Conference. SVCC 2020. Communications in Computer and Information Science, vol 1383. Springer, Cham. https://doi.org/10.1007/978-3-030-72725-3_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-72724-6
Online ISBN: 978-3-030-72725-3
eBook Packages: Computer Science (R0)