Abstract
News on social media can significantly influence users and be exploited for political or economic manipulation. Adversarial manipulations of text are known to expose vulnerabilities in classifiers, and current research aims to find classifier models that are not susceptible to such manipulations. In this paper, we present a novel technique called ConTheModel, which slightly modifies social media news to confuse machine learning (ML)-based classifiers in a black-box setting. ConTheModel replaces a word in the original tweet with a synonym or antonym to generate tweets that confuse classifiers. We evaluate our technique on three different dataset scenarios and compare five well-known machine learning algorithms, namely Support Vector Machine (SVM), Naive Bayes (NB), Random Forest (RF), eXtreme Gradient Boosting (XGBoost), and Multilayer Perceptron (MLP), to measure classifier performance on the modifications produced by ConTheModel. Our results show that the classifiers are confused after modification, with a maximum accuracy drop of 16.36%. We additionally conducted a human study with 25 participants to validate the effectiveness of ConTheModel and found that the majority of participants (65%) found it challenging to classify the tweets correctly. We hope our work will help in building robust ML models against adversarial examples.
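The core perturbation described in the abstract, replacing one word of a tweet with a synonym or antonym, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the tiny `SYNONYMS` lexicon and the `perturb` helper are hypothetical stand-ins for a full thesaurus such as WordNet, and no classifier querying or candidate scoring is shown.

```python
# Hypothetical miniature lexicon; the real technique would draw
# substitutes from a full thesaurus (e.g., WordNet synsets).
SYNONYMS = {
    "huge": ["enormous", "massive"],
    "storm": ["tempest"],
}

def perturb(tweet, lexicon):
    """Generate variants of `tweet`, each with exactly one word
    replaced by a substitute from `lexicon` (synonym or antonym)."""
    tokens = tweet.split()
    variants = []
    for i, tok in enumerate(tokens):
        for repl in lexicon.get(tok.lower(), []):
            variants.append(" ".join(tokens[:i] + [repl] + tokens[i + 1:]))
    return variants

print(perturb("breaking huge storm hits city", SYNONYMS))
```

In a black-box attack, each variant would be submitted to the target classifier and the one that flips (or most degrades) the prediction would be kept; that selection loop is omitted here.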
Acknowledgments
This work was supported by the ICT R&D Programs (no. 2017-0-00545) and the ITRC Support Program (IITP-2019-2015-0-00403).
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Ram Vinay, A., Alawami, M.A., Kim, H. (2021). ConTheModel: Can We Modify Tweets to Confuse Classifier Models?. In: Park, Y., Jadav, D., Austin, T. (eds) Silicon Valley Cybersecurity Conference. SVCC 2020. Communications in Computer and Information Science, vol 1383. Springer, Cham. https://doi.org/10.1007/978-3-030-72725-3_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-72724-6
Online ISBN: 978-3-030-72725-3
eBook Packages: Computer Science (R0)