Asia Information Retrieval Symposium

Information Retrieval Technology pp 123-134

Detecting Automatically-Generated Arabic Tweets

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9460)

Abstract

Recently, Twitter, one of the most widely known social media platforms, has been infiltrated by automation programs, commonly known as “bots”. Bots can easily be abused to spread spam and to hinder information extraction applications by posting large numbers of automatically generated tweets that occupy a substantial portion of the continuous tweet stream. This problem heavily affects users in the Arab region because of recent political developments: automated tweets can disrupt communication and waste the time needed to filter them out.

To mitigate this problem, this work addresses the classification of Arabic tweets as automated or manual. We propose four categories of features: formality, structural, tweet-specific, and temporal. Our experimental evaluation over about 3.5k randomly sampled Arabic tweets shows that classification based on individual feature categories outperforms the baseline unigram-based classifier in terms of classification accuracy. Additionally, combining tweet-specific and unigram features improves classification accuracy to 92%, a significant improvement over the baseline classifier that constitutes a very strong reference baseline for future studies.
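As a rough illustration of what combining unigram and tweet-specific features could look like, the following minimal sketch builds a single feature dictionary per tweet. The abstract does not list the individual features, so the ones used here (hashtag, mention, and URL counts, plus a retweet flag) and all function names are assumptions made for illustration, not the feature set used in the paper.

```python
import re
from collections import Counter

# Patterns for Twitter-specific markers. \w is Unicode-aware in Python 3,
# so Arabic hashtags and mentions are matched as well.
URL_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"#\w+")
MENTION_RE = re.compile(r"@\w+")


def tweet_specific_features(text):
    """Count Twitter-specific markers (hypothetical feature set)."""
    return {
        "n_hashtags": len(HASHTAG_RE.findall(text)),
        "n_mentions": len(MENTION_RE.findall(text)),
        "n_urls": len(URL_RE.findall(text)),
        "is_retweet": int(text.startswith("RT ")),
    }


def unigram_features(text):
    """Bag-of-words counts over whitespace tokens, with URLs removed."""
    tokens = URL_RE.sub(" ", text).split()
    return {f"w={tok}": n for tok, n in Counter(tokens).items()}


def combined_features(text):
    """Merge unigram and tweet-specific features into one dictionary."""
    feats = unigram_features(text)
    feats.update(tweet_specific_features(text))
    return feats


example = combined_features("RT @user خبر عاجل #قطر http://t.co/abc خبر")
```

A dictionary of this shape could then be vectorized and passed to any standard classifier; the choice of classifier is left open here because the abstract does not name one.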

Keywords

Tweet classification; Arabic microblogs; Bots; Automated tweets; Crowdsourcing

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. Computer Science and Engineering Department, College of Engineering, Qatar University, Doha, Qatar
  2. Qatar Computing Research Institute, Doha, Qatar