Detecting Spam Tweets in Trending Topics Using Graph-Based Approach

  • Ramesh Paudel
  • Prajjwal Kandel
  • William Eberle
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 1069)


In recent years, social media has changed the way people communicate and share information. For example, when a noteworthy event occurs, many people post about it (on Twitter, they "tweet"), which causes the event to trend and become more popular. Unfortunately, spammers can exploit trending topics to spread spam more quickly and to a wider audience. Recently, researchers have applied various machine learning techniques to account and message features to detect spam on Twitter. However, spammers can easily fabricate the features of typical tweets. In this work, we propose a graph-based approach for detecting possible spam that leverages the relationship between the named entities present in a tweet's content and the document referenced by the URL in that tweet. We hypothesize that by combining multiple heterogeneous sources of information into a single graph representation, we can discover unusual patterns in the data that reveal spammer activity: structural features that are difficult for spammers to fabricate. We demonstrate the usefulness of this approach by collecting tweets related to Twitter trending topics, along with the documents referenced by the URLs in those tweets, and running graph-based anomaly detection algorithms on a graph representation of the data in order to detect anomalies in trending tweets.
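
The abstract describes the graph only in prose. As a rough illustration, the Python sketch below combines tweets, the documents behind their URLs, and named entities into one heterogeneous graph, then flags tweets whose entities share little with those of the document they link to. It is a simplified stand-in, not the paper's method: the graph-based anomaly detection algorithms mentioned above are replaced by a single hand-picked structural feature (Jaccard overlap of entity sets). Named entities are assumed to be already extracted, and `TweetRecord`, `build_graph`, `entity_overlap`, `flag_anomalies`, the edge labels, and the 0.1 threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TweetRecord:
    """Hypothetical input record. Entities are assumed to be already
    extracted (e.g. by an NER tool) from the tweet text and from the
    document behind the tweet's URL."""
    tweet_id: str
    url: str
    tweet_entities: set[str]
    doc_entities: set[str]


def build_graph(records: list[TweetRecord]) -> dict[str, set[tuple[str, str]]]:
    """Combine all records into one labeled, directed graph.

    Node ids carry a type prefix ('tweet:', 'url:', 'entity:'); each node
    maps to a set of outgoing (edge_label, target_node) pairs.
    """
    graph: dict[str, set[tuple[str, str]]] = {}

    def add_edge(src: str, label: str, dst: str) -> None:
        graph.setdefault(src, set()).add((label, dst))
        graph.setdefault(dst, set())  # make sure the target node exists

    for r in records:
        tweet_node = f"tweet:{r.tweet_id}"
        url_node = f"url:{r.url}"
        add_edge(tweet_node, "links_to", url_node)
        for e in r.tweet_entities:
            # Lower-case so "NASA" in a tweet and "nasa" in a page share a node.
            add_edge(tweet_node, "mentions", f"entity:{e.lower()}")
        for e in r.doc_entities:
            add_edge(url_node, "contains", f"entity:{e.lower()}")
    return graph


def entity_overlap(graph: dict[str, set[tuple[str, str]]], tweet_node: str) -> float:
    """Jaccard overlap between the entities a tweet mentions and the
    entities in the document(s) it links to (0.0 = nothing shared)."""
    edges = graph[tweet_node]
    tweet_ents = {dst for label, dst in edges if label == "mentions"}
    doc_ents: set[str] = set()
    for label, dst in edges:
        if label == "links_to":
            doc_ents |= {d for lbl, d in graph[dst] if lbl == "contains"}
    union = tweet_ents | doc_ents
    return len(tweet_ents & doc_ents) / len(union) if union else 0.0


def flag_anomalies(graph: dict[str, set[tuple[str, str]]], threshold: float = 0.1) -> list[str]:
    """Return tweet nodes whose entity overlap is below the (hypothetical)
    threshold, i.e. tweets whose text and linked page look unrelated."""
    return sorted(
        node for node in graph
        if node.startswith("tweet:") and entity_overlap(graph, node) < threshold
    )


if __name__ == "__main__":
    records = [
        TweetRecord("1", "http://news.example/a", {"NASA", "Mars"}, {"NASA", "Mars", "JPL"}),
        TweetRecord("2", "http://spam.example/b", {"NASA", "Mars"}, {"Casino", "Bonus"}),
    ]
    print(flag_anomalies(build_graph(records)))  # ['tweet:2']
```

Because entity nodes are shared across tweets and documents, a tweet that rides a trending topic but links to an unrelated page ends up structurally disconnected from its own document. A real graph-based anomaly detector would search the whole graph for unusual substructures rather than score one feature, so this sketch only illustrates the kind of structural signal the abstract hypothesizes is harder to fabricate.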


Keywords: Twitter · Spam detection · Anomaly detection · Graph-based anomaly


Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  • Ramesh Paudel (1)
  • Prajjwal Kandel (1)
  • William Eberle (1)

  1. Tennessee Technological University, Cookeville, USA