Clustering of Tweets: A Novel Approach to Label the Unlabelled Tweets

Jan, Tabassum Gull

doi:10.1007/978-3-030-29407-6_48

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 597)

Abstract

Twitter is one of the fastest-growing microblogging and online social networking sites, enabling users to send and receive messages in the form of tweets. It is widely used today for news analysis and discussion, which has made it a prime target for attackers and cybercriminals. These attackers not only compromise the security of Twitter but also erode the trust people place in it, degrading the platform through misuse. Misuse can take the form of hurtful gossip, cyberbullying, cyber harassment, spam, pornographic content, identity theft, and common Web attacks such as phishing and malware downloads. Because Twitter is growing fast, it is prone to spam, so spam detection on Twitter is needed. Spam detection using supervised algorithms depends entirely on labelled Twitter datasets. Labelling datasets manually is costly, time-consuming, and challenging. Moreover, older labelled datasets are no longer available because of Twitter's data publishing policies. There is therefore a need for an approach that labels tweets as spam or non-spam in order to overcome the effect of spam drift. In this paper, we downloaded a recent Twitter dataset and prepared an unlabelled dataset of tweets from it. We then applied the cluster-then-label approach to label the tweets as spam or non-spam. The resulting labelled dataset can be used for spam detection on Twitter and for categorizing different types of spam.
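The abstract names the cluster-then-label approach without giving its details, so the following is only a minimal sketch of the general idea, not the chapter's method. It assumes bag-of-words features, plain k-means clustering, and a handful of hand-labelled seed tweets whose majority vote names each cluster. The tweets, the function names, and the seed indices are all hypothetical.

```python
from collections import Counter

def vectorize(text, vocab):
    """Bag-of-words count vector over a fixed vocabulary (assumed feature set)."""
    counts = Counter(text.lower().split())
    return [float(counts[w]) for w in vocab]

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, init_centroids, iters=20):
    """Lloyd's k-means with caller-supplied initial centroids, so it is deterministic."""
    centroids = [list(c) for c in init_centroids]
    assign = []
    for _ in range(iters):
        new_assign = [
            min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            for p in points
        ]
        if new_assign == assign:  # converged: no point changed cluster
            break
        assign = new_assign
        for j in range(len(centroids)):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assign

def cluster_then_label(tweets, seed_labels, k=2):
    """Cluster all tweets, then give each cluster the majority label of its seeds.

    seed_labels maps a tweet index to a hand-assigned label; it must hold at
    least k entries because the first k seeds initialise the centroids.
    Clusters that contain no seed are labelled None.
    """
    vocab = sorted({w for t in tweets for w in t.lower().split()})
    points = [vectorize(t, vocab) for t in tweets]
    init = [points[i] for i in sorted(seed_labels)[:k]]
    assign = kmeans(points, init)
    cluster_label = {}
    for j in range(len(init)):
        votes = Counter(
            seed_labels[i] for i, a in enumerate(assign) if a == j and i in seed_labels
        )
        cluster_label[j] = votes.most_common(1)[0][0] if votes else None
    return [cluster_label[a] for a in assign]

# Hypothetical toy data: only tweets 0 and 3 carry a manual label.
tweets = [
    "win free prize click link now",
    "free prize click link to win",
    "click now free prize link",
    "meeting with the team this morning",
    "great meeting with team today",
    "team lunch this morning was great",
]
seeds = {0: "spam", 3: "non-spam"}
labels = cluster_then_label(tweets, seeds)
for tweet, label in zip(tweets, labels):
    print(f"{label:>8}  {tweet}")
```

Only two of the six tweets are labelled by hand; the rest inherit the label of the cluster they fall into. A real pipeline would need richer features and some check on cluster purity before trusting the propagated labels.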

Author information

Authors and Affiliations

Department of Computer Science & Technology, Central University of Punjab, Bathinda, India
Tabassum Gull Jan

Corresponding author

Correspondence to Tabassum Gull Jan.

Editor information

Editors and Affiliations

Department of Computer Science and Engineering, Jaypee University of Information Technology, Waknaghat, Himachal Pradesh, India
Pradeep Kumar Singh
Indian Institute of Technology Delhi, New Delhi, Delhi, India
Arpan Kumar Kar
Central University of Jammu, Jammu, Jammu and Kashmir, India
Yashwant Singh
Indian Institute of Technology Patna, Patna, Bihar, India
Maheshkumar H. Kolekar
Institute of Technology, Nirma University, Ahmedabad, Gujarat, India
Sudeep Tanwar

About this paper

Cite this paper

Jan, T.G. (2020). Clustering of Tweets: A Novel Approach to Label the Unlabelled Tweets. In: Singh, P., Kar, A., Singh, Y., Kolekar, M., Tanwar, S. (eds) Proceedings of ICRIC 2019. Lecture Notes in Electrical Engineering, vol 597. Springer, Cham. https://doi.org/10.1007/978-3-030-29407-6_48

DOI: https://doi.org/10.1007/978-3-030-29407-6_48
Published: 22 November 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-29406-9
Online ISBN: 978-3-030-29407-6
eBook Packages: Engineering, Engineering (R0)
