Cluster Computing, Volume 22, Supplement 6, pp 13965–13974

Feature extraction using LR-PCA hybridization on twitter data and classification accuracy using machine learning algorithms

  • N. Senthil Murugan
  • G. Usha Devi


Twitter, a microblogging site that has become hugely popular, has enabled many organizations and members of the public to build their identity and reach through this social website. Unfortunately, Twitter faces serious challenges from spammers, who damage the site's reputation and drive legitimate users to stop using it. Researchers have proposed many techniques to counter spammers, but whenever researchers find a new detection approach, spammers develop new techniques to evade it. Many algorithms have been proposed to detect spammers, and several feature extraction techniques have been developed to improve the detection rate. This paper focuses on feature extraction from our data using a hybrid approach that combines logistic regression with dimensionality reduction by principal component analysis (PCA). Our dataset contains tweets from 17 million users, with 159 features. From these, we extract the particular features that help increase classification accuracy in the subsequent process. We then extend the work to classify the data using several machine learning techniques. The results of the proposed work suggest that the detection rate can be increased by using the selected features in the classification process.
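The abstract does not specify exactly how logistic regression and PCA are combined. The sketch below assumes one plausible reading: standardize the raw features, keep the top k principal components as the extracted features, and train a logistic regression classifier on those components instead of on all 159 columns. It uses NumPy only. The synthetic data (1,000 hypothetical accounts with 159 features driven by three hidden factors), the choice k = 10, and the gradient-descent settings are illustrative assumptions, not the authors' dataset, method, or parameters.

```python
import numpy as np

def standardize(X, mean=None, std=None):
    """Z-score each column, reusing training statistics when they are given."""
    if mean is None:
        mean, std = X.mean(axis=0), X.std(axis=0)
        std = np.where(std == 0, 1.0, std)  # guard against constant features
    return (X - mean) / std, mean, std

def pca_fit(Z, k):
    """Top-k principal axes (as columns) of centred data Z, plus their
    explained-variance ratios."""
    # The rows of Vt are the principal directions, sorted by singular value.
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    ratio = s**2 / np.sum(s**2)
    return Vt[:k].T, ratio[:k]

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def add_bias(X):
    return np.hstack([np.ones((X.shape[0], 1)), X])

def fit_logistic(X, y, lr=0.5, epochs=300, l2=1e-3):
    """Batch gradient descent on the mean log-loss with an L2 penalty
    (the bias term is not penalized)."""
    Xb = add_bias(X)
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        grad = Xb.T @ (sigmoid(Xb @ w) - y) / len(y)
        grad[1:] += l2 * w[1:]
        w -= lr * grad
    return w

def predict(w, X):
    return (sigmoid(add_bias(X) @ w) >= 0.5).astype(int)

# Hypothetical synthetic data: 1,000 accounts x 159 numeric features driven
# by 3 hidden factors; the "spam" label depends only on those factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(1000, 3))
X = latent @ rng.normal(size=(3, 159)) + rng.normal(size=(1000, 159))
y = (latent[:, 0] + 0.5 * latent[:, 1] > 0).astype(float)
X_tr, X_te, y_tr, y_te = X[:800], X[800:], y[:800], y[800:]

# 1) Standardize, 2) project onto k principal components, 3) rescale scores
# using statistics computed on the training split only.
k = 10
Z_tr, mu, sd = standardize(X_tr)
Z_te, _, _ = standardize(X_te, mu, sd)
axes, explained = pca_fit(Z_tr, k)
S_tr, s_mu, s_sd = standardize(Z_tr @ axes)
S_te, _, _ = standardize(Z_te @ axes, s_mu, s_sd)

# 4) Logistic regression on the k extracted features instead of all 159.
w = fit_logistic(S_tr, y_tr)
accuracy = float(np.mean(predict(w, S_te) == y_te))
print(f"{k} components explain {explained.sum():.1%} of variance; "
      f"held-out accuracy = {accuracy:.3f}")
```

Rescaling the component scores keeps gradient descent well conditioned, because the leading components carry far more variance than the rest. To compare several machine learning algorithms on the same extracted features, one would train each candidate classifier on `S_tr` and score it on `S_te`.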


Keywords: Social networks; Twitter; PCA; Logistic regression; Machine learning

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. School of Information Technology and Engineering, VIT University, Vellore, India