Abstract
Twitter is a social media platform that has proven to be a valuable source of insight into people's emotions about products, policies, and other topics. Each 280-character message, called a tweet, carries direct and unfiltered emotions from a large user population. Twitter has attracted the attention of many researchers because every tweet is public by default, which is not the case with Facebook. This paper proposes a model for the multi-lingual (English and Roman Urdu) classification of tweets over a diverse range of classes (non-hierarchical architecture). Previous work on tweet classification has focused narrowly on either a single language or a uniform set of at most four classes (Positive, Extremely Positive, Negative, and Extremely Negative). The proposed model is based on semi-supervised learning, which makes it less dependent on labeled data, and the proposed feature selection approach makes it highly adaptive to trending terms. This makes it a strong choice for streaming data. Using the Naïve Bayes learning algorithm in each phase, the method achieved an accuracy of up to 87.16%, outperforming both the KNN and SVM models, which are popular in NLP and text classification.
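The abstract names two ingredients, a Naïve Bayes classifier and semi-supervised learning, without spelling out how they are combined. The sketch below shows one common way to pair them, self-training: fit a multinomial Naïve Bayes model on a small labeled set, pseudo-label the unlabeled tweets whose top class probability clears a confidence threshold, add them to the training set, and refit. This is an illustration under stated assumptions, not the authors' method: the whitespace tokenizer, the threshold of 0.8, the round limit, the names `tokenize`, `MultinomialNB` and `self_train`, and the toy English/Roman Urdu tweets with hypothetical topic labels are all invented for this example, and the paper's own feature selection approach is not reproduced.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical placeholder preprocessing: lowercase + whitespace split.
    # A real tweet pipeline would also handle hashtags, mentions, URLs and
    # the spelling variation common in Roman Urdu.
    return text.lower().split()


class MultinomialNB:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        # The vocabulary is rebuilt on every fit, so terms first seen in
        # pseudo-labeled tweets become features on the next refit.
        self.vocab = set().union(*self.word_counts.values())
        self.totals = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def predict_proba(self, doc):
        n_docs = sum(self.doc_counts.values())
        v = len(self.vocab)
        log_scores = {}
        for c in self.classes:
            score = math.log(self.doc_counts[c] / n_docs)  # log prior
            for w in tokenize(doc):
                if w in self.vocab:  # words unseen in training are skipped
                    score += math.log((self.word_counts[c][w] + 1) / (self.totals[c] + v))
            log_scores[c] = score
        # Normalize log-scores into probabilities (shift by max for stability).
        top = max(log_scores.values())
        exp = {c: math.exp(s - top) for c, s in log_scores.items()}
        z = sum(exp.values())
        return {c: e / z for c, e in exp.items()}

    def predict(self, doc):
        proba = self.predict_proba(doc)
        return max(proba, key=proba.get)


def self_train(labeled, unlabeled, threshold=0.8, max_rounds=5):
    """Self-training: pseudo-label confident tweets, add them, refit."""
    docs = [d for d, _ in labeled]
    labels = [y for _, y in labeled]
    pool = list(unlabeled)
    added = []
    model = MultinomialNB().fit(docs, labels)
    for _ in range(max_rounds):
        confident, rest = [], []
        for doc in pool:
            proba = model.predict_proba(doc)
            best = max(proba, key=proba.get)
            if proba[best] >= threshold:
                confident.append((doc, best))
            else:
                rest.append(doc)
        if not confident:
            break  # nothing left that the model is sure about
        for doc, label in confident:
            docs.append(doc)
            labels.append(label)
        added.extend(confident)
        pool = rest
        model = MultinomialNB().fit(docs, labels)
    return model, added


# Hypothetical toy data mixing English and Roman Urdu tweets.
labeled = [
    ("great goal in the final match", "sports"),
    ("match bohat acha tha", "sports"),            # "the match was very good"
    ("parliament passed the budget", "politics"),
    ("election mein dhandli hui", "politics"),     # "the election was rigged"
    ("new phone battery lasts long", "tech"),
    ("naya phone bohat tez hai", "tech"),          # "the new phone is very fast"
]
unlabeled = ["goal in the match", "budget election", "phone battery"]

model, added = self_train(labeled, unlabeled, threshold=0.8)
print(added)                        # [('goal in the match', 'sports')]
print(model.predict("naya phone"))  # tech
```

On this toy data only the first unlabeled tweet clears the threshold (its top probability is about 0.87); the other two stay below 0.8 and remain unlabeled. Because the vocabulary is rebuilt at every refit, terms that first appear in pseudo-labeled tweets become features, which is at most a loose analogue of the adaptivity to trending terms that the abstract attributes to its feature selection approach.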
About this article
Cite this article
Khan, A.H., Zubair, M. Classification of multi-lingual tweets, into multi-class model using Naïve Bayes and semi-supervised learning. Multimed Tools Appl 79, 32749–32767 (2020). https://doi.org/10.1007/s11042-020-09512-2