Big data analytics for critical information classification in online social networks using classifier chains

Silva, Douglas H.; Maziero, Erick G.; Saadi, Muhammad; Rosa, Renata L.; Silva, Juan C.; Rodriguez, Demostenes Z.; Igorevich, Kostromitin K.

doi:10.1007/s12083-021-01269-1

Published: 10 January 2022

Peer-to-Peer Networking and Applications, volume 15, pages 626–641 (2022)

Douglas H. Silva¹,
Erick G. Maziero¹,
Muhammad Saadi² (ORCID: 0000-0001-7901-7435),
Renata L. Rosa¹,
Juan C. Silva³,
Demostenes Z. Rodriguez¹ &
Kostromitin K. Igorevich⁴

Abstract

Industrial and academic organizations use online social networks (OSNs) for a range of social and economic purposes. OSNs have become a new means of obtaining information about people's preferences and interests. Because of the large volume of user-generated content, researchers apply techniques such as sentiment analysis and data mining to evaluate this information automatically. Although several methods exist for sentiment analysis of OSN content, highly reliable results remain difficult to obtain, mainly because user profile information such as gender and age is often missing. In this work, a novel dataset is built that contains the writing characteristics of 160,000 Twitter users. Before classification models are created with Machine Learning (ML) techniques, feature transformation and feature selection methods are applied to determine the most relevant set of characteristics. To create the models, the Classifier Chain (CC) transformation technique and different ML algorithms are applied to the training set. Simulation results show that the Random Forest, XGBoost, and Decision Tree algorithms perform best. In the testing phase, these algorithms reached Hamming Loss values of 0.033, 0.033, and 0.034, respectively, and all three reached the same micro-averaged F1 value of 0.976. The proposal, based on a multidimensional learning technique using the CC transformation, therefore outperforms other similar proposals.
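To make the terms in the abstract concrete, the following is a minimal sketch of a classifier chain together with the two reported metrics, Hamming Loss and micro-averaged F1. It is not the authors' implementation: the `OneNN` base learner is a hypothetical stand-in for Random Forest, XGBoost, or Decision Tree, the labels are assumed to be binary (the paper's class variables, such as age group, may have more than two values), and all names and the toy data are illustrative only.

```python
from typing import List, Sequence


class OneNN:
    """Tiny 1-nearest-neighbour base learner (assumed stand-in, not from the paper)."""

    def fit(self, X: Sequence[Sequence[int]], y: Sequence[int]) -> "OneNN":
        self.X = [list(row) for row in X]
        self.y = list(y)
        return self

    def predict_one(self, x: Sequence[int]) -> int:
        # Squared Euclidean distance to every stored training row.
        dists = [sum((a - b) ** 2 for a, b in zip(x, row)) for row in self.X]
        return self.y[dists.index(min(dists))]


class ClassifierChain:
    """One model per label; model j also sees labels 0..j-1 as extra features."""

    def __init__(self, n_labels: int) -> None:
        self.models = [OneNN() for _ in range(n_labels)]

    def fit(self, X, Y) -> "ClassifierChain":
        for j, model in enumerate(self.models):
            # Training uses the TRUE values of the earlier labels.
            X_aug = [list(x) + list(y[:j]) for x, y in zip(X, Y)]
            model.fit(X_aug, [y[j] for y in Y])
        return self

    def predict(self, X) -> List[List[int]]:
        out = []
        for x in X:
            preds: List[int] = []
            for model in self.models:
                # Prediction feeds the PREDICTED earlier labels down the chain.
                preds.append(model.predict_one(list(x) + preds))
            out.append(preds)
        return out


def hamming_loss(Y_true, Y_pred) -> float:
    """Fraction of individual label positions that are predicted wrongly."""
    total = sum(len(row) for row in Y_true)
    wrong = sum(t != p for rt, rp in zip(Y_true, Y_pred) for t, p in zip(rt, rp))
    return wrong / total


def f1_micro(Y_true, Y_pred) -> float:
    """F1 computed from TP, FP, and FN counts pooled over all labels."""
    tp = fp = fn = 0
    for rt, rp in zip(Y_true, Y_pred):
        for t, p in zip(rt, rp):
            tp += int(t == 1 and p == 1)
            fp += int(t == 0 and p == 1)
            fn += int(t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy data (illustrative only): two binary features, two binary labels.
X_train = [[0, 0], [0, 1], [1, 0], [1, 1]]
Y_train = [[0, 0], [0, 1], [1, 0], [1, 1]]
chain = ClassifierChain(n_labels=2).fit(X_train, Y_train)
Y_hat = chain.predict([[0, 1], [1, 1]])
```

A lower Hamming Loss and a higher micro-averaged F1 both indicate better predictions. The chain order matters, because each model can only exploit dependencies on the labels placed before it.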

Author information

Authors and Affiliations

Department of Computer Science, Federal University of Lavras, Lavras, 37200-900, Brazil
Douglas H. Silva, Erick G. Maziero, Renata L. Rosa & Demostenes Z. Rodriguez
Department of Electrical Engineering, Faculty of Engineering, University of Central Punjab, Lahore, Pakistan
Muhammad Saadi
Department of Sciences, Pontifical Catholic University of Peru, Lima, Peru
Juan C. Silva
South Ural State University, 76, Lenin Avenue, Chelyabinsk, 454080, Russia
Kostromitin K. Igorevich

Corresponding author

Correspondence to Muhammad Saadi.

Ethics declarations

Conflicts of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Silva, D.H., Maziero, E.G., Saadi, M. et al. Big data analytics for critical information classification in online social networks using classifier chains. Peer-to-Peer Netw. Appl. 15, 626–641 (2022). https://doi.org/10.1007/s12083-021-01269-1

Received: 04 March 2021
Accepted: 06 November 2021
Published: 10 January 2022
Issue Date: January 2022
DOI: https://doi.org/10.1007/s12083-021-01269-1
