Abstract
As web data evolves, new technological challenges arise, and online social networks are one of the contributing factors. Although these networks bring benefits, their negative impact on vulnerable users, such as the spread of suicidal ideation, is concerning. It is therefore vital to fine-tune the approaches and techniques used to understand users and their context so that early intervention is possible. In this study, we measured the impact of data manipulation and feature extraction, specifically using N-grams, on suicide-related social network text (tweets). We propose a diversified ensemble approach (multi-classifier fusion) to improve the classification of suicide-related text. Four machine learning classifiers were used for the fusion: Support Vector Machine, Random Forest, Naïve Bayes and Decision Tree. Our results show that the multi-classifier fusion improved the detection of suicide-related text and that Support Vector Machine showed promising results on multi-class datasets.
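The two ideas named in the abstract, N-gram feature extraction and multi-classifier fusion, can be illustrated with a minimal sketch. The abstract does not state which fusion rule or tokenisation the authors used, so the code below assumes simple whitespace tokenisation and majority voting. The function names `word_ngrams` and `majority_vote` and the labels are hypothetical and chosen only for illustration.

```python
from collections import Counter
from typing import List, Sequence


def word_ngrams(text: str, n_min: int = 1, n_max: int = 2) -> List[str]:
    """Return all word N-grams of length n_min..n_max from a lower-cased text.

    Assumption: tokens are split on whitespace; the paper may tokenise differently.
    """
    tokens = text.lower().split()
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def majority_vote(labels: Sequence[str]) -> str:
    """Fuse the labels predicted by several classifiers for one text.

    Assumption: plain majority voting; ties go to the label listed first.
    """
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    best = max(counts.values())
    return next(label for label in labels if counts[label] == best)


# Unigrams and bigrams for one short text.
features = word_ngrams("I cannot sleep", 1, 2)

# Hypothetical outputs of four classifiers (e.g. SVM, Random Forest,
# Naive Bayes, Decision Tree) for the same tweet, fused into one label.
votes = ["suicide-related", "other", "suicide-related", "suicide-related"]
fused = majority_vote(votes)
```

In practice the N-grams would be turned into a numeric feature matrix and the four classifiers trained on it; the sketch only shows how the features are formed and how individual predictions could be combined.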
Acknowledgement
This independent research is supported by the Petroleum Technology Development Fund (PTDF) and the Department of Health Policy Research Programme (Understanding the Role of Social Media in the Aftermath of Youth Suicides, Project Number 023/0165). The views expressed in this publication are those of the authors and not necessarily those of PTDF or the Department of Health.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Chiroma, F., Cocea, M., Liu, H. (2020). Detection of Suicidal Twitter Posts. In: Ju, Z., Yang, L., Yang, C., Gegov, A., Zhou, D. (eds) Advances in Computational Intelligence Systems. UKCI 2019. Advances in Intelligent Systems and Computing, vol 1043. Springer, Cham. https://doi.org/10.1007/978-3-030-29933-0_26
DOI: https://doi.org/10.1007/978-3-030-29933-0_26
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-29932-3
Online ISBN: 978-3-030-29933-0
eBook Packages: Intelligent Technologies and Robotics (R0)