Using Linguistic Features to Estimate Suicide Probability of Chinese Microblog Users

  • Lei Zhang
  • Xiaolei Huang
  • Tianli Liu
  • Ang Li
  • Zhenxiang Chen
  • Tingshao Zhu
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8944)


If people at high risk of suicide can be identified through social media such as microblogs, an active intervention system could be implemented to save their lives. With this motivation, the current study administered the Suicide Probability Scale (SPS) to 1041 users of Sina Weibo, a leading microblog service provider in China. Two NLP (Natural Language Processing) methods, the Chinese edition of the Linguistic Inquiry and Word Count (LIWC) lexicon and Latent Dirichlet Allocation (LDA), were used to extract linguistic features from the Sina Weibo data. We trained prediction models on these two types of features using machine learning, to estimate suicide probability from linguistic features. The experimental results indicate that LDA can find topics related to suicide probability and improves prediction performance. Our study adds value to the prediction of suicide probability of social network users from their behaviors.


Keywords: Suicidal ideation, Topic model, LIWC, Linguistic features, Microblog
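
As a rough illustration of the lexicon-based half of the feature extraction described in the abstract, the following minimal sketch computes, for each lexicon category, the fraction of a user's tokens that belong to it, and then appends topic proportions to form one feature vector. Everything named here is an assumption made for illustration: the toy English lexicon stands in for the Chinese LIWC dictionary, the input is assumed to be already segmented into tokens, and the topic proportions are assumed to come from a separately fitted LDA model. It is not the authors' implementation.

```python
from collections import Counter

# Hypothetical toy lexicon (category -> words). This is NOT the Chinese
# LIWC dictionary used in the study; it only illustrates the idea.
TOY_LEXICON = {
    "negemo": {"sad", "hopeless", "pain"},
    "posemo": {"happy", "love"},
    "death": {"die", "death"},
}


def lexicon_features(tokens, lexicon=TOY_LEXICON):
    """Return, per category, the fraction of tokens found in that category."""
    total = len(tokens)
    counts = Counter()
    for tok in tokens:
        for category, words in lexicon.items():
            if tok in words:
                counts[category] += 1
    # Guard against empty input so the function never divides by zero.
    return {cat: (counts[cat] / total if total else 0.0) for cat in lexicon}


def feature_vector(tokens, topic_proportions, lexicon=TOY_LEXICON):
    """Concatenate lexicon frequencies (in sorted category order) with
    topic proportions assumed to come from an externally fitted LDA model."""
    feats = lexicon_features(tokens, lexicon)
    return [feats[cat] for cat in sorted(lexicon)] + list(topic_proportions)


if __name__ == "__main__":
    tokens = ["i", "feel", "sad", "and", "hopeless",
              "today", "but", "love", "you", "all"]
    print(lexicon_features(tokens))
    print(feature_vector(tokens, [0.7, 0.3]))
```

A vector built this way could then be passed, together with each user's SPS score, to any standard regression or classification routine. Which learner and which number of topics to use are choices the abstract does not specify.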





Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. Institute of Psychology, Chinese Academy of Sciences (CAS), Beijing, China
  2. University of Jinan, Shandong, China
  3. China Networking Information Center, Chinese Academy of Sciences, Beijing, China
  4. Institute of Population Research, Peking University, Beijing, China
  5. Black Dog Institute, University of New South Wales, Sydney, Australia
  6. Key Lab of Intelligent Information Processing, Institute of Computing Technology, CAS, Beijing, China
