Developing Simplified Chinese Psychological Linguistic Analysis Dictionary for Microblog

  • Rui Gao
  • Bibo Hao
  • He Li
  • Yusong Gao
  • Tingshao Zhu
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8211)


The words that people use can reveal their emotional states, intentions, thinking styles, and individual differences. LIWC (Linguistic Inquiry and Word Count) has been widely used for psychological text analysis, and its dictionary is its core component. A Traditional Chinese version of the LIWC dictionary has been released, which is a translation of the English LIWC dictionary. However, Simplified Chinese, the world’s most widely used language, differs subtly from Traditional Chinese. Furthermore, both the English LIWC dictionary and the Traditional Chinese version were developed for relatively formal text. Microblogs have become increasingly popular in China, but the original LIWC dictionaries give little consideration to popular microblog words, which makes them less applicable to microblog text analysis. In this study, a Simplified Chinese LIWC dictionary is established according to the LIWC categories. After the Traditional Chinese dictionary was translated into Simplified Chinese, the five thousand words most frequently used in microblogs were added to it. Four graduate students of psychology rated whether each word belonged in a category. The reliability and validity of the Simplified Chinese LIWC dictionary were tested by these four judges. This new dictionary could contribute to future text analysis of microblogs.
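To make the dictionary-based approach concrete, here is a minimal Python sketch of how a LIWC-style dictionary might be applied to a segmented microblog post, and how four judges' ratings might be combined. The toy entries, the category names, the function names, and the three-of-four agreement threshold are all assumptions for illustration; the abstract does not state the actual agreement rule or list any dictionary entries.

```python
from collections import Counter

# Hypothetical toy dictionary: word -> set of LIWC-style categories.
# These entries are illustrative only, not taken from the paper's dictionary.
TOY_DICTIONARY = {
    "开心": {"affect", "posemo"},
    "难过": {"affect", "negemo"},
    "我": {"pronoun", "i"},
    "朋友": {"social", "friend"},
}


def category_proportions(tokens, dictionary):
    """Return, for each category, the share of tokens that belong to it."""
    counts = Counter()
    for token in tokens:
        # Words missing from the dictionary contribute to no category.
        for category in dictionary.get(token, ()):
            counts[category] += 1
    total = len(tokens)
    if total == 0:
        return {}
    return {category: n / total for category, n in counts.items()}


def accept_by_majority(ratings, threshold=3):
    """Keep a word-category pair when at least `threshold` judges voted yes.

    The threshold of 3 out of 4 is an assumption, not the paper's rule.
    """
    return sum(1 for rating in ratings if rating) >= threshold


# The post is assumed to be already segmented into words.
tokens = ["我", "今天", "很", "开心"]
proportions = category_proportions(tokens, TOY_DICTIONARY)
print(proportions)
```

Chinese text has no spaces between words, so a real pipeline would need a word segmenter before the lookup step; the sketch sidesteps this by starting from a token list.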


LIWC, Traditional Chinese, Simplified Chinese, microblog, text analysis





Copyright information

© Springer International Publishing Switzerland 2013

Authors and Affiliations

  • Rui Gao (1)
  • Bibo Hao (1)
  • He Li (2)
  • Yusong Gao (1)
  • Tingshao Zhu (1)
  1. Institute of Psychology, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Beijing, P.R. China
  2. National Computer System Engineering Research Institute of China, Beijing, P.R. China
