Abstract
In Author Profiling research, there is growing interest in lexical resources that provide psychologically meaningful word categories. One such instrument is Linguistic Inquiry and Word Count (LIWC), which was compiled manually in English and has been translated into many other languages. We argue that the resource is highly subjective and that translation adds further subjectivity, so the translated lexical resource is not linguistically transparent. To address this issue, we translate the resource from English into Russian semi-automatically, analyze the translation in terms of agreement, and match the resulting translation with two Russian linguistic thesauri. One of the thesauri is chosen as the better match for the psychologically meaningful categories in question. We then apply this thesaurus to analyze the psychologically meaningful word categories in two Author Profiling tasks based on Russian texts. Our results indicate that linguistically motivated thesauri not only provide objective content but also yield significant correlates of certain psychological states, replicating evidence obtained with hand-crafted lexical resources.
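The abstract does not say how a translated category is scored against a thesaurus class. As a rough illustration only, the sketch below assumes each category and each thesaurus class is a set of lemmas and uses Jaccard overlap to pick the closest class. The function names, the class labels, and the toy word lists are hypothetical and are not taken from LIWC or from either thesaurus.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard overlap of two lemma sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_match(category_lemmas: set, thesaurus: dict) -> tuple:
    """Return (class_name, score) for the thesaurus class with the highest overlap."""
    return max(
        ((name, jaccard(category_lemmas, lemmas)) for name, lemmas in thesaurus.items()),
        key=lambda pair: pair[1],
    )


# Toy, hypothetical data standing in for a translated category and a thesaurus.
translated_posemo = {"радость", "счастье", "любовь"}
toy_thesaurus = {
    "emotion_positive": {"радость", "счастье", "восторг"},
    "emotion_negative": {"грусть", "печаль", "страх"},
}

match = best_match(translated_posemo, toy_thesaurus)
print(match)
```

Running the same scoring against two thesauri and comparing the best scores per category is one simple way to decide which thesaurus fits a set of categories better; the paper itself may use a different criterion.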
Notes
1. Freely available for search and download at https://rusidiolect.rusprofilinglab.ru/.
Acknowledgement
The authors acknowledge the support of this study by the Russian Science Foundation, grant № 18-78-10081. The authors are grateful to the anonymous reviewers for their comments.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Panicheva, P., Litvinova, T. (2020). Matching LIWC with Russian Thesauri: An Exploratory Study. In: Filchenkov, A., Kauttonen, J., Pivovarova, L. (eds) Artificial Intelligence and Natural Language. AINL 2020. Communications in Computer and Information Science, vol 1292. Springer, Cham. https://doi.org/10.1007/978-3-030-59082-6_14
DOI: https://doi.org/10.1007/978-3-030-59082-6_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-59081-9
Online ISBN: 978-3-030-59082-6
eBook Packages: Computer Science, Computer Science (R0)