Writer Profiling Without the Writer’s Text

  • Conference paper
Social Informatics (SocInfo 2017)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10540)

Abstract

Social network users may wish to preserve their anonymity online by masking their identity and not using language associated with any particular demographics or personality. However, they have no control over the language in incoming communications. We show that linguistic cues in public comments directed at a user are sufficient for an accurate inference of that user’s gender, age, religion, diet, and even personality traits. Moreover, we show that directed communication is even more predictive of a user’s profile than the user’s own language. We then conduct a nuanced analysis of what types of social relationships are most predictive of users’ attributes, and propose new strategies on how individuals can modulate their online social relationships and incoming communications to preserve their anonymity.
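As a rough illustration of the core idea, the sketch below builds a bag-of-words profile for each user solely from public messages that @-mention them, never from the user's own posts. It is a hypothetical, simplified sketch and not the authors' pipeline: the `(author, text)` input format, the `incoming_features` name, and the regular-expression tokenizer are all assumptions made for this example.

```python
import re
from collections import Counter, defaultdict

# Hypothetical helper names and tokenization; not the paper's actual pipeline.
MENTION = re.compile(r"@(\w+)")
TOKEN = re.compile(r"[a-z']+")


def incoming_features(tweets):
    """Map each mentioned user to word counts drawn only from messages directed at them.

    tweets: iterable of (author, text) pairs. A message is treated as directed
    at every user it @-mentions; self-mentions are ignored so that a user's
    own words never enter their profile.
    """
    profiles = defaultdict(Counter)
    for author, text in tweets:
        targets = {name.lower() for name in MENTION.findall(text)}
        # Remove the mentions themselves so usernames do not become features.
        tokens = TOKEN.findall(MENTION.sub(" ", text).lower())
        for target in targets - {author.lower()}:
            profiles[target].update(tokens)
    return profiles


tweets = [
    ("alice", "@bob happy birthday bro!"),
    ("carol", "@bob @dave see you at practice"),
    ("bob", "@bob note to self"),
]
profiles = incoming_features(tweets)
print(profiles["bob"].most_common(3))
```

These per-user counts would then serve as features for an attribute classifier; the choice of classifier and any further feature engineering are left open here.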

Notes

  1.

    We note that both gender and extroversion may also be considered along a spectrum [36, 37]. We opt to study these as binary variables here due to the lack of continuous-valued gender and extroversion ratings for social media users.

  2.

    We note that other diets are possible, such as kosher or halal; however, these are closely related to religion, which we also study, so we intentionally exclude them.

  3.

    Additional queries were formed for Sikhism and Jainism, but these did not return sufficient numbers of English-speaking individuals to be included.

  4.

    Due to Twitter API rate limits, full edge information was gathered only for 1.7M pairs.

  5.

    One possibility for testing this hypothesis in future work is to identify a cohort of individuals who publicly signal these variables in an explicit way (e.g., including religious imagery in their profile picture) and then test for effects of tie strength on their peers’ predictiveness.

  6.

    This risk remains even if the individual does not engage with others, as platforms such as Twitter allow anyone to directly message another user unless banned.

  7.

    We note that while we measure topical difference using our LDA model for messages, the peers selected by maximizing topical difference would also be easy for a layperson to identify as such (e.g., a peer discussing completely different topics).
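Note 7 does not say how topical difference is computed from the LDA topic distributions. One plausible choice, assumed here purely for illustration, is the Jensen–Shannon divergence between a user's topic distribution and each peer's, selecting the peer with the largest divergence. The function names and toy distributions below are hypothetical.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in bits; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits, bounded in [0, 1]."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    # m_i >= p_i / 2 > 0 whenever p_i > 0, so the KL terms are well defined.
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def most_topically_different(user_topics, peer_topics):
    """Return the peer whose topic distribution diverges most from the user's."""
    return max(peer_topics, key=lambda peer: js_divergence(user_topics, peer_topics[peer]))


# Hypothetical 3-topic distributions (each sums to 1).
user = [0.7, 0.2, 0.1]
peers = {"p1": [0.6, 0.3, 0.1], "p2": [0.05, 0.15, 0.8]}
print(most_topically_different(user, peers))
```

Jensen–Shannon divergence is used here because, unlike plain KL divergence, it is symmetric and stays finite when one distribution assigns zero probability to a topic the other uses.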

References

  1. Al Zamal, F., Liu, W., Ruths, D.: Homophily and latent attribute inference: inferring latent attributes of Twitter users from neighbors. In: Proceedings of ICWSM (2012)

  2. Almishari, M., Oguz, E., Tsudik, G.: Fighting authorship linkability with crowdsourcing. In: Proceedings of COSN, pp. 69–82. ACM (2014)

  3. Altenburger, K.M., Ugander, J.: Bias and variance in the social structure of gender. arXiv preprint arXiv:1705.04774 (2017)

  4. Anderson, C., John, O.P., Keltner, D., Kring, A.M.: Who attains social status? Effects of personality and physical attractiveness in social groups. J. Pers. Soc. Psychol. 81(1), 116 (2001)

  5. Baker, W., Bowie, D.: Religious affiliation as a correlate of linguistic behavior. Univ. Pennsylvania Work. Pap. Linguist. 15(2), 2 (2010)

  6. Bamman, D., Eisenstein, J., Schnoebelen, T.: Gender identity and lexical variation in social media. J. Sociolinguist. 18(2), 135–160 (2014)

  7. Barbieri, F.: Patterns of age-based linguistic variation in American English. J. Sociolinguist. 12(1), 58–88 (2008)

  8. Beller, C., Knowles, R., Harman, C., Bergsma, S., Mitchell, M., Van Durme, B.: I’m a belieber: social roles via self-identification and conceptual attributes. In: Proceedings of ACL, pp. 181–186 (2014)

  9. Benton, A., Mitchell, M., Hovy, D.: Multitask learning for mental health conditions with limited social media data. In: Proceedings of EACL (2017)

  10. Bergsma, S., Van Durme, B.: Using conceptual class attributes to characterize social media users. In: Proceedings of ACL (2013)

  11. Best, P., Manktelow, R., Taylor, B.: Online communication, social media and adolescent wellbeing: a systematic narrative review. Child Youth Serv. Rev. 41, 27–36 (2014)

  12. Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent Dirichlet allocation. J. Mach. Learn. Res. (JMLR) 3, 993–1022 (2003)

  13. Bogardus, E.S.: A social distance scale. Sociol. Soc. Res. 17, 265–271 (1933)

  14. Breiman, L.: Random forests. Mach. Learn. 45(1), 5–32 (2001)

  15. Brennan, M., Afroz, S., Greenstadt, R.: Adversarial stylometry: circumventing authorship recognition to preserve privacy and anonymity. ACM Trans. Inf. Syst. Secur. (TISSEC) 15(3), 12 (2012)

  16. Brysbaert, M., Warriner, A.B., Kuperman, V.: Concreteness ratings for 40 thousand generally known English word lemmas. Behav. Res. Methods 46(3), 904–911 (2014)

  17. Bucholtz, M., Hall, K.: Identity and interaction: a sociocultural linguistic approach. Discourse Stud. 7(4–5), 585–614 (2005)

  18. Burger, J.D., Henderson, J., Kim, G., Zarrella, G.: Discriminating gender on Twitter. In: Proceedings of EMNLP, pp. 1301–1309 (2011)

  19. Carpenter, J., Preotiuc-Pietro, D., Flekova, L., Giorgi, S., Hagan, C., Kern, M.L., Buffone, A.E., Ungar, L., Seligman, M.E.: Real men don’t say “cute”: using automatic language analysis to isolate inaccurate aspects of stereotypes. Soc. Psychol. Pers. Sci. 8, 310–322 (2016)

  20. Cesare, N., Grant, C., Nsoesie, E.O.: Detection of user demographics on social media: a review of methods and recommendations for best practices. arXiv preprint arXiv:1702.01807 (2017)

  21. Chen, L., Weber, I., Okulicz-Kozaryn, A.: U.S. religious landscape on Twitter. In: Aiello, L.M., McFarland, D. (eds.) SocInfo 2014. LNCS, vol. 8851, pp. 544–560. Springer, Cham (2014). doi:10.1007/978-3-319-13734-6_38

  22. Chen, X., Wang, Y., Agichtein, E., Wang, F.: A comparative study of demographic attribute inference in Twitter. In: Proceedings of ICWSM, vol. 15, pp. 590–593 (2015)

  23. Ciot, M., Sonderegger, M., Ruths, D.: Gender inference of Twitter users in non-English contexts. In: Proceedings of EMNLP, pp. 1136–1145 (2013)

  24. Coates, J.: Language and Gender: A Reader. Wiley-Blackwell, Oxford (1998)

  25. Coates, J.: Women, Men and Language: A Sociolinguistic Account of Gender Differences in Language. Routledge, Abingdon (2015)

  26. Danescu-Niculescu-Mizil, C., Gamon, M., Dumais, S.: Mark my words!: linguistic style accommodation in social media. In: Proceedings of WWW, pp. 745–754. ACM (2011)

  27. De Choudhury, M., De, S.: Mental health discourse on reddit: self-disclosure, social support, and anonymity. In: Proceedings of ICWSM (2014)

  28. De Choudhury, M., Kiciman, E.: The language of social support in social media and its effect on suicidal ideation risk. In: Proceedings of ICWSM, pp. 32–41 (2017)

  29. Derlega, V.J., Harris, M.S., Chaikin, A.L.: Self-disclosure reciprocity, liking and the deviant. J. Exp. Soc. Psychol. 9(4), 277–284 (1973)

  30. Dewaele, J.M.: Individual differences in the use of colloquial vocabulary: the effects of sociobiographical and psychological factors. In: Learning Vocabulary in a Second Language: Selection, Acquisition and Testing, pp. 127–153 (2004)

  31. Duggan, M.: Mobile messaging and social media 2015. Pew Res. Center, 13 (2015)

  32. Dwork, C.: Differential privacy: a survey of results. In: Agrawal, M., Du, D., Duan, Z., Li, A. (eds.) TAMC 2008. LNCS, vol. 4978, pp. 1–19. Springer, Heidelberg (2008). doi:10.1007/978-3-540-79228-4_1

  33. Eagly, A.H., Mladinic, A.: Gender stereotypes and attitudes toward women and men. Pers. Soc. Psychol. Bull. 15(4), 543–558 (1989)

  34. Eckert, P.: Jocks and Burnouts: Social Categories and Identity in the High School. Teachers College Press, New York (1989)

  35. Eckert, P.: Age as a sociolinguistic variable. In: The Handbook of Sociolinguistics, pp. 151–167 (1997)

  36. Eckert, P.: Variation and the indexical field. J. Sociolinguist. 12(4), 453–476 (2008)

  37. Eckert, P., McConnell-Ginet, S.: Language and Gender. Cambridge University Press, New York (2003)

  38. El-Arini, K., Paquet, U., Herbrich, R., Van Gael, J., Agüera y Arcas, B.: Transparent user models for personalization. In: Proceedings of KDD, pp. 678–686. ACM (2012)

  39. Elgin, B., Robison, P.: How despots use Twitter to hunt dissidents. BloombergBusinessweek (2016). https://www.bloomberg.com/news/articles/2016-10-27/twitter-s-firehose-of-tweets-is-incredibly-valuable-and-just-as-dangerous

  40. Fernández-Delgado, M., Cernadas, E., Barro, S., Amorim, D.: Do we need hundreds of classifiers to solve real world classification problems? J. Mach. Learn. Res. 15(1), 3133–3181 (2014)

  41. Flekova, L., Gurevych, I.: Can we hide in the web? Large scale simultaneous age and gender author profiling in social media. In: Proceedings of CLEF (2013)

  42. Friedkin, N.: A test of structural features of Granovetter’s strength of weak ties theory. Soc. Netw. 2(4), 411–422 (1980)

  43. Garimella, A., Mihalcea, R.: Zooming in on gender differences in social media. In: Proceedings of the Workshop on Computational Modeling of People’s Opinions, Personality, and Emotions in Social Media, pp. 1–10 (2016)

  44. Gilbert, E., Karahalios, K.: Predicting tie strength with social media. In: Proceedings of CHI, pp. 211–220. ACM (2009)

  45. Golbeck, J., Robles, C., Edmondson, M., Turner, K.: Predicting personality from Twitter. In: Proceedings of SocialCom, pp. 149–156. IEEE (2011)

  46. Goldin, C., Rouse, C.: Orchestrating impartiality: the impact of “blind” auditions on female musicians. Technical report, National Bureau of Economic Research (1997)

  47. Goswami, S., Sarkar, S., Rustagi, M.: Stylometric analysis of bloggers’ age and gender. In: Proceedings of ICWSM (2009)

  48. Granovetter, M.S.: The strength of weak ties. Am. J. Sociol. 78(6), 1360–1380 (1973)

  49. Hovy, D., Søgaard, A.: Tagging performance correlates with author age. In: Proceedings of ACL, pp. 483–488 (2015)

  50. Hovy, D., Spruit, S.L.: The social impact of natural language processing. In: Proceedings of ACL, vol. 2, pp. 591–598 (2016)

  51. John, O.P., Srivastava, S.: The big five trait taxonomy: history, measurement, and theoretical perspectives. In: Handbook of Personality: Theory and Research, vol. 2, pp. 102–138 (1999)

  52. Kendall, S., Tannen, D., et al.: Gender and language in the workplace. In: Gender and Discourse, pp. 81–105. Sage, London (1997)

  53. Kosinski, M., Stillwell, D., Graepel, T.: Private traits and attributes are predictable from digital records of human behavior. Proc. Nat. Acad. Sci. (PNAS) 110(15), 5802–5805 (2013)

  54. Krackhardt, D., Nohria, N., Eccles, B.: The strength of strong ties. Netw. Knowl. Econ., 82 (2003)

  55. Labov, W.: Sociolinguistic Patterns. University of Pennsylvania Press, Philadelphia (1972)

  56. Lakoff, R.T., Bucholtz, M.: Language and Woman’s Place: Text and Commentaries, vol. 3. Oxford University Press, USA (2004)

  57. Lea, M., Spears, R., de Groot, D.: Knowing me, knowing you: anonymity effects on social identity processes within groups. Pers. Soc. Psychol. Bull. 27(5), 526–537 (2001)

  58. Lin, N., Ensel, W.M., Vaughn, J.C.: Social resources and strength of ties: structural factors in occupational status attainment. Am. Sociol. Rev., 393–405 (1981)

  59. Liviatan, I., Trope, Y., Liberman, N.: Interpersonal similarity as a social distance dimension: implications for perception of others’ actions. J. Exp. Soc. Psychol. 44(5), 1256–1269 (2008)

  60. Lu, X., Ai, W., Liu, X., Li, Q., Wang, N., Huang, G., Mei, Q.: Learning from the ubiquitous language: an empirical analysis of emoji usage of smartphone users. In: Proceedings of Ubicomp, pp. 770–780. ACM (2016)

  61. Mairesse, F., Walker, M.A., Mehl, M.R., Moore, R.K.: Using linguistic cues for the automatic recognition of personality in conversation and text. J. Artif. Intell. Res. (JAIR) 30, 457–500 (2007)

  62. Marder, B., Joinson, A., Shankar, A., Thirlaway, K.: Strength matters: self-presentation to the strongest audience rather than lowest common denominator when faced with multiple audiences in social network sites. Comput. Hum. Behav. 61, 56–62 (2016)

  63. Marwick, A.E., Boyd, D.: I tweet honestly, I tweet passionately: Twitter users, context collapse, and the imagined audience. New Media Soc. 13(1), 114–133 (2011)

  64. McCandless, M.: Accuracy and performance of Google’s compact language detector. Blog post (2010)

  65. McCrae, R.R., Costa, P.T.: Reinterpreting the Myers-Briggs type indicator from the perspective of the five-factor model of personality. J. Pers. 57(1), 17–40 (1989)

  66. McPherson, M., Smith-Lovin, L., Cook, J.M.: Birds of a feather: homophily in social networks. Ann. Rev. Sociol. 27(1), 415–444 (2001)

  67. Milroy, J.: Linguistic variation and change: on the historical sociolinguistics of English. B. Blackwell (1992)

  68. Minkus, T., Liu, K., Ross, K.W.: Children seen but not heard: when parents compromise children’s online privacy. In: Proceedings of WWW, pp. 776–786. ACM (2015)

  69. Mohammad, S.M., Turney, P.D.: Crowdsourcing a word-emotion association lexicon. Comput. Intell. 29(3), 436–465 (2013)

  70. Monroe, B.L., Colaresi, M.P., Quinn, K.M.: Fightin’ words: lexical feature selection and evaluation for identifying the content of political conflict. Polit. Anal. 16(4), 372–403 (2008)

  71. Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining R2 from generalized linear mixed-effects models. Methods Ecol. Evol. 4(2), 133–142 (2013)

  72. Nguyen, D., Smith, N.A., Rosé, C.P.: Author age prediction from text using linear regression. In: Proceedings of the Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities, pp. 115–123. Association for Computational Linguistics (2011)

  73. Nguyen, D.P., Gravel, R., Trieschnigg, R., Meder, T.: “How old do you think I am?” A study of language and age in Twitter. In: Proceedings of ICWSM (2013)

  74. Nguyen, D.P., Trieschnigg, R., Doğruöz, A.S., Gravel, R., Theune, M., Meder, T., de Jong, F.: Why gender and age prediction from tweets is hard: lessons from a crowdsourcing experiment. In: Proceedings of COLING (2014)

  75. Nguyen, M.T., Lim, E.P.: On predicting religion labels in microblogging networks. In: Proceedings of SIGIR, pp. 1211–1214. ACM (2014)

  76. Niederhoffer, K.G., Pennebaker, J.W.: Linguistic style matching in social interaction. J. Lang. Soc. Psychol. 21(4), 337–360 (2002)

  77. Oomen, I., Leenes, R.: Privacy risk perceptions and privacy protection strategies. In: de Leeuw, E., Fischer-Hübner, S., Tseng, J., Borking, J. (eds.) IDMAN 2007. TIFIP, vol. 261, pp. 121–138. Springer, Boston, MA (2008). doi:10.1007/978-0-387-77996-6_10

  78. Peersman, C., Daelemans, W., Van Vaerenbergh, L.: Predicting age and gender in online social networks. In: Proceedings of the 3rd International Workshop on Search and Mining User-Generated Contents, pp. 37–44. ACM (2011)

  79. Pennacchiotti, M., Popescu, A.M.: A machine learning approach to Twitter user classification. In: Proceedings of ICWSM, pp. 281–288 (2011)

  80. Pennebaker, J.W., Stone, L.D.: Words of wisdom: language use over the life span. J. Pers. Soc. Psychol. 85(2), 291 (2003)

  81. Pennington, J., Socher, R., Manning, C.D.: GloVe: global vectors for word representation. In: Proceedings of EMNLP, vol. 14, pp. 1532–1543 (2014)

  82. Phelan, C., Lampe, C., Resnick, P.: It’s creepy, but it doesn’t bother me. In: Proceedings of CHI, pp. 5240–5251. ACM (2016)

  83. Plank, B., Hovy, D.: Personality traits on Twitter, or, how to get 1,500 personality tests in a week. In: Proceedings of WASSA (2015)

  84. Postmes, T., Spears, R., Lea, M.: Breaching or building social boundaries? SIDE-effects of computer-mediated communication. Commun. Res. 25(6), 689–715 (1998)

  85. Potthast, M., Hagen, M., Stein, B.: Author obfuscation: attacking the state of the art in authorship verification. In: Proceedings of CLEF (Working Notes), pp. 716–749 (2016)

  86. Quercia, D., Kosinski, M., Stillwell, D., Crowcroft, J.: Our Twitter profiles, our selves: predicting personality with Twitter. In: Proceedings of SocialCom, pp. 180–185. IEEE (2011)

  87. Rao, D., Yarowsky, D., Shreevats, A., Gupta, M.: Classifying latent user attributes in Twitter. In: Proceedings of the 2nd International Workshop on Search and Mining User-generated Contents, pp. 37–44. ACM (2010)

  88. Reddy, S., Knight, K.: Obfuscating gender in social media writing. In: Proceedings of Workshop on Natural Language Processing and Computational Social Science, pp. 17–26 (2016)

  89. Reed, P.J., Spiro, E.S., Butts, C.T.: Thumbs up for privacy?: differences in online self-disclosure behavior across national cultures. Soc. Sci. Res. 59, 155–170 (2016)

  90. Rosenthal, S., McKeown, K.: Age prediction in blogs: a study of style, content, and online behavior in pre- and post-social media generations. In: Proceedings of ACL, pp. 763–772. Association for Computational Linguistics (2011)

  91. Rossi, L., Magnani, M.: Conversation practices and network structure in Twitter. In: Proceedings of ICWSM (2012)

  92. Ryan, E.B., Hummert, M.L., Boich, L.H.: Communication predicaments of aging: patronizing behavior toward older adults. J. Lang. Soc. Psychol. 14(1–2), 144–166 (1995)

  93. Sap, M., Park, G., Eichstaedt, J., Kern, M., Stillwell, D., Kosinski, M., Ungar, L., Schwartz, H.A.: Developing age and gender predictive lexica over social media. In: Proceedings of EMNLP, pp. 1146–1151. Association for Computational Linguistics (2014)

  94. Schnoebelen, T.J.: Emotions are relational: positioning and the use of affective linguistic resources. Ph.D. thesis, Stanford University (2012)

  95. Schrammel, J., Köffel, C., Tscheligi, M.: Personality traits, usage patterns and information disclosure in online communities. In: Proceedings of HCI, pp. 169–174. British Computer Society (2009)

  96. Schwartz, H.A., Eichstaedt, J.C., Kern, M.L., Dziurzynski, L., Ramones, S.M., Agrawal, M., Shah, A., Kosinski, M., Stillwell, D., Seligman, M.E.S., Ungar, L.H.: Personality, gender, and age in the language of social media: the open-vocabulary approach. PLoS ONE 8(9), e73791 (2013)

  97. Shelton, M., Lo, K., Nardi, B.: Online media forums as separate social lives: a qualitative study of disclosure within and beyond Reddit. In: Proceedings of iConference (2015)

  98. Snefjella, B., Kuperman, V.: Concreteness and psychological distance in natural language use. Psychol. Sci. 26(9), 1449–1460 (2015)

  99. Soderberg, C., Callahan, S., Kochersberger, A., Amit, E., Ledgerwood, A.: The effects of psychological distance on abstraction: two meta-analyses. Psychol. Bull. 141(3), 525–548 (2015)

  100. Spears, R., Lea, M.: Social influence and the influence of the “social” in computer-mediated communication. In: Lea, M. (ed.) Contexts of Computer-Mediated Communication, pp. 30–65. Harvester Wheatsheaf (1992)

  101. Steinpreis, R.E., Anders, K.A., Ritzke, D.: The impact of gender on the review of the curricula vitae of job applicants and tenure candidates: a national empirical study. Sex Roles 41(7), 509–528 (1999)

  102. Strater, K., Lipford, H.R.: Strategies and struggles with privacy in an online social networking community. In: Proceedings of the 22nd British HCI Group Annual Conference on People and Computers: Culture, Creativity, Interaction, vol. 1, pp. 111–119. British Computer Society (2008)

  103. Stutzman, F., Vitak, J., Ellison, N.B., Gray, R., Lampe, C.: Privacy in interaction: exploring disclosure and social capital in Facebook. In: Proceedings of ICWSM (2012)

  104. Tannen, D.: You Just Don’t Understand: Women and Men in Conversation. Virago, London (1991)

  105. Tannen, D.: Gender and Conversational Interaction. Oxford University Press, Oxford (1993)

  106. Tausczik, Y.R., Pennebaker, J.W.: The psychological meaning of words: LIWC and computerized text analysis methods. J. Lang. Soc. Psychol. 29(1), 24–54 (2010)

  107. Tchokni, S.E., Séaghdha, D.O., Quercia, D.: Emoticons and phrases: status symbols in social media. In: Proceedings of ICWSM (2014)

  108. Thomas, K., Grier, C., Nicol, D.M.: unFriendly: multi-party privacy risks in social networks. In: Atallah, M.J., Hopper, N.J. (eds.) PETS 2010. LNCS, vol. 6205, pp. 236–252. Springer, Heidelberg (2010). doi:10.1007/978-3-642-14527-8_14

  109. Trepte, S., Reinecke, L., Ellison, N.B., Quiring, O., Yao, M.Z., Ziegele, M.: A cross-cultural perspective on the privacy calculus. Soc. Media+ Soc. 3(1), 2056305116688035 (2017)

  110. Trope, Y., Liberman, N.: Construal-level theory of psychological distance. Psychol. Rev. 117(2), 440 (2010)

  111. Volkova, S., Bachrach, Y., Armstrong, M., Sharma, V.: Inferring latent user properties from texts published in social media. In: Proceedings of AAAI, pp. 4296–4297 (2015)

  112. Wienberg, C., Gordon, A.S.: Privacy considerations for public storytelling. In: Proceedings of ICWSM (2014)

  113. Yaeger-Dror, M.: Religion as a sociolinguistic variable. Language and Linguistics Compass 8(11), 577–589 (2014)

  114. Youn, S., Hall, K.: Gender and online privacy among teens: risk perception, privacy concerns, and protection behaviors. Cyberpsychol. Behav. 11(6), 763–765 (2008)

  115. Zhang, K., Kizilcec, R.F.: Anonymity in social media: effects of content controversiality and social endorsement on sharing behavior. In: Proceedings of ICWSM (2014)

Acknowledgments

We thank the anonymous reviewers, SocInfo organizers, the Stanford Data Science Initiative, and Twitter and Gnip for providing access to part of the data used in this study. This work was supported by the National Science Foundation through awards IIS-1159679 and IIS-1526745.

Author information

Correspondence to David Jurgens.

Appendix

A Classification Metrics

Macro-averaged F1 is the unweighted mean of the per-class F1 scores, so every class counts equally regardless of how many instances carry that label. Micro-averaged F1 pools true positives, false positives, and false negatives across all instances before computing a single F1, so it is sensitive to skew in the distribution of classes in the dataset.
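To make the distinction concrete, the following sketch computes both averages for single-label, multiclass predictions from per-class true-positive, false-positive, and false-negative counts. It is an illustrative implementation written for this explanation (the function name `micro_macro_f1` is hypothetical), not the evaluation code used in the paper.

```python
from collections import Counter


def micro_macro_f1(y_true, y_pred):
    """Return (micro_f1, macro_f1) for single-label multiclass predictions."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for gold, pred in zip(y_true, y_pred):
        if gold == pred:
            tp[gold] += 1
        else:
            fp[pred] += 1  # the predicted label gains a false positive
            fn[gold] += 1  # the true label gains a false negative

    def f1(t, p, n):
        # F1 = 2TP / (2TP + FP + FN); guard against a zero denominator.
        denom = 2 * t + p + n
        return 2 * t / denom if denom else 0.0

    labels = set(y_true) | set(y_pred)
    # Macro: one F1 per class, then an unweighted mean over classes.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    # Micro: pool the counts over all classes, then compute a single F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return micro, macro


# Skewed toy data: 8 instances of "a", 2 of "b"; the model always predicts "a".
gold = ["a"] * 8 + ["b"] * 2
pred = ["a"] * 10
micro, macro = micro_macro_f1(gold, pred)
print(f"micro={micro:.3f} macro={macro:.3f}")
```

On this skewed toy example, always predicting the majority class scores 0.8 micro-F1 (which, for single-label data, equals accuracy) but only about 0.44 macro-F1, because the minority class contributes an F1 of zero to the macro average.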

B Additional Classifier Results

Table 4. Predictive accuracy for categorical attributes, reported as Micro-F1 and Macro-F1.
Table 5. Predictive accuracy for age, reported as Mean Squared Error and Correlation.
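The two age metrics can be sketched as follows. The caption does not specify which correlation coefficient is reported; Pearson's r is assumed here, and the function names and toy ages are hypothetical.

```python
import math


def mean_squared_error(y_true, y_pred):
    """Average squared difference between true and predicted ages."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


# Toy ages for illustration only.
true_ages = [20, 30, 40]
predicted_ages = [22, 28, 43]
print(mean_squared_error(true_ages, predicted_ages))  # (4 + 4 + 9) / 3
print(pearson_r(true_ages, predicted_ages))
```

The two metrics reward different things: MSE penalizes large absolute errors, while correlation only measures whether predicted ages rise and fall with true ages, so a model that is always off by a constant offset can have perfect correlation but a high MSE.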

C Additional Measures of Tie Strength

We initially considered two other potential proxies for tie strength based on textual analysis. First, we replicated the approach of Gilbert and Karahalios [44], which counted words occurring in ten LIWC categories to approximate intimacy in communication. Second, we attempted to measure social distance [58] by drawing upon Construal-Level Theory [59, 110], which conjectures that individuals with low social distance typically use more concrete language, whereas those with high social distance use more abstract language [98, 99]; here, communication concreteness was measured using the word concreteness ratings of [16]. However, the scores produced by each approach did not match our own judgments of the attributes they were intended to capture, and including them in the regression models produced non-significant results. Without ground truth for intimacy and social distance against which to validate these scores, we judged both proxies unreliable and omitted them to avoid drawing false conclusions about these dimensions of tie strength.
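The discarded concreteness proxy can be sketched as the mean rating of the rated words in a pair's messages. The word-to-rating dictionary below is a tiny hypothetical stand-in for the lexicon of [16], whose ratings run from 1 (abstract) to 5 (concrete); the tokenizer, the function name, and the choice to skip unrated words are assumptions, since the exact aggregation is not described here.

```python
import re

TOKEN = re.compile(r"[a-z']+")


def pair_concreteness(messages, ratings):
    """Mean concreteness over all rated tokens in a list of messages.

    Unrated tokens are skipped; returns None if no token has a rating.
    """
    scores = [
        ratings[tok]
        for msg in messages
        for tok in TOKEN.findall(msg.lower())
        if tok in ratings
    ]
    return sum(scores) / len(scores) if scores else None


# Hypothetical toy ratings on a 1 (abstract) to 5 (concrete) scale.
toy_ratings = {"dog": 4.9, "park": 4.6, "idea": 1.6, "freedom": 1.5}

close_pair = ["Walking the dog in the park", "see you at the park"]
distant_pair = ["Freedom is a powerful idea"]
print(pair_concreteness(close_pair, toy_ratings))
print(pair_concreteness(distant_pair, toy_ratings))
```

Under the construal-level hypothesis, a higher score would suggest lower social distance between the pair; as the paragraph above explains, the authors found scores of this kind unreliable and dropped them.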

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Jurgens, D., Tsvetkov, Y., Jurafsky, D. (2017). Writer Profiling Without the Writer’s Text. In: Ciampaglia, G., Mashhadi, A., Yasseri, T. (eds) Social Informatics. SocInfo 2017. Lecture Notes in Computer Science, vol. 10540. Springer, Cham. https://doi.org/10.1007/978-3-319-67256-4_43

  • DOI: https://doi.org/10.1007/978-3-319-67256-4_43

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-67255-7

  • Online ISBN: 978-3-319-67256-4

  • eBook Packages: Computer Science, Computer Science (R0)
