PRISM: Profession Identification in Social Media with Personal Information and Community Structure

  • Cunchao Tu
  • Zhiyuan Liu
  • Maosong Sun
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 568)


User profession plays an important role in commercial services such as personalized recommendation and targeted advertising. In practice, profession information is usually unavailable due to privacy and other reasons. In this paper, we explore the task of identifying user professions from their behavior in social media. The task is non-trivial because of three challenges: how to incorporate heterogeneous information about user behavior, how to effectively use both labeled and unlabeled data, and how to exploit community structure. To address these challenges, we present PRISM, a framework for PRofession Identification in Social Media. It takes advantage of both the personal information and the community structure of users in three ways: (1) We present a cascaded two-level classifier over heterogeneous personal features to measure the confidence that a user belongs to each profession. (2) We present a multi-training process that uses both labeled and unlabeled data to improve classification performance. (3) We design a profession identification method that jointly considers the confidences from personal features and from community structure. We collect a real-world dataset for experiments, and the results show that our method is significantly more effective than the baseline methods.
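Aspect (3) can be illustrated with a minimal sketch. The abstract does not give the combination rule, so everything here is an assumption made for illustration only: the function name `combine_confidences`, the linear mixing weight `alpha`, and the use of a plain average over a user's neighbors as the community signal. It is not the method of the paper.

```python
from typing import Dict, List

Confidences = Dict[str, float]  # profession -> confidence score


def combine_confidences(
    personal: Dict[str, Confidences],
    neighbors: Dict[str, List[str]],
    alpha: float = 0.5,
) -> Dict[str, str]:
    """Pick one profession per user by mixing two confidence sources.

    personal:  per-user confidences from a personal-feature classifier.
    neighbors: per-user list of connected users (hypothetical community signal).
    alpha:     assumed weight on the personal score; (1 - alpha) goes to
               the average score of the user's neighbors.
    """
    predictions = {}
    for user, own in personal.items():
        # Only neighbors that have classifier scores can contribute.
        friends = [f for f in neighbors.get(user, []) if f in personal]
        scores = {}
        for profession, conf in own.items():
            if friends:
                community = sum(
                    personal[f].get(profession, 0.0) for f in friends
                ) / len(friends)
            else:
                # No community evidence: fall back to the personal score.
                community = conf
            scores[profession] = alpha * conf + (1 - alpha) * community
        predictions[user] = max(scores, key=scores.get)
    return predictions


# Toy data (invented): u1 is weakly "engineer" on personal features,
# but both of its neighbors are strongly "teacher".
personal = {
    "u1": {"engineer": 0.55, "teacher": 0.45},
    "u2": {"engineer": 0.10, "teacher": 0.90},
    "u3": {"engineer": 0.20, "teacher": 0.80},
}
neighbors = {"u1": ["u2", "u3"], "u2": ["u1"], "u3": ["u1"]}

result = combine_confidences(personal, neighbors, alpha=0.5)
```

With `alpha=0.5` the toy user `u1` is relabeled from "engineer" to "teacher" by its neighbors, while `alpha=1.0` ignores the community signal and keeps the personal prediction.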





This work is supported by the National Natural Science Foundation of China under Grant Nos. 61170196 and 61202140 and the Major Project of the National Social Science Foundation of China under Grant No. 13&ZD190.



Copyright information

© Springer Science+Business Media Singapore 2015

Authors and Affiliations

  1. State Key Lab on Intelligent Technology and Systems, National Lab for Information Science and Technology, Department of Computer Science and Technology, Tsinghua University, Beijing, China
