
A multi-source integration framework for user occupation inference in social media systems

World Wide Web

Abstract

With the rapid development of social media applications, many users now connect with friends online, and their daily lives and opinions are recorded there. Social media offers an unprecedented way to collect and analyze information about billions of users, which makes user attribute identification and profile inference increasingly attractive and feasible. However, the abundance of social records also poses great challenges for effective feature selection and integration in user profile inference, mainly because of text diversity and complex community structures. In this paper, we propose a comprehensive multi-source integration framework that infers a user's occupation from their social activities recorded in a micro-blog system by combining both content and network information. We first identify beneficial content features and build a machine learning classification model, named the content model. We then exploit social network information: we tailor a community-discovery-based latent dimension solution to extract community-based features, and use neighbors' predictions to update the inference. Extensive empirical studies on a large real-life micro-blog dataset demonstrate the superiority of our integrated model for the occupation inference task, verify the effect of homophily in user interaction records, and reveal the different effects of heterogeneous interactive networks.
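The network side of the framework (community-based features plus neighbor-based updating of content predictions) can be illustrated with a small, self-contained Python sketch. It is not the authors' implementation. As assumptions: the content model's output is taken as given per-user label distributions (`content_probs`); simple label propagation is swapped in for the paper's community-discovery-based latent dimension extraction; and a fixed linear mix with hypothetical `weights` replaces the learned integration step. All function names are illustrative.

```python
# Hypothetical sketch, not the paper's implementation: label propagation stands in
# for the community-discovery-based latent dimensions, and a fixed linear mix
# (illustrative, untuned weights) stands in for the learned integration step.
from collections import Counter, defaultdict


def build_adjacency(edges):
    """Undirected adjacency sets from (u, v) interaction pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def label_propagation(nodes, adj, max_iter=20):
    """Each node repeatedly adopts the most common community id among its
    neighbors (ties -> smallest id) until no assignment changes."""
    comm = {n: n for n in nodes}
    for _ in range(max_iter):
        changed = False
        for n in sorted(nodes):
            if not adj[n]:
                continue
            counts = Counter(comm[v] for v in adj[n])
            best = max(counts.values())
            new = min(c for c, k in counts.items() if k == best)
            if new != comm[n]:
                comm[n], changed = new, True
        if not changed:
            break
    return comm


def normalize(dist):
    total = sum(dist.values())
    return {k: v / total for k, v in dist.items()} if total else dist


def mean_dist(dists, labels):
    return {lab: sum(d.get(lab, 0.0) for d in dists) / len(dists) for lab in labels}


def integrate(content_probs, edges, weights=(0.5, 0.2, 0.3), iterations=10):
    """Mix each user's content-model distribution with a community prior
    and the current predictions of their neighbors; return argmax labels."""
    w_content, w_comm, w_nbr = weights
    users = sorted(content_probs)
    labels = sorted({lab for p in content_probs.values() for lab in p})
    adj = build_adjacency((u, v) for u, v in edges
                          if u in content_probs and v in content_probs)
    comm = label_propagation(users, adj)
    members = defaultdict(list)
    for u in users:
        members[comm[u]].append(u)
    # Community prior: mean content distribution of the *other* members.
    comm_prior = {}
    for u in users:
        others = [content_probs[v] for v in members[comm[u]] if v != u]
        comm_prior[u] = mean_dist(others, labels) if others else None
    current = {u: normalize(dict(content_probs[u])) for u in users}
    for _ in range(iterations):  # neighbor-based inference updating
        nxt = {}
        for u in users:
            parts = [(w_content, content_probs[u])]
            if comm_prior[u] is not None:
                parts.append((w_comm, comm_prior[u]))
            if adj[u]:
                parts.append((w_nbr, mean_dist([current[v] for v in adj[u]], labels)))
            nxt[u] = normalize({lab: sum(w * d.get(lab, 0.0) for w, d in parts)
                                for lab in labels})
        current = nxt
    return {u: max(labels, key=lambda lab: current[u][lab]) for u in users}


if __name__ == "__main__":
    content = {  # toy content-model outputs (made-up numbers)
        "alice": {"engineer": 0.9, "teacher": 0.1},
        "bob": {"engineer": 0.8, "teacher": 0.2},
        "carol": {"engineer": 0.45, "teacher": 0.55},  # ambiguous text signal
        "dave": {"engineer": 0.1, "teacher": 0.9},
        "erin": {"engineer": 0.2, "teacher": 0.8},
    }
    edges = [("alice", "bob"), ("bob", "carol"), ("alice", "carol"), ("dave", "erin")]
    print(integrate(content, edges))
    # {'alice': 'engineer', 'bob': 'engineer', 'carol': 'engineer', 'dave': 'teacher', 'erin': 'teacher'}
```

In the toy data, `carol`'s text alone leans slightly toward `teacher`, but her community and neighbors are engineer-heavy, so the integrated prediction flips to `engineer`: a miniature version of the homophily effect described above. A fuller treatment would learn the combination (for example, a classifier over content and community features) instead of fixing the weights by hand.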



Author information

Correspondence to Bin Cui.


About this article


Cite this article

Huang, Y., Yu, L., Wang, X. et al. A multi-source integration framework for user occupation inference in social media systems. World Wide Web 18, 1247–1267 (2015). https://doi.org/10.1007/s11280-014-0300-6


  • DOI: https://doi.org/10.1007/s11280-014-0300-6
