Abstract
This paper studies gender prediction for social media users from their interest tags. The challenge is that the tag feature vector is extremely sparse and short: each user has fewer than 10 tags. We present a novel method based on conceptual classes that enriches and centralizes the feature space. We first identify the discriminating tags based on the tag distribution. We then build the initial conceptual classes by taking advantage of generalization and specialization operations on these tags; for example, “Kobe” is a specialized instance of “basketball”. Finally, we model class expansion as the problem of computing the similarity between one tag and the set of tags in a conceptual class in the embedding space.
We conduct extensive experiments on a real dataset from Sina Weibo. The results demonstrate that the proposed method significantly enhances the quality of the feature space and improves the performance of gender classification: accuracy reaches 82.25%, whereas the original tag vector achieves only 62.75%.
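The class expansion step described above can be illustrated with a minimal sketch. The abstract does not state which similarity measure or acceptance rule is used, so everything here is an assumption made for illustration: the tag-to-class score is taken to be the mean cosine similarity between the candidate tag and the class members, a candidate joins the class when that score reaches a hypothetical `threshold`, and the three-dimensional vectors are toy values rather than learned tag embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def tag_class_similarity(tag, class_tags, embeddings):
    """Mean cosine similarity between one tag and the tags of a class.

    Assumption: averaging pairwise similarities is one plausible way to
    compare a tag with a set of tags; the paper may use a different rule.
    """
    sims = [
        cosine(embeddings[tag], embeddings[member])
        for member in class_tags
        if member != tag
    ]
    return sum(sims) / len(sims) if sims else 0.0


def expand_class(class_tags, embeddings, threshold=0.8):
    """Add every out-of-class tag whose score reaches the threshold.

    The threshold value is hypothetical and would need tuning on real data.
    """
    accepted = {
        tag
        for tag in embeddings
        if tag not in class_tags
        and tag_class_similarity(tag, class_tags, embeddings) >= threshold
    }
    return set(class_tags) | accepted


# Toy embeddings (made-up values, not taken from the paper).
toy_embeddings = {
    "basketball": [0.9, 0.1, 0.0],
    "Kobe": [0.8, 0.2, 0.1],
    "NBA": [0.85, 0.15, 0.05],
    "makeup": [0.0, 0.1, 0.9],
}

sports_class = {"basketball", "Kobe"}
expanded = expand_class(sports_class, toy_embeddings)
```

In this toy setup, "NBA" lies close to both seed tags and is added to the class, while "makeup" scores far below the threshold and is left out. Comparing the candidate against the class centroid instead of averaging pairwise similarities would be an equally reasonable choice under the same assumptions.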
Acknowledgment
The work described in this paper has been supported in part by the NSFC Projects (61272275, 61572376, 61272110) and by the Wuhan Science and Technology Bureau “Chenguang Jihua” (2014072704011250).
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Zhu, P., Qian, T., Zhong, M., Li, X. (2016). Inferring Users’ Gender from Interests: A Tag Embedding Approach. In: Hirose, A., Ozawa, S., Doya, K., Ikeda, K., Lee, M., Liu, D. (eds) Neural Information Processing. ICONIP 2016. Lecture Notes in Computer Science, vol 9950. Springer, Cham. https://doi.org/10.1007/978-3-319-46681-1_11
DOI: https://doi.org/10.1007/978-3-319-46681-1_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-46680-4
Online ISBN: 978-3-319-46681-1
eBook Packages: Computer Science, Computer Science (R0)