Abstract
Microblog messages reflect their authors' gender to some extent, which motivates the study of automatic gender identification of microblog users through message content mining. This paper proposes a novel approach to that task. The approach extracts three types of features (characteristic item features, stylometry features, and medium diversity features) from microblog messages with high gender-relatedness, and applies a series of pattern recognition techniques, including feature normalization, feature selection, and SVM classification, to detect the microblogger's gender. Extensive experiments demonstrate the effectiveness of the proposed approach.
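As a rough illustration of the first two stages named in the abstract (feature extraction and feature normalization), the following minimal sketch computes a few stylometry-like counts per message and min-max scales them. The abstract does not list the actual features, so the feature names (`n_chars`, `n_words`, `n_exclaim`, `n_question`, `n_mentions`), the helper functions, and the sample messages are all assumptions made for illustration, not the authors' method.

```python
import re
from typing import Dict, List


def stylometry_features(message: str) -> Dict[str, float]:
    """Compute a few surface-level counts for one message.

    These features are hypothetical stand-ins; the paper's real
    stylometry features are not given in the abstract.
    """
    words = message.split()
    return {
        "n_chars": float(len(message)),
        "n_words": float(len(words)),
        "n_exclaim": float(message.count("!")),
        "n_question": float(message.count("?")),
        "n_mentions": float(len(re.findall(r"@\w+", message))),
    }


def min_max_normalize(rows: List[Dict[str, float]]) -> List[Dict[str, float]]:
    """Scale each feature to [0, 1] across all rows.

    A feature with the same value in every row is mapped to 0.0.
    """
    if not rows:
        return []
    keys = list(rows[0].keys())
    lo = {k: min(r[k] for r in rows) for k in keys}
    hi = {k: max(r[k] for r in rows) for k in keys}
    return [
        {
            k: (r[k] - lo[k]) / (hi[k] - lo[k]) if hi[k] > lo[k] else 0.0
            for k in keys
        }
        for r in rows
    ]


# Made-up sample messages, used only to exercise the functions.
messages = [
    "So excited for the concert tonight!!!",
    "Market closed lower today. @trader any thoughts?",
]
features = [stylometry_features(m) for m in messages]
normalized = min_max_normalize(features)
```

In a full pipeline, normalized vectors like these would then pass through feature selection and into an SVM classifier; those stages are omitted here because the abstract gives no details about how they are configured.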
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Huang, F., Li, C., Lin, L. (2014). Identifying Gender of Microblog Users Based on Message Mining. In: Li, F., Li, G., Hwang, Sw., Yao, B., Zhang, Z. (eds) Web-Age Information Management. WAIM 2014. Lecture Notes in Computer Science, vol 8485. Springer, Cham. https://doi.org/10.1007/978-3-319-08010-9_54
DOI: https://doi.org/10.1007/978-3-319-08010-9_54
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-08009-3
Online ISBN: 978-3-319-08010-9
eBook Packages: Computer Science, Computer Science (R0)