Abstract
This article addresses the automatic gender classification of web blog authors. Specifically, we use eight widely used machine learning algorithms to study how effectively feature selection improves gender classification accuracy. Feature ranking is performed over a set of statistical, part-of-speech tagging, and language model features. The classification models in the experiments are based on decision trees, support vector machines, and lazy-learning algorithms. The experimental evaluation on blog author gender classification data demonstrates the importance of language model features for this task and shows that feature selection significantly improves classification accuracy, regardless of the machine learning algorithm used.
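The rank-then-classify workflow summarized in the abstract can be illustrated with a minimal sketch. The abstract does not state which ranking criterion or which specific algorithms were used, so everything here is an assumption made for illustration: information gain stands in for the ranking step, a 1-nearest-neighbour rule stands in for a lazy learner, and the binary feature matrix `X` and labels `y` are synthetic toy values, not data from the paper.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(values, labels):
    """Information gain of one discrete feature with respect to the labels."""
    n = len(labels)
    conditional = 0.0
    for v in set(values):
        subset = [lab for x, lab in zip(values, labels) if x == v]
        conditional += len(subset) / n * entropy(subset)
    return entropy(labels) - conditional

def rank_features(X, y):
    """Return feature indices sorted by decreasing information gain."""
    n_features = len(X[0])
    scores = [information_gain([row[j] for row in X], y)
              for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)

def nearest_neighbour_predict(train_X, train_y, x, features):
    """1-NN prediction using Hamming distance over the selected features only."""
    def dist(a, b):
        return sum(a[j] != b[j] for j in features)
    best = min(range(len(train_X)), key=lambda i: dist(train_X[i], x))
    return train_y[best]

# Synthetic toy data (hypothetical): feature 0 separates the classes,
# features 1 and 2 carry almost no class information.
X = [
    [1, 0, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 1, 1],
    [0, 0, 1],
    [0, 1, 0],
]
y = ["F", "F", "F", "M", "M", "M"]

ranking = rank_features(X, y)   # most informative feature first
top_k = ranking[:1]             # keep only the best-ranked feature
prediction = nearest_neighbour_predict(X, y, [1, 1, 1], top_k)
```

In this sketch, restricting the distance computation to the top-ranked features is what "feature selection" amounts to; a real pipeline would choose the cut-off by cross-validation and extract the features from text.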
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Aravantinou, C., Simaki, V., Mporas, I., Megalooikonomou, V. (2015). Gender Classification of Web Authors Using Feature Selection and Language Models. In: Ronzhin, A., Potapova, R., Fakotakis, N. (eds) Speech and Computer. SPECOM 2015. Lecture Notes in Computer Science, vol 9319. Springer, Cham. https://doi.org/10.1007/978-3-319-23132-7_28
DOI: https://doi.org/10.1007/978-3-319-23132-7_28
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-23131-0
Online ISBN: 978-3-319-23132-7
eBook Packages: Computer Science, Computer Science (R0)