Abstract
With the growth of social media in recent years, there has been increasing interest in automatically characterizing users based on the informal content they generate. In this context, labeling users by demographic categories such as age, ethnicity, origin, and race, along with inferring other user attributes such as political preference, personality, and gender expression, has received a great deal of attention, especially using Twitter data. This paper focuses on gender classification. It uses 60 textual meta-attributes, commonly employed in text attribution tasks, to extract linguistic cues of gender expression from tweets written in Portuguese. The approach takes into account the characters, syntax, words, structure, and morphology of short, multi-genre, content-free texts posted on Twitter. It classifies the author's gender with three different machine-learning algorithms and evaluates how the proposed meta-attributes influence this process.
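The abstract does not list the 60 meta-attributes. As a rough illustration only, the sketch below computes a handful of character-level and word-level measures of the kind used in stylometry for a single tweet. These include text length, uppercase, digit, whitespace and punctuation ratios, average word length, and vocabulary richness.

The attribute names, the `\w+` tokenizer, and the function `meta_attributes` are hypothetical choices, not the authors' feature set. The sketch omits syntactic, structural, and morphological attributes, and it does not attempt to reproduce the paper's three classifiers.

```python
import re
import string
from collections import Counter

# Hypothetical tokenizer: any run of Unicode word characters, so accented
# Portuguese letters (e.g. "à", "ção") stay inside words.
WORD_RE = re.compile(r"\w+")


def _ratio(num: float, den: float) -> float:
    """Safe division: returns 0.0 for empty texts instead of raising."""
    return num / den if den else 0.0


def meta_attributes(text: str) -> dict[str, float]:
    """Illustrative (hypothetical) subset of character- and word-level meta-attributes."""
    n_chars = len(text)
    words = WORD_RE.findall(text.lower())
    n_words = len(words)
    counts = Counter(words)
    return {
        # Character-level attributes, as proportions of the raw text length.
        "n_chars": float(n_chars),
        "upper_ratio": _ratio(sum(c.isupper() for c in text), n_chars),
        "digit_ratio": _ratio(sum(c.isdigit() for c in text), n_chars),
        "space_ratio": _ratio(sum(c.isspace() for c in text), n_chars),
        "punct_ratio": _ratio(sum(c in string.punctuation for c in text), n_chars),
        # Word-level attributes, computed on lowercased tokens.
        "n_words": float(n_words),
        "avg_word_len": _ratio(sum(len(w) for w in words), n_words),
        "type_token_ratio": _ratio(len(counts), n_words),
        "hapax_ratio": _ratio(sum(1 for c in counts.values() if c == 1), n_words),
    }


if __name__ == "__main__":
    features = meta_attributes("Bom dia, bom dia!")
    # Fixed-length vector with a stable attribute order, ready for a classifier.
    vector = [features[k] for k in sorted(features)]
    print(vector)
```

Each tweet thus becomes a fixed-length numeric vector that any standard classifier can consume. The abstract does not name the specific algorithms the authors used.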
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Filho, J.A.B.L., Pasti, R., de Castro, L.N. (2016). Gender Classification of Twitter Data Based on Textual Meta-Attributes Extraction. In: Rocha, Á., Correia, A., Adeli, H., Reis, L., Mendonça Teixeira, M. (eds) New Advances in Information Systems and Technologies. Advances in Intelligent Systems and Computing, vol 444. Springer, Cham. https://doi.org/10.1007/978-3-319-31232-3_97
DOI: https://doi.org/10.1007/978-3-319-31232-3_97
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-31231-6
Online ISBN: 978-3-319-31232-3
eBook Packages: Engineering, Engineering (R0)