Gender Classification of Twitter Data Based on Textual Meta-Attributes Extraction

  • Conference paper
New Advances in Information Systems and Technologies

Abstract

With the growth of social media in recent years, there has been increasing interest in the automatic characterization of users based on the informal content they generate. In this context, the labeling of users in demographic categories, such as age, ethnicity, origin and race, along with the investigation of other attributes inherent to users, such as political preferences, personality and gender expression, has received a great deal of attention, especially based on Twitter data. The present paper focuses on the task of gender classification by using 60 textual meta-attributes, commonly used in text attribution tasks, to extract linguistic cues of gender expression from tweets written in Portuguese. The approach takes into account the characters, syntax, words, structure and morphology of short, multi-genre, content-free texts posted on Twitter in order to classify the author's gender via three different machine-learning algorithms, and to evaluate the influence of the proposed meta-attributes on this process.
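
As a rough illustration of what extracting textual meta-attributes from a tweet can look like, the Python sketch below computes a handful of character-, punctuation-, word- and structure-level features. The function name `extract_meta_attributes`, the feature names and the particular features chosen are assumptions made for illustration only; they are not the 60 meta-attributes used in the paper, and morphological features (which would need a Portuguese part-of-speech tagger) are left out.

```python
import re


def extract_meta_attributes(tweet: str) -> dict:
    """Compute a few illustrative stylometric features for one tweet.

    Hypothetical feature set: the paper's actual 60 meta-attributes
    are not listed in the abstract.
    """
    # \w matches Unicode letters by default, so accented Portuguese
    # characters (e.g. "ã", "é") count as part of a word.
    words = re.findall(r"\w+", tweet)
    n_chars = len(tweet)
    n_words = len(words)

    return {
        # Character-based features
        "char_count": n_chars,
        "uppercase_ratio": (
            sum(c.isupper() for c in tweet) / n_chars if n_chars else 0.0
        ),
        # Punctuation features (a crude proxy for syntactic cues)
        "exclamation_count": tweet.count("!"),
        "question_count": tweet.count("?"),
        # Word-based features
        "word_count": n_words,
        "avg_word_length": (
            sum(len(w) for w in words) / n_words if n_words else 0.0
        ),
        # Structural features specific to Twitter
        "hashtag_count": len(re.findall(r"#\w+", tweet)),
        "mention_count": len(re.findall(r"@\w+", tweet)),
        "url_count": len(re.findall(r"https?://\S+", tweet)),
    }


if __name__ == "__main__":
    example = "Que dia LINDO!!! #feliz @ana"
    for name, value in extract_meta_attributes(example).items():
        print(f"{name}: {value}")
```

Each tweet becomes a fixed-length numeric vector, which is the form a standard classifier expects as input. Because the features describe style rather than topic, they fit the "content-free" setting described in the abstract.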

Author information

Corresponding author

Correspondence to José Ahirton Batista Lopes Filho.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Filho, J.A.B.L., Pasti, R., de Castro, L.N. (2016). Gender Classification of Twitter Data Based on Textual Meta-Attributes Extraction. In: Rocha, Á., Correia, A., Adeli, H., Reis, L., Mendonça Teixeira, M. (eds) New Advances in Information Systems and Technologies. Advances in Intelligent Systems and Computing, vol 444. Springer, Cham. https://doi.org/10.1007/978-3-319-31232-3_97

  • DOI: https://doi.org/10.1007/978-3-319-31232-3_97

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-31231-6

  • Online ISBN: 978-3-319-31232-3

  • eBook Packages: Engineering, Engineering (R0)
