Learning Age and Gender Using Co-occurrence of Non-dictionary Words from Stylistic Variations

  • Conference paper
Rough Sets and Current Trends in Computing (RSCTC 2010)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6086)

Abstract

This work reports stylistic differences in blogging across gender and age groups, using slang word co-occurrences. We focus on the co-occurrence of non-dictionary words across bloggers of different genders and age groups, taking the use of slang words as the feature for studying stylistic variation. We model the co-occurrences of slang words used by bloggers as a graph in which nodes are slang words and edge weights are the number of co-occurrences, and we study how these variations help predict age group and gender. For the experiments we used the demographically tagged blog corpus from the ICWSM Spinn3r dataset and a Naive Bayes classifier with 10-fold cross-validation. Preliminary results show that the co-occurrence of slang words could be a better choice for predicting age and gender.
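The graph model described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say what counts as a co-occurrence or which dictionary was used, so the sketch assumes that two slang words co-occur when they appear in the same post and uses a tiny made-up word list as the dictionary. The function names `non_dictionary_words` and `cooccurrence_graph` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def non_dictionary_words(tokens, dictionary):
    """Return the distinct alphabetic tokens (lowercased) absent from the dictionary."""
    return sorted(
        {t.lower() for t in tokens if t.isalpha() and t.lower() not in dictionary}
    )


def cooccurrence_graph(posts, dictionary):
    """Build a weighted edge list: each unordered slang-word pair maps to the
    number of posts in which both words appear (assumed co-occurrence window)."""
    edges = Counter()
    for post in posts:
        slang = non_dictionary_words(post.split(), dictionary)
        # Sorted input keeps each pair in a canonical (alphabetical) order.
        for pair in combinations(slang, 2):
            edges[pair] += 1
    return edges


# Toy data, invented for illustration only.
dictionary = {"i", "am", "so", "tired", "see", "you", "later", "that", "was", "funny"}
posts = [
    "lol i am so tired omg",
    "omg that was funny lol",
    "see you later brb",
]
graph = cooccurrence_graph(posts, dictionary)
```

In a setup like this, the edge weights for one group of bloggers could serve as count features for a classifier such as Naive Bayes, though how the paper derives its features from the graph is not stated in the abstract.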



Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Prasath, R.R. (2010). Learning Age and Gender Using Co-occurrence of Non-dictionary Words from Stylistic Variations. In: Szczuka, M., Kryszkiewicz, M., Ramanna, S., Jensen, R., Hu, Q. (eds) Rough Sets and Current Trends in Computing. RSCTC 2010. Lecture Notes in Computer Science, vol 6086. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13529-3_58

  • DOI: https://doi.org/10.1007/978-3-642-13529-3_58

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-13528-6

  • Online ISBN: 978-3-642-13529-3

  • eBook Packages: Computer Science, Computer Science (R0)
