
Statistics of Online User-Generated Short Documents

  • Giacomo Inches
  • Mark J. Carman
  • Fabio Crestani
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5993)

Abstract

User-generated short documents play an important role in online communication, owing to the widespread use of social networks and real-time text messaging on the Internet. In this paper we compare the statistics of several online user-generated datasets with those of traditional TREC collections, investigating their similarities and differences. Our results support the applicability of traditional techniques to user-generated short documents as well, albeit with proper preprocessing.



Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Giacomo Inches (1)
  • Mark J. Carman (1)
  • Fabio Crestani (1)
  1. Faculty of Informatics, University of Lugano, Lugano, Switzerland
