Hierarchical Clustering in Improving Microblog Stream Summarization

  • Conference paper
Computational Linguistics and Intelligent Text Processing (CICLing 2013)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7817)

Abstract

Microblogging use has grown massively over the past couple of years. According to recent statistics, Twitter (the most popular microblogging platform) receives over 500 million posts per day. To help users manage this information overload, or to assess the full information potential of microblogging streams, a few summarization algorithms have been proposed. However, they are designed to work on a stream of posts filtered on a particular keyword, whereas most streams are noisy or contain posts referring to more than one topic. As a result, the generated summary is incomplete or even meaningless. We approach the problem of summarizing a stream and propose adding a layer of text clustering before the summarization step. We first identify the events users are talking about in the stream, then group posts by event, and finally cluster each group hierarchically. We show that generating an agglomerative hierarchical cluster tree from the posts and then applying a summarization algorithm improves the quality of the summary.
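The clustering-before-summarization idea can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not name a distance measure, linkage criterion, stopping rule, or summarizer, so the Jaccard distance on word sets, the average linkage, the `threshold` parameter, and the medoid-style `representative` function below are all assumptions chosen for brevity. The sketch also assumes the posts have already been grouped into a single event.

```python
from itertools import combinations


def tokens(post):
    """Lowercase a post and split it into a set of word tokens."""
    return set(post.lower().split())


def jaccard_distance(a, b):
    """Return 1 - |A & B| / |A | B| for two token sets."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def average_linkage(c1, c2, bags):
    """Mean pairwise distance between the posts of two clusters."""
    total = sum(jaccard_distance(bags[i], bags[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def agglomerate(posts, threshold=0.8):
    """Bottom-up clustering: repeatedly merge the two closest clusters.

    Stops when the closest pair is farther apart than `threshold`
    (an assumed stopping rule). Returns (clusters, merges), where
    clusters holds lists of post indices and merges records each
    merge step, i.e. the cluster tree from the leaves upward.
    """
    bags = [tokens(p) for p in posts]
    clusters = [[i] for i in range(len(posts))]
    merges = []
    while len(clusters) > 1:
        dist, a, b = min(
            (average_linkage(clusters[a], clusters[b], bags), a, b)
            for a, b in combinations(range(len(clusters)), 2)
        )
        if dist > threshold:
            break
        merges.append((clusters[a], clusters[b], dist))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return clusters, merges


def representative(cluster, posts):
    """Stand-in summarizer: the post closest on average to its cluster."""
    bags = {i: tokens(posts[i]) for i in cluster}
    best = min(
        cluster,
        key=lambda i: sum(jaccard_distance(bags[i], bags[j]) for j in cluster),
    )
    return posts[best]


# Hypothetical posts, invented for illustration only.
posts = [
    "goal for barcelona in the first half",
    "barcelona scores a goal in the first half",
    "great goal by barcelona",
    "new phone launch event today",
    "phone launch event starts today",
]

clusters, merges = agglomerate(posts)
for cluster in clusters:
    print(sorted(cluster), "->", representative(cluster, posts))
```

Summarizing each cluster separately keeps posts about unrelated topics from being mixed into one summary, which is the failure mode the abstract attributes to keyword-filtered summarizers. A real system would replace `representative` with an actual summarization algorithm.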

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Olariu, A. (2013). Hierarchical Clustering in Improving Microblog Stream Summarization. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2013. Lecture Notes in Computer Science, vol 7817. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37256-8_35

  • DOI: https://doi.org/10.1007/978-3-642-37256-8_35

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-37255-1

  • Online ISBN: 978-3-642-37256-8

  • eBook Packages: Computer Science, Computer Science (R0)
