Efficient Term Cloud Generation for Streaming Web Content

  • Odysseas Papapetrou
  • George Papadakis
  • Ekaterini Ioannou
  • Dimitrios Skoutas
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6189)


Large amounts of information are posted daily on the Web, such as articles published online by traditional news agencies or blog posts referring to and commenting on various events. Although users sometimes rely on a small set of trusted sources for their information, they often also want a wider overview of what is being reported and discussed in the news and the blogosphere. In this paper, we present an approach for supporting this discovery and exploration process by exploiting term clouds. In particular, we provide an efficient method for dynamically computing the most frequently appearing terms in the posts of monitored online sources, for time intervals specified at query time, without the need to archive the actual published content. An experimental evaluation on a large-scale real-world set of blogs demonstrates the accuracy of the proposed method and its efficiency in terms of computational time and memory requirements.
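
To make the problem setting concrete, the following is a minimal sketch of answering "most frequent terms in a time interval" queries without keeping the posts themselves. It is an illustration only and not the method proposed in the paper: the class `BucketedTermCounts`, its parameter `bucket_seconds`, and the whitespace tokenization are all assumptions introduced here. The idea shown is simply to keep term counts per fixed-length time bucket and to merge the relevant buckets at query time.

```python
from collections import Counter, defaultdict


class BucketedTermCounts:
    """Hypothetical helper: exact term counts per fixed-length time bucket.

    Only the counts are stored; the text of each post is discarded
    as soon as it has been tokenized and counted.
    """

    def __init__(self, bucket_seconds=3600):
        self.bucket_seconds = bucket_seconds
        self.buckets = defaultdict(Counter)  # bucket index -> term counts

    def add_post(self, timestamp, text):
        # Naive tokenization (lowercase + whitespace split), assumed for brevity.
        bucket = int(timestamp // self.bucket_seconds)
        self.buckets[bucket].update(text.lower().split())

    def top_terms(self, start, end, k=10):
        """Return the k most frequent terms over all buckets touching [start, end].

        Answers are at bucket granularity: a bucket that only partly
        overlaps the interval is counted in full.
        """
        first = int(start // self.bucket_seconds)
        last = int(end // self.bucket_seconds)
        total = Counter()
        for b in range(first, last + 1):
            if b in self.buckets:  # membership test avoids creating empty buckets
                total.update(self.buckets[b])
        return total.most_common(k)


store = BucketedTermCounts(bucket_seconds=60)
store.add_post(0, "storm hits coast")
store.add_post(30, "storm warning")
store.add_post(120, "election results storm")
print(store.top_terms(0, 179, k=1))
```

Exact per-bucket counters like these grow with the vocabulary and the number of buckets, so a practical system would likely replace them with bounded-memory approximate summaries; the sketch leaves that trade-off out.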



Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Odysseas Papapetrou (1)
  • George Papadakis (1)
  • Ekaterini Ioannou (1)
  • Dimitrios Skoutas (1)
  1. L3S Research Center, Hannover, Germany
