Online Change Estimation Models for Dynamic Web Resources

A Case-Study of RSS Feed Refresh Strategies
  • Roxana Horincar
  • Bernd Amann
  • Thierry Artières
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7387)

Abstract

Modern Web 2.0 applications have transformed the Internet into an interactive, dynamic and lively information space. Personal weblogs, commercial web sites, news portals and social media applications generate highly dynamic information streams which have to be propagated to millions of users. This article focuses on the problem of estimating the publication frequency of highly dynamic web resources. We illustrate the importance of developing efficient online estimation techniques for improving the refresh strategies of RSS feed aggregators like Google Reader [8], Datasift [7] or Roses [11]. We study the temporal publication characteristics of a large collection of real-world RSS feeds, and we define and evaluate several online estimation methods in combination with different refresh strategies. We show the benefit of using periodic source publication patterns for change estimation, and we highlight the challenges imposed by the application context.
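To make the idea of online estimation with periodic publication patterns concrete, the following is a minimal sketch and not the estimation method evaluated in the paper. It assumes that a day is divided into 24 hourly time slots and that each slot's publication rate is tracked by exponential smoothing; the class name `SlotRateEstimator`, the smoothing factor `alpha`, and the fallback to the mean of the known slots are all hypothetical choices made for illustration.

```python
class SlotRateEstimator:
    """Illustrative online estimator of items published per time slot.

    Assumption (not from the paper): one smoothed rate per hour-of-day slot,
    updated by exponential smoothing each time the feed is refreshed.
    """

    def __init__(self, slots=24, alpha=0.3):
        self.alpha = alpha            # weight given to the newest observation
        self.rates = [None] * slots   # None until a slot has been observed

    def update(self, slot, new_items):
        """Fold the number of new items seen in `slot` into its estimate."""
        previous = self.rates[slot]
        if previous is None:
            # First observation for this slot: take it as the estimate.
            self.rates[slot] = float(new_items)
        else:
            self.rates[slot] = (
                self.alpha * new_items + (1 - self.alpha) * previous
            )
        return self.rates[slot]

    def predict(self, slot):
        """Return the expected number of new items for `slot`."""
        rate = self.rates[slot]
        if rate is not None:
            return rate
        # Unobserved slot: fall back to the mean of the observed slots.
        known = [r for r in self.rates if r is not None]
        return sum(known) / len(known) if known else 0.0


if __name__ == "__main__":
    estimator = SlotRateEstimator(alpha=0.5)
    estimator.update(9, 4)   # 4 new items seen in the 09:00 slot
    estimator.update(9, 2)   # 2 new items in the same slot on a later day
    print(estimator.predict(9))
```

A refresh strategy could, under these assumptions, rank feeds by `predict(slot)` for the current slot and spend its limited refresh budget on the feeds expected to have published the most.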

Keywords

Time Slot, Publication Activity, Source Publication, Publication Behavior, Online Estimation
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

References

  1. Adam, G., Bouras, C., Poulopoulos, V.: Utilizing RSS Feeds for Crawling the Web. In: 2009 Fourth International Conference on Internet and Web Applications and Services, pp. 211–216. IEEE (2009)
  2. Brewington, B.E., Cybenko, G.: How dynamic is the web? Computer Networks 33(1-6), 257–276 (2000)
  3. Chatfield, C.: The Analysis of Time Series: An Introduction. CRC Press (2004)
  4. Cho, J., Garcia-Molina, H.: Synchronizing a database to improve freshness. SIGMOD Rec. 29(2), 117–128 (2000)
  5. Cho, J., Garcia-Molina, H.: Effective page refresh policies for web crawlers. ACM Trans. Database Syst. 28(4), 390–426 (2003)
  6. Cho, J., Garcia-Molina, H.: Estimating frequency of change. ACM Trans. Internet Technol. 3(3), 256–290 (2003)
  7.
  8.
  9. Gruhl, D., Guha, R.V., Liben-Nowell, D., Tomkins, A.: Information diffusion through blogspace. In: Feldman, S.I., Uretsky, M., Najork, M., Wills, C.E. (eds.) WWW, pp. 491–501. ACM (2004)
  10. Hmedeh, Z., Vouzoukidou, N., Travers, N., Christophides, V., du Mouza, C., Scholl, M.: Characterizing Web Syndication Behavior and Content. In: Bouguettaya, A., Hauswirth, M., Liu, L. (eds.) WISE 2011. LNCS, vol. 6997, pp. 29–42. Springer, Heidelberg (2011)
  11. Horincar, R., Amann, B., Artières, T.: Best-Effort Refresh Strategies for Content-Based RSS Feed Aggregation. In: Chen, L., Triantafillou, P., Suel, T. (eds.) WISE 2010. LNCS, vol. 6488, pp. 262–270. Springer, Heidelberg (2010)
  12. Olston, C., Pandey, S.: Recrawl scheduling based on information longevity. In: WWW 2008: Proceeding of the 17th International Conference on World Wide Web, pp. 437–446. ACM, New York (2008)
  13. Olston, C., Widom, J.: Best-effort cache synchronization with source cooperation. In: SIGMOD 2002: Proceedings of the 2002 ACM SIGMOD International Conference on Management of Data, pp. 73–84. ACM, New York (2002)
  14. Pandey, S., Olston, C.: User-centric web crawling. In: WWW 2005: Proceedings of the 14th International Conference on World Wide Web, pp. 401–411. ACM, New York (2005)
  15. Saporta, G.: Probabilités, analyse des données et statistique. Technip (2006)
  16. Sia, K.C., Cho, J., Cho, H.-K.: Efficient monitoring algorithm for fast news alerts. IEEE Trans. on Knowl. and Data Eng. 19(7), 950–961 (2007)
  17. Sia, K.C., Cho, J., Hino, K., Chi, Y., Zhu, S., Tseng, B.L.: Monitoring rss feeds based on user browsing pattern. In: Proceedings of the International Conference on Weblogs and Social Media, Boulder Colorado, pp. 161–168 (March 2007)
  18. Zimmer, C., Tryfonopoulos, C., Berberich, K., Koubarakis, M., Weikum, G.: Approximate Information Filtering in Peer-to-Peer Networks. In: Bailey, J., Maier, D., Schewe, K.-D., Thalheim, B., Wang, X.S. (eds.) WISE 2008. LNCS, vol. 5175, pp. 6–19. Springer, Heidelberg (2008)
  19. Zimmer, C., Tryfonopoulos, C., Berberich, K., Weikum, G., Koubarakis, M.: Node behavior prediction for large-scale approximate information filtering. In: 1st International Workshop on Large Scale Distributed Systems for Information Retrieval, LSDS-IR 2007 (2007)

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Roxana Horincar (1)
  • Bernd Amann (1)
  • Thierry Artières (1)
  1. LIP6, University Pierre et Marie Curie, Paris, France
