Abstract
Over the past several years, RSS-based content syndication has become a standard technique for disseminating information on the web efficiently and in a timely manner. From a data processing perspective, RSS feeds are standard XML resources that feed aggregators refresh periodically to generate continuous streams of items. In this article, we study the problem of information loss in a content-based feed aggregation system and propose a new best-effort refresh strategy for RSS feeds under limited bandwidth. We evaluate this strategy experimentally and compare it to state-of-the-art crawling strategies for web pages.
The authors acknowledge the support of the French Agence Nationale de la Recherche (ANR), under grant ROSES (ANR-07-MDCO-011) “Really Open, Simple and Efficient Syndication”.
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Horincar, R., Amann, B., Artières, T. (2010). Best-Effort Refresh Strategies for Content-Based RSS Feed Aggregation. In: Chen, L., Triantafillou, P., Suel, T. (eds) Web Information Systems Engineering – WISE 2010. WISE 2010. Lecture Notes in Computer Science, vol 6488. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-17616-6_24
DOI: https://doi.org/10.1007/978-3-642-17616-6_24
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-17615-9
Online ISBN: 978-3-642-17616-6
eBook Packages: Computer Science, Computer Science (R0)