Capturing Web Dynamics by Regular Approximation

  • Conference paper
Web Information Systems – WISE 2004 (WISE 2004)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3306)

Abstract

Software systems such as Web crawlers, Web archives, and Web caches depend on, or can be improved by, knowledge of the update times of remote sources. Assuming that the time intervals between updates are exponentially distributed, the literature presents various statistical methods for finding optimal reload times of remote sources. In this article, we first observe that the temporal behavior of a fraction of Web data can be described more precisely by regular or quasi-regular grammars. Second, we present an approach for estimating the parameters of such grammars automatically. By comparing a reload policy based on regular approximation with previous methods based on the exponential distribution, we show that the quality of local copies of remote sources, in terms of ‘freshness’ and the amount of lost data, can be improved significantly.
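The abstract contrasts two ways of predicting when a remote source will next change. The Python sketch below illustrates that contrast under simplifying assumptions; it is not the grammar-inference method of the paper. It assumes exact update timestamps are observed (real crawlers usually only see whether a page changed between polls). The exponential model summarizes a source by a single rate λ (updates per unit time), estimated as (number of intervals) / (observed span). The "regular" view instead looks for a repeating cycle in the sequence of gaps between updates; a cycle such as (1 2)* is one simple pattern that a regular grammar over interval symbols can describe. The function names, the tolerance parameter `tol`, and the sample timestamps are hypothetical.

```python
from statistics import mean


def intervals(times):
    """Gaps between consecutive observed update times."""
    return [b - a for a, b in zip(times, times[1:])]


def exponential_rate(times):
    """Maximum-likelihood rate (updates per unit time) under the exponential model."""
    if len(times) < 2:
        raise ValueError("need at least two update times")
    span = times[-1] - times[0]
    if span <= 0:
        raise ValueError("update times must span a positive duration")
    return (len(times) - 1) / span


def find_period(ivals, tol):
    """Smallest period p (seen at least twice) with ivals[i] ~ ivals[i - p] within tol."""
    for p in range(1, len(ivals) // 2 + 1):
        if all(abs(ivals[i] - ivals[i - p]) <= tol for i in range(p, len(ivals))):
            return p
    return None  # no cyclic pattern found: fall back to a statistical model


def predict_next_regular(times, tol=0.0):
    """Continue the detected interval cycle to predict the next update time."""
    ivals = intervals(times)
    p = find_period(ivals, tol)
    if p is None:
        return None
    phase = len(ivals) % p          # position of the next gap within the cycle
    same_phase = ivals[phase::p]    # all observed gaps at that position
    return times[-1] + mean(same_phase)


# Hypothetical source that changes after gaps of 1 h, 2 h, 1 h, 2 h, ...
regular = [0, 1, 3, 4, 6, 7, 9]
print(exponential_rate(regular))      # 6 updates over 9 hours
print(predict_next_regular(regular))  # cycle (1 2)* puts the next update at hour 10
```

A fixed-tolerance periodicity check is far cruder than inferring a grammar from examples. It only shows why a source whose changes follow a cycle is poorly summarized by a single rate: here the exponential model gives a mean gap of 1/λ = 1.5 hours, while the cyclic pattern places the next change exactly 1 hour after the last one.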

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kukulenz, D. (2004). Capturing Web Dynamics by Regular Approximation. In: Zhou, X., Su, S., Papazoglou, M.P., Orlowska, M.E., Jeffery, K. (eds) Web Information Systems – WISE 2004. WISE 2004. Lecture Notes in Computer Science, vol 3306. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30480-7_55

  • DOI: https://doi.org/10.1007/978-3-540-30480-7_55

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-23894-2

  • Online ISBN: 978-3-540-30480-7

  • eBook Packages: Springer Book Archive