Coverage and Timeliness Analysis of Search Engines with Webpage Monitoring Results

  • Conference paper
Web Information Systems Engineering – WISE 2007 (WISE 2007)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4831)

Abstract

Web monitoring systems and meta-search engines are designed to provide time-critical and coverage-critical services. Time-critical means that new information should be provided as soon as it is published on the web; coverage-critical means that the systems should not miss any information. We analyzed the coverage and timeliness of three commercial search engines against web page monitoring results to investigate how rapidly and how efficiently web monitoring systems and meta-search engines collect and provide newly published web information. We also assessed how meta-search engines might improve coverage and timeliness by providing collective services. Our experimental results show that commercial search engines still cover 65% to 75% of newly published information, taking from 5 to 13 days to retrieve it. Theoretically, meta-search engines discover up to 86% of all published data and shorten delay time up to 8 days.
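The abstract does not give the formulas behind its coverage and delay figures, so the following is only a minimal sketch of one plausible reading, not the authors' method. It assumes coverage is the fraction of monitored pages an engine has indexed, delay is the number of days between publication and first appearance in an engine, and a meta-search engine is modeled as the union of its member engines, taking the earliest appearance of each page. All page names, engine names, dates, and function names are hypothetical.

```python
from datetime import date

# Hypothetical publication dates, as recorded by a web monitoring system.
published = {
    "page-a": date(2007, 3, 1),
    "page-b": date(2007, 3, 2),
    "page-c": date(2007, 3, 3),
    "page-d": date(2007, 3, 4),
}

# Hypothetical dates on which each engine first returned each page.
first_seen = {
    "engine-1": {"page-a": date(2007, 3, 6), "page-b": date(2007, 3, 15)},
    "engine-2": {"page-a": date(2007, 3, 11), "page-c": date(2007, 3, 10)},
    "engine-3": {"page-b": date(2007, 3, 9), "page-c": date(2007, 3, 16)},
}


def coverage(seen, published):
    """Fraction of published pages that appear in `seen` (assumed definition)."""
    found = [page for page in published if page in seen]
    return len(found) / len(published)


def mean_delay_days(seen, published):
    """Mean days from publication to first appearance, over found pages only."""
    delays = [(seen[p] - published[p]).days for p in published if p in seen]
    return sum(delays) / len(delays) if delays else None


def merge_earliest(engines):
    """Model a meta-search engine: union of pages, earliest date per page."""
    merged = {}
    for seen in engines.values():
        for page, day in seen.items():
            if page not in merged or day < merged[page]:
                merged[page] = day
    return merged


if __name__ == "__main__":
    for name, seen in first_seen.items():
        print(name, coverage(seen, published), mean_delay_days(seen, published))
    merged = merge_earliest(first_seen)
    print("meta", coverage(merged, published), mean_delay_days(merged, published))
```

Under these assumptions the merged result can never cover fewer pages than any single engine, and its per-page delay can never exceed that of the fastest engine for that page, which is the sense in which collective services might improve both measures.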

Editor information

Boualem Benatallah, Fabio Casati, Dimitrios Georgakopoulos, Claudio Bartolini, Wasim Sadiq, Claude Godart

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kim, Y.S., Kang, B.H. (2007). Coverage and Timeliness Analysis of Search Engines with Webpage Monitoring Results. In: Benatallah, B., Casati, F., Georgakopoulos, D., Bartolini, C., Sadiq, W., Godart, C. (eds) Web Information Systems Engineering – WISE 2007. WISE 2007. Lecture Notes in Computer Science, vol 4831. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-76993-4_30

  • DOI: https://doi.org/10.1007/978-3-540-76993-4_30

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-76992-7

  • Online ISBN: 978-3-540-76993-4

  • eBook Packages: Computer Science, Computer Science (R0)
