Coverage and Timeliness Analysis of Search Engines with Webpage Monitoring Results

  • Yang Sok Kim
  • Byeong Ho Kang
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4831)

Abstract

Web monitoring systems and meta-search engines are designed to provide time-critical and coverage-critical services: time-critical means that new information should be provided as soon as it is published on the web, and coverage-critical means that no information should be missed by the systems. We analyzed the coverage and timeliness of three commercial search engines against web page monitoring results to investigate how rapidly and how efficiently web monitoring systems and meta-search engines collect and provide newly published web information. We also assessed how meta-search engines might improve coverage and timeliness by providing collective services. Our experimental results show that commercial search engines still cover only 65% to 75% of newly published information and take from 5 to 13 days to retrieve it. Theoretically, meta-search engines could discover up to 86% of all published data and shorten the delay time up to 8 days.
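The two measures named above can be illustrated with a minimal sketch. It assumes that coverage is the fraction of monitored pages an engine has indexed, that timeliness is the mean number of days between publication and first appearance in the index, and that a meta-search engine is modeled as the union of several engines keeping the earliest discovery date per page. The function names, URLs, and dates are hypothetical toy values for illustration, not data or definitions taken from the paper.

```python
from datetime import date


def coverage(published, found):
    """Fraction of published URLs that the engine has indexed."""
    if not published:
        return 0.0
    return len(published.keys() & found.keys()) / len(published)


def mean_delay_days(published, found):
    """Mean days between publication and discovery, over indexed URLs only."""
    delays = [(found[url] - published[url]).days
              for url in published if url in found]
    return sum(delays) / len(delays) if delays else None


def merge_engines(engines):
    """Model a meta-search engine: earliest discovery date per URL."""
    merged = {}
    for found in engines:
        for url, discovered in found.items():
            if url not in merged or discovered < merged[url]:
                merged[url] = discovered
    return merged


# Hypothetical toy data: publication dates seen by a web monitoring system.
published = {
    "a": date(2007, 1, 1),
    "b": date(2007, 1, 1),
    "c": date(2007, 1, 2),
    "d": date(2007, 1, 3),
}
# Hypothetical first-indexed dates for two engines.
engine_x = {"a": date(2007, 1, 6), "b": date(2007, 1, 9)}
engine_y = {"b": date(2007, 1, 4), "c": date(2007, 1, 10)}

meta = merge_engines([engine_x, engine_y])

for name, found in [("x", engine_x), ("y", engine_y), ("meta", meta)]:
    print(name, coverage(published, found), mean_delay_days(published, found))
```

In this toy case each engine alone covers half of the pages, while the merged view covers three of four and reports the earlier of the two discovery dates for the shared page, which is the sense in which a collective service can raise coverage and reduce delay.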

Keywords

  • Search Engines
  • Coverage of Search Engines
  • Freshness of Search Engines

Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Yang Sok Kim (1)
  • Byeong Ho Kang (1)
  1. School of Computing, University of Tasmania, Private Bag 100, Hobart TAS 7001, Australia
