Coverage and Timeliness Analysis of Search Engines with Webpage Monitoring Results
Web monitoring systems and meta-search engines are designed to provide time-critical and coverage-critical services. Time-critical means that new information should be provided as soon as it is published on the web, and coverage-critical means that no information should be missed by the system. We analyzed the coverage and timeliness of three commercial search engines against web page monitoring results to investigate how rapidly and how efficiently web monitoring systems and meta-search engines collect and provide newly published web information. We also assessed how meta-search engines might improve coverage and timeliness by providing collective services. Our experimental results show that the commercial search engines cover 65% to 75% of newly published information and take from 5 to 13 days to retrieve it. In theory, meta-search engines could discover up to 86% of all published data and shorten the delay time by up to 8 days.
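The two measures discussed above can be illustrated with a minimal sketch. It assumes that a monitoring system has recorded, for each newly published page, the number of days until each engine first returned it (or `None` if it never did). The engine names, page identifiers, and delay values below are hypothetical and are not taken from the experiment; the combination rule (a page counts as found if any engine found it, with the earliest delay) is one plausible reading of "collective services", not necessarily the authors' exact method.

```python
from typing import Dict, Optional

# page id -> days until the engine indexed it, or None if it never appeared
Delays = Dict[str, Optional[float]]


def coverage(delays: Delays) -> float:
    """Fraction of monitored pages that the engine eventually returned."""
    return sum(d is not None for d in delays.values()) / len(delays)


def mean_delay(delays: Delays) -> Optional[float]:
    """Average delay in days over the pages that were found."""
    found = [d for d in delays.values() if d is not None]
    return sum(found) / len(found) if found else None


def combine(engines: Dict[str, Delays]) -> Delays:
    """Idealized meta-search: a page is found if any engine found it,
    and its delay is the smallest delay among those engines."""
    pages = set().union(*(e.keys() for e in engines.values()))
    combined: Delays = {}
    for page in pages:
        hits = [e[page] for e in engines.values() if e.get(page) is not None]
        combined[page] = min(hits) if hits else None
    return combined


# Hypothetical toy data: three engines, four monitored pages.
engines = {
    "engine_a": {"p1": 5, "p2": None, "p3": 13, "p4": None},
    "engine_b": {"p1": 7, "p2": 6, "p3": None, "p4": None},
    "engine_c": {"p1": None, "p2": 9, "p3": 10, "p4": None},
}
meta = combine(engines)

for name, delays in engines.items():
    print(name, coverage(delays), mean_delay(delays))
print("meta", coverage(meta), mean_delay(meta))
```

In this toy data each engine alone covers half of the pages, while the combined view covers three of the four and has a lower mean delay than two of the three engines. The same kind of gain is what the theoretical meta-search figures describe.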
Keywords: Search Engines; Coverage of Search Engines; Freshness of Search Engines