Abstract
Web monitoring systems and meta-search engines are designed to provide time-critical and coverage-critical services: time-critical means that new information should be provided as soon as it is published on the web, and coverage-critical means that no information should be missed by the systems. We analyzed the coverage and timeliness of three commercial search engines against web page monitoring results to investigate how rapidly and how efficiently web monitoring systems and meta-search engines collect and provide newly published web information. We also assessed how meta-search engines might improve coverage and timeliness by providing collective services. Our experimental results show that commercial search engines still cover only 65% to 75% of newly published information and take from 5 to 13 days to retrieve it. Theoretically, meta-search engines could discover up to 86% of all published data and shorten the delay time up to 8 days.
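To make the two measures concrete, the sketch below shows one plausible way to compute coverage and delay for individual engines and for a theoretical meta-search engine that merges their results. It is an illustration only, not the authors' procedure: the page identifiers, engine names, dates, and the helper functions `coverage`, `mean_delay_days`, and `combine` are all hypothetical, and the assumption that a meta-search engine finds a page as soon as any member engine does is ours.

```python
from datetime import date

# Hypothetical monitoring log: page id -> date the monitor saw it published.
published = {
    "p1": date(2007, 1, 1),
    "p2": date(2007, 1, 2),
    "p3": date(2007, 1, 3),
    "p4": date(2007, 1, 4),
}

# Hypothetical engine logs: page id -> date the page first appeared in results.
found = {
    "engine_a": {"p1": date(2007, 1, 6), "p2": date(2007, 1, 15)},
    "engine_b": {"p2": date(2007, 1, 9), "p3": date(2007, 1, 10)},
}


def coverage(first_seen, published):
    """Fraction of monitored pages that the engine returned at all."""
    hits = [page for page in published if page in first_seen]
    return len(hits) / len(published)


def mean_delay_days(first_seen, published):
    """Average days between publication and first retrieval (found pages only)."""
    delays = [
        (first_seen[page] - published[page]).days
        for page in published
        if page in first_seen
    ]
    return sum(delays) / len(delays) if delays else None


def combine(engines):
    """Assumed meta-search model: union of pages, earliest retrieval date wins."""
    merged = {}
    for first_seen in engines.values():
        for page, day in first_seen.items():
            if page not in merged or day < merged[page]:
                merged[page] = day
    return merged


meta = combine(found)
for name, log in {**found, "meta": meta}.items():
    print(name, coverage(log, published), mean_delay_days(log, published))
```

Under this model the merged coverage can never be lower than that of the best single engine, because the merged log is the union of the engines' logs. The mean delay over found pages is not guaranteed to drop, since the union may add pages that only a slow engine found.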
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kim, Y.S., Kang, B.H. (2007). Coverage and Timeliness Analysis of Search Engines with Webpage Monitoring Results. In: Benatallah, B., Casati, F., Georgakopoulos, D., Bartolini, C., Sadiq, W., Godart, C. (eds) Web Information Systems Engineering – WISE 2007. WISE 2007. Lecture Notes in Computer Science, vol 4831. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-76993-4_30
DOI: https://doi.org/10.1007/978-3-540-76993-4_30
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-76992-7
Online ISBN: 978-3-540-76993-4
eBook Packages: Computer Science, Computer Science (R0)