Abstract
Many recent studies rely on the number of search results returned for a query, i.e., the hit count. However, hit counts returned by search engines can vary unnaturally when observed on different days, and may contain large errors that affect research depending on them. Such errors can result in low precision in machine translation, incorrect extraction of synonyms, and other problems. Thus, it is indispensable to evaluate and improve the reliability of hit counts. Several studies have demonstrated this phenomenon; however, none has made clear how much hit counts can be trusted. In this paper, we propose reliability metrics that quantitatively evaluate the reliability of hit counts in order to improve hit count selection. Evaluation results with Google show that our metrics successfully adopt reliable hit counts with 99.8% precision and skip unreliable hit counts with 74.3% precision.
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Satoh, K., Yamana, H. (2012). Hit Count Reliability: How Much Can We Trust Hit Counts?. In: Sheng, Q.Z., Wang, G., Jensen, C.S., Xu, G. (eds) Web Technologies and Applications. APWeb 2012. Lecture Notes in Computer Science, vol 7235. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-29253-8_73
DOI: https://doi.org/10.1007/978-3-642-29253-8_73
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-29252-1
Online ISBN: 978-3-642-29253-8
eBook Packages: Computer Science (R0)