Hit Count Reliability: How Much Can We Trust Hit Counts?

Satoh, Koh; Yamana, Hayato

doi:10.1007/978-3-642-29253-8_73

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7235)

Abstract

Numerous recent studies rely on the number of search results returned for a query, i.e., the hit count. However, hit counts returned by search engines can vary unnaturally when observed on different days and may contain large errors that affect research depending on them. Such errors can lead to low precision in machine translation, incorrect extraction of synonyms, and other problems. It is therefore indispensable to evaluate and improve the reliability of hit counts. Several studies have demonstrated the phenomenon; however, none has made clear how much hit counts can be trusted. In this paper, we propose reliability metrics that quantitatively evaluate hit counts' reliability in order to improve hit count selection. Evaluation results with Google show that our metrics successfully adopt reliable hit counts (99.8% precision) and skip unreliable hit counts (74.3% precision).
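
The abstract does not define the proposed metrics, so the following is only an illustrative sketch of the two ideas it mentions: hit counts for the same query varying across days, and precision figures reported separately for "adopt" and "skip" decisions. The spread measure (coefficient of variation), the 0.1 threshold, the function names, and the sample counts are all assumptions made for illustration; they are not the authors' metrics or data.

```python
from statistics import mean, pstdev


def relative_spread(daily_counts):
    """Coefficient of variation of hit counts observed on different days.

    Assumed stand-in for a reliability signal; not the paper's metric.
    """
    avg = mean(daily_counts)
    if avg == 0:
        return 0.0
    return pstdev(daily_counts) / avg


def decide(daily_counts, threshold=0.1):
    """Adopt a query's hit count if its day-to-day spread is small (assumed rule)."""
    return "adopt" if relative_spread(daily_counts) <= threshold else "skip"


def precision(decisions, truths, label):
    """Fraction of items assigned `label` whose true label is also `label`."""
    picked = [t for d, t in zip(decisions, truths) if d == label]
    if not picked:
        return None  # precision is undefined when nothing was assigned this label
    return sum(1 for t in picked if t == label) / len(picked)


# Hypothetical hit counts for two queries, each observed on four days.
observations = {
    "stable query": [1000, 1010, 990, 1005],
    "unstable query": [1000, 52000, 300, 87000],
}
decisions = {query: decide(counts) for query, counts in observations.items()}
```

With hand-labelled ground truth, `precision(..., "adopt")` and `precision(..., "skip")` would give the two kinds of figures quoted in the abstract, each computed over only the hit counts that received that decision.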

Author information

Authors and Affiliations

Fundamental Science and Engineering, Waseda University, Bld.51-11-5, 3-4-1 Okubo Shinjuku-ku, Tokyo, Japan
Koh Satoh & Hayato Yamana

Editor information

Editors and Affiliations

School of Computer Science, The University of Adelaide, Australia
Quan Z. Sheng
College of Information Science and Engineering, Northeastern University, 110819, Shenyang, China
Guoren Wang
Aarhus University, Denmark
Christian S. Jensen
Center for Applied Informatics, Victoria University, PO Box 14428, 8001, VIC, Australia
Guandong Xu

About this paper

Cite this paper

Satoh, K., Yamana, H. (2012). Hit Count Reliability: How Much Can We Trust Hit Counts?. In: Sheng, Q.Z., Wang, G., Jensen, C.S., Xu, G. (eds) Web Technologies and Applications. APWeb 2012. Lecture Notes in Computer Science, vol 7235. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-29253-8_73

DOI: https://doi.org/10.1007/978-3-642-29253-8_73
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-29252-1
Online ISBN: 978-3-642-29253-8
eBook Packages: Computer Science, Computer Science (R0)
