
Hit Count Reliability: How Much Can We Trust Hit Counts?

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7235)

Abstract

Many recent studies rely on the number of search results returned by a search engine, i.e., the hit count. However, hit counts can vary unnaturally when observed on different days and may contain large errors that affect research depending on them. Such errors can lead to low precision in machine translation, incorrect extraction of synonyms, and other problems. It is therefore essential to evaluate and improve the reliability of hit counts. Several studies have demonstrated this phenomenon, but none has made clear how much hit counts can be trusted. In this paper, we propose reliability metrics that quantitatively evaluate hit counts in order to improve hit count selection. Evaluation results with Google show that our metrics successfully adopt reliable hit counts (99.8% precision) and skip unreliable hit counts (74.3% precision).




Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Satoh, K., Yamana, H. (2012). Hit Count Reliability: How Much Can We Trust Hit Counts?. In: Sheng, Q.Z., Wang, G., Jensen, C.S., Xu, G. (eds) Web Technologies and Applications. APWeb 2012. Lecture Notes in Computer Science, vol 7235. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-29253-8_73


  • DOI: https://doi.org/10.1007/978-3-642-29253-8_73

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-29252-1

  • Online ISBN: 978-3-642-29253-8

  • eBook Packages: Computer Science, Computer Science (R0)
