Search Engine Similarity Analysis: A Combined Content and Rankings Approach

  • Conference paper
Web Information Systems Engineering – WISE 2020 (WISE 2020)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12343)

Abstract

How different are search engines? The search engine wars are a favorite topic of online analysts, as two of the biggest companies in the world, Google and Microsoft, battle for prevalence in the web search space. Differences in search engine popularity can be explained by their effectiveness or by other factors, such as familiarity with the most popular first engine, peer imitation, or force of habit. In this work we present a thorough analysis of the affinity of the two major search engines, Google and Bing, along with DuckDuckGo, which goes to great lengths to emphasize its privacy-friendly credentials. To do so, we collected search results using a comprehensive set of 300 unique queries for two time periods, in 2016 and 2019, and developed a new similarity metric that leverages both the content and the ranking of search responses. We evaluated the characteristics of the metric against other metrics and approaches that have been proposed in the literature, and used it to investigate (1) the similarities of search engine results, (2) the evolution of their affinity over time, (3) which aspects of the results influence similarity, and (4) how the metric differs across different kinds of search services. We found that Google stands apart, but Bing and DuckDuckGo are largely indistinguishable from each other.
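The abstract does not spell out the new combined content-and-rankings metric, so the sketch here does not reproduce it. For orientation only, it implements one well-known rankings-only measure from the literature, extrapolated rank-biased overlap (RBO) by Webber, Moffat and Zobel (2010), which scores two ranked result lists while weighting agreement near the top more heavily than agreement further down. The function name `rbo_ext`, the persistence parameter `p = 0.9`, and the example URLs are illustrative assumptions, not values taken from the paper.

```python
from typing import Hashable, Sequence


def rbo_ext(s: Sequence[Hashable], t: Sequence[Hashable], p: float = 0.9) -> float:
    """Extrapolated rank-biased overlap (Webber, Moffat & Zobel, 2010).

    Compares two ranked lists evaluated to a common depth k = min(len(s), len(t)).
    Assumption: neither list contains duplicates (e.g. URLs were normalized and
    deduplicated beforehand). Returns a value in [0, 1]; 1 means identical rankings.
    This is a rankings-only baseline: it ignores result content entirely.
    """
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    k = min(len(s), len(t))
    if k == 0:
        raise ValueError("both rankings must be non-empty")

    seen_s: set = set()
    seen_t: set = set()
    overlap = 0          # X_d = |S[:d] ∩ T[:d]|, maintained incrementally
    weighted_sum = 0.0   # running sum over d of (X_d / d) * p**d

    for d in range(1, k + 1):
        a, b = s[d - 1], t[d - 1]
        if a == b:
            overlap += 1
        else:
            if a in seen_t:  # a already appeared higher up in t
                overlap += 1
            if b in seen_s:  # b already appeared higher up in s
                overlap += 1
        seen_s.add(a)
        seen_t.add(b)
        weighted_sum += (overlap / d) * p ** d

    # RBO_ext = (X_k / k) * p**k + ((1 - p) / p) * sum_{d=1..k} (X_d / d) * p**d
    return (overlap / k) * p ** k + ((1 - p) / p) * weighted_sum


# Hypothetical top-4 result lists for one query (illustrative data, not from the paper).
google = ["a.com", "b.com", "c.com", "d.com"]
bing = ["b.com", "a.com", "c.com", "e.com"]
print(round(rbo_ext(google, bing), 3))
```

Swapping the top two results and disagreeing at rank 4 yields a score of about 0.72 with `p = 0.9`; because `p` concentrates weight on the first positions, the same disagreement lower in the list would cost less. A content-aware metric such as the one the paper proposes would additionally credit results that differ in URL but carry similar content, which this sketch cannot do.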

K. Dritsa and T. Sotiropoulos—These authors contributed equally to this work.


Notes

  1. All data, results, and source code used in our experiments are available through https://doi.org/10.5281/zenodo.3980817.

  2. https://www.google.com/trends/topcharts.

  3. https://azure.microsoft.com/en-us/services/cognitive-services/bing-web-search-api/.

  4. https://developers.google.com/custom-search/.

Acknowledgments

This work was supported by the European Union’s Horizon 2020 research and innovation program “FASTEN” under grant agreement No. 825328.

Author information

Correspondence to Konstantina Dritsa, Thodoris Sotiropoulos, Haris Skarpetis or Panos Louridas.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Dritsa, K., Sotiropoulos, T., Skarpetis, H., Louridas, P. (2020). Search Engine Similarity Analysis: A Combined Content and Rankings Approach. In: Huang, Z., Beek, W., Wang, H., Zhou, R., Zhang, Y. (eds) Web Information Systems Engineering – WISE 2020. WISE 2020. Lecture Notes in Computer Science, vol 12343. Springer, Cham. https://doi.org/10.1007/978-3-030-62008-0_2

  • DOI: https://doi.org/10.1007/978-3-030-62008-0_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-62007-3

  • Online ISBN: 978-3-030-62008-0

  • eBook Packages: Computer Science, Computer Science (R0)
