
Evaluation metrics for measuring bias in search engine results

Information Retrieval Journal


Search engines decide what we see for a given search query. Since many people are exposed to information through search engines, it is fair to expect search engines to be neutral. However, search engine results do not necessarily cover all viewpoints of a query topic, and they can be biased towards a specific view: results are ranked by relevance, which is computed from many features by sophisticated algorithms in which search neutrality is not necessarily a focal point. It is therefore important to evaluate search engine results with respect to bias. In this work we propose novel web search bias evaluation measures that take both rank and relevance into account. We also propose a framework for evaluating web search bias using these measures, and we test the framework on two popular search engines over 57 controversial query topics such as abortion, medical marijuana, and gay marriage. We measure stance bias (in support or against) as well as ideological bias (conservative or liberal). We observe that stance does not necessarily correlate with ideological leaning; for example, a positive stance on abortion indicates a liberal leaning, whereas a positive stance on the Cuba embargo indicates a conservative leaning. Our experiments show that neither search engine suffers from stance bias. However, both search engines suffer from ideological bias, each favouring one ideological leaning over the other, which is more significant from the perspective of polarisation in our society.
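The abstract describes measures in which documents at higher ranks contribute more to the bias estimate. The paper's actual measures are not reproduced here, but a generic rank-discounted stance-bias score in the same spirit might look like the following sketch; the logarithmic discount and the normalised aggregation are assumptions chosen purely for illustration.

```python
import math

def rank_discounted_bias(stances):
    """Illustrative sketch only, not the paper's measure.

    stances: per-document stance labels in rank order,
             +1 (in support), -1 (against), or 0 (neutral).
    Returns a score in [-1, 1]; 0 indicates a balanced result list.
    """
    if not stances:
        return 0.0
    # Log-discount weights: rank 1 gets weight 1/log2(2) = 1,
    # deeper ranks contribute progressively less.
    weights = [1.0 / math.log2(rank + 2) for rank in range(len(stances))]
    weighted = sum(w * s for w, s in zip(weights, stances))
    return weighted / sum(weights)

# A result list leaning "in support" near the top yields a positive score:
print(rank_discounted_bias([1, 1, -1, 0, -1]))
```

Under such a scheme, swapping a supporting document from rank 1 with an opposing document at rank 5 changes the score even though the set of documents is unchanged, which is the property a rank-aware bias measure is meant to capture.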




  1. We refer to the notion of relevance defined in the literature as system relevance, or topical relevance, i.e. the relevance predicted by the system.

  2. We refer to the notion of ideology as perceived by the crowd workers.




Acknowledgements

We thank the reviewers for their comments. This work has been funded by the EPSRC Fellowship titled “Task Based Information Retrieval”, grant reference number EP/P024289/1, and the visiting researcher programme of The Alan Turing Institute.

Author information



Corresponding author

Correspondence to Gizem Gezici.

Ethics declarations

Ethical standard

Author Emine Yilmaz previously worked as a research consultant for Microsoft Research and is currently a research consultant for Amazon Research.



Cite this article

Gezici, G., Lipani, A., Saygin, Y. et al. Evaluation metrics for measuring bias in search engine results. Inf Retrieval J 24, 85–113 (2021).
