Stereotypes in Language Models


Abstract

In this chapter we will look at the dangers and limitations that language models bring, with a focus on bias. Bias in AI in general, and in language models in particular, is a topic that was neglected for many years of technology development. In recent years, after disturbing examples of discrimination caused by biased AI software made it into the mainstream media, the topic has been taken up by researchers and is finally starting to get the attention it deserves. We will also discuss other risks, such as the ecological footprint of model training and the sometimes precarious working conditions behind the scenes of machine learning.


Notes

  1. In this chapter, the focus is on text-processing technologies. If you are interested in bias in AI in general, you might want to look at Dräger and Müller-Eiselt (2020) or Eubanks (2018).

  2. Alice and Bob are typical names used in computer science as placeholders in explanations: https://en.wikipedia.org/wiki/Alice_and_Bob.

  3. https://huggingface.co/course/chapter1/8?fw=pt. Hugging Face provides libraries that are widely used by data engineers working with transformer-based models; a minimal probe in the style of this course chapter is sketched after these notes.

  4. Originally published online at https://www.societybyte.swiss/en/2022/12/22/hi-chatgpt-are-you-biased/

  5. Later, another article reported that OpenAI paid USD 12.50 per hour to the company for these services (Beuth et al. 2023).
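
To make these stereotypes concrete, the Hugging Face course chapter linked in note 3 probes a pretrained BERT model (Devlin et al. 2019) with the fill-mask task. The following Python sketch reproduces that style of probe; it assumes the transformers library and the bert-base-uncased checkpoint are available, and the exact completions will vary across library and model versions.

    # Probe a pretrained BERT model for occupational gender stereotypes
    # via the fill-mask task, in the style of the Hugging Face course
    # chapter linked in note 3. Assumes: pip install transformers torch
    from transformers import pipeline

    # Load a fill-mask pipeline with the bert-base-uncased checkpoint.
    unmasker = pipeline("fill-mask", model="bert-base-uncased")

    for sentence in ("This man works as a [MASK].",
                     "This woman works as a [MASK]."):
        predictions = unmasker(sentence)
        # Each prediction contains the token filled into [MASK] and a score.
        print(sentence, "->", [p["token_str"] for p in predictions])

In the course material, the top completions for the two prompts diverge along stereotypical lines (for example, occupations such as "carpenter" versus "nurse"), showing how bias in the training data surfaces in the model's predictions.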

References

  • Ahn J, Oh A (2021) Mitigating language-dependent ethnic bias in BERT. In: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pp 533–549

  • Bender EM, Gebru T, McMillan-Major A, Shmitchell S (2021) On the dangers of stochastic parrots: can language models be too big? In: Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, pp 610–623

  • Beuth P, Hoffmann H, Hoppenstedt M (2023) Die Gesichter hinter der KI [The faces behind the AI]. Der Spiegel Nr. 29

  • Bolukbasi T, Chang KW, Zou JY, Saligrama V, Kalai AT (2016) Man is to computer programmer as woman is to homemaker? Debiasing word embeddings. In: Advances in Neural Information Processing Systems 29

  • Caliskan A, Bryson JJ, Narayanan A (2017) Semantics derived automatically from language corpora contain human-like biases. Science 356(6334):183–186

  • Crawford K (2021) Atlas of AI: power, politics, and the planetary costs of artificial intelligence. Yale University Press

  • Devlin J, Chang MW, Lee K, Toutanova K (2019) BERT: pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pp 4171–4186

  • Dräger J, Müller-Eiselt R (2020) We humans and the intelligent machines: how algorithms shape our lives and how we can make good use of them. Verlag Bertelsmann Stiftung

  • Eubanks V (2018) Automating inequality: how high-tech tools profile, police, and punish the poor. St. Martin's Press

  • Joshi P, Santy S, Budhiraja A, Bali K, Choudhury M (2020) The state and fate of linguistic diversity and inclusion in the NLP world. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp 6282–6293

  • Perrigo B (2023) Exclusive: OpenAI used Kenyan workers on less than $2 per hour to make ChatGPT less toxic. Time. https://time.com/6247678/openai-chatgpt-kenya-workers/, last accessed 23 May 2023

  • Søraa RA (2023) AI for diversity. CRC Press

  • Strubell E, Ganesh A, McCallum A (2019) Energy and policy considerations for deep learning in NLP. In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pp 3645–3650


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG


Cite this chapter

Kurpicz-Briki, M. (2023). Stereotypes in Language Models. In: More than a Chatbot. Springer, Cham. https://doi.org/10.1007/978-3-031-37690-0_6


  • Print ISBN: 978-3-031-37689-4

  • Online ISBN: 978-3-031-37690-0
