Abstract
Computational analyses that apply Artificial Intelligence (AI)/Machine Learning (ML) methods to derive patterns and inferences from big datasets in computational social science (CSS) studies can suffer from biases introduced during the data construction, collection and analysis phases, and can also face challenges of generalizability and ethics. Given the interdisciplinary nature of CSS, the likelihood of bias is influenced by many factors: the need for a comprehensive understanding of the policy and rights landscape, the fast-evolving AI/ML paradigms, and dataset-specific pitfalls. This chapter identifies challenges faced by researchers in the CSS field and presents a taxonomy of biases that may arise in AI/ML approaches. The taxonomy mirrors the stages of a common AI/ML pipeline: dataset construction and collection, data analysis, and evaluation. Drawing on bias detection and mitigation in AI, an active area of research, the chapter highlights practices for incorporating responsible research and innovation into CSS.
Acknowledgements
This research is funded by the UKRI Strategic Priority Fund as part of the wider Protecting Citizens Online programme (Grant number: EP/W032473/1) associated with the National Research Centre on Privacy, Harm Reduction and Adversarial Influence Online (REPHRAIN), and by the Science and Technology Facilities Council (STFC) DiRAC-funded “Understanding the multiple dimensions of prediction of concepts in social and biomedical science questionnaires” project, grant number ST/S003916/1.
Copyright information
© 2023 The Institution of Engineers (India)
Cite this chapter
De, S., Jangra, S., Agarwal, V., Johnson, J., Sastry, N. (2023). Biases and Ethical Considerations for Machine Learning Pipelines in the Computational Social Sciences. In: Mukherjee, A., Kulshrestha, J., Chakraborty, A., Kumar, S. (eds) Ethics in Artificial Intelligence: Bias, Fairness and Beyond. Studies in Computational Intelligence, vol 1123. Springer, Singapore. https://doi.org/10.1007/978-981-99-7184-8_6
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-7183-1
Online ISBN: 978-981-99-7184-8
eBook Packages: Intelligent Technologies and Robotics (R0)