A growing body of scholarship examining the online environment has highlighted a range of social and ethical challenges emerging from digital spaces. These include individual misbehaviors caused by the erosion of empathy, the promotion of narcissistic behavior, internet addiction, and so on (Vallor 2016). Such issues illustrate the urgent need for a comprehensive understanding of socially responsible behavior that enables individuals to function and flourish online while protecting the individuals, structures and systems around them. In particular, there is a need for a robust interpretation of “digital citizenship” that takes into account the novelties of the online environment as compared to traditional spheres of action. What is needed are better descriptions of how individuals can act responsibly in spaces that are removed from traditional societal structures and mechanisms of control.
Perhaps even more challenging to formulations of digital citizenship, however, is the recognition that the online environment is highly dynamic. The same people who are users of digital structures are also contributing to their evolution. Indeed, the emergence of tools such as machine learning means that user behavior dynamically changes the tool in question through feedback and evolution. A considerable amount of research already details these problems, such as work examining the operation of search engines and the perpetuation of biases (Bozdag 2013). Moreover, examples such as Cambridge Analytica highlight how these processes can be used to manipulate the behavior of users (Zuboff 2015; Susser et al. 2019). A number of scholars have already started trying to bring together these disparate ethical considerations into comprehensive narratives. Floridi (2018), for example, distinguishes between digital governance, digital regulation, and digital ethics. Digital governance relates to the procedures and practices for establishing and implementing policies, as well as the creation of codes of conduct and practice, while digital regulation refers to the evolving system of rules and laws enforced through social and governmental institutions. Digital ethics plays a role in both. According to Floridi, it can shape both governance and regulation by providing guidance on principles that foster more just digital environments aligned with features of ‘the good society’. Here we will focus especially on the part of digital ethics that deals with data science—what has been called data ethics.
In large part, discussions on responsible individual behavior online within data ethics have focused on understanding the ethical outcomes of data use and the design of algorithms. Approaches vary from utilitarian discussions on the societal impact of algorithm design to the deontological development of principles to guide action. The latter in particular has attempted to address ethical and societal issues connected to the digital environment through the formulation of ethical principles to guide the innovation and use of digital tools. Websites such as Algorithm Watch offer an (almost) up-to-date list of initiatives proposing frameworks or principles (see Footnote 2). The proliferation of scholarship on aspirational, guiding, and enforceable codes of conduct has been welcomed as a positive contribution to AI regulation and governance (and data science in particular).
In an attempt to focus data ethics discussions, Floridi and Cowls (2019) have proposed that the number of ethical principles in use should be reduced. They suggest that the identification of a set of common principles will inform ongoing attempts at digital governance and regulation, and that it can constrain the ability of corporations to embrace expedient relativism in their interpretations of ethics. In particular, they identify five principles that seem to be common to many relevant initiatives: beneficence, nonmaleficence, autonomy, justice, and explicability (which includes intelligibility and accountability). This, they acknowledge, aligns AI ethics (and, as a result, data ethics) with the principlist approach in biomedical ethics (Beauchamp and Childress 2009), and less so with the rich tradition of ethics of technology and computer/information ethics. The ethics discourse around the AI revolution (including data science) is thus emerging with a specific character. It increasingly aims to deliver an abstract and general evaluation of what is right and wrong, and to identify common shared principles that loosely guide grand projects of regulation and governance, as well as individual behavior.
This move towards principlism is not without its critics (Mittelstadt 2019; Whittlestone et al. 2019), and an increasing number of scholars are raising concerns. Given that the principlist approach was developed in the medical context, its content has been shaped along those lines. Some criticisms are geared especially towards this particular content: data scientists are not physicians, and the ethical content of principlism may not adequately cover the issues that have emerged in the data science context. However, we are more interested in other issues, which are connected to two key areas: the level at which the discourse is situated (“applicability”) and the problems associated with pedagogy (“teachability”).
Digital ethics—and data ethics is no exception—is currently dominated by what has come to be called macroethics or hard ethics (Floridi 2018). This approach attempts to integrate the disparate areas of infrastructure design, deployment, and use by taking a broad view of the online environment. It links to the growing number of centers and courses focusing on internet and society. These centers (and the courses that they offer) focus on internet studies, intersecting with key fields like human–computer interaction and science and technology studies.
The scope covered by macroethics, together with its alignment with the social studies of digital environments/cultures, can make it difficult to locate the individual within ethics discussions. Indeed, how individual responsibility plays out in spaces in which disparate technologies, platforms, stakeholders, practices and discourses are co-evolving is extremely complex. As a result, much of macroethics discourse focuses on key themes, such as identity and subjectivity, social exclusion and inequality, politics and democracy, globalization and development, privacy and surveillance.
In discussing these themes, macroethics often uses higher level case studies from thematic areas, such as social media, big data, citizen journalism, digital culture, the creative industries, internet governance, and digital rights. These include examples of clear-cut ethics violations, such as the controversy surrounding Cambridge Analytica’s involvement in the US elections (Susser et al. 2019). They also include examples of multifaceted, multistakeholder problems, such as the integration of algorithmic bias in search engines (Bozdag 2013). These case studies are variously presented using both deontological and utilitarian ethics, but are united through their focus on the higher level outcomes and the impact of these outcomes on society. Rarely, if ever, do they specifically focus on individual actions, collaborative negotiations and decision-making practices.
The use of high-level case studies thus presents various problems. First, while the principlist approach implicit in the use of high-level case studies works well for analyzing these large issues, understanding them from an individual perspective is more difficult. Many of these case studies either do not describe individual action (focusing instead on companies or multi/national structures), describe intentionally maleficent actions, or reduce individual action to yes/no decisions (i.e., to use or not use a platform). The nebulous position of the individual within these issues, and the reliance on higher-level principles, thus narrows discussion on individual ethics and agency to a limited range of positions. These can be detailed as follows in Table 1:
Moreover, while individuals are able to engage with the case studies and discuss the ethical implications in general, the link between these ethics and their personal experiences and daily activities is far from certain. Indeed, most digital activity is repetitive and relatively mundane, and users are unlikely to be engaged with the action spaces in which most of these case studies play out.
As a result, macroethics discussions often limit individual responsibility to the avoidance of obviously unethical behavior, such as theft, harm, or the violation of privacy. This leaves the responsibility—and agency—for the ethical issues described in the case studies to large corporations and governments, as they fight for control over algorithms, data distribution and re-use. Thus, while the individual user is recognized to be a contributor to dynamic digital evolution, there is little guidance on how they can influence their immediate online environment towards more ethical futures. In other words, macroethics provides few hints on how to apply ethical principles in concrete situations.
These problems have been noted in the literature. For instance, Morley et al. (2019) argue that, while macroethics gives a justification of “why” individuals should be concerned about AI ethics (and hence data ethics), it does not provide an easy pathway from “why” to “how” they should be engaged. Floridi recognizes this problem of applicability, stressing that it is “not just what ethics is needed but also how ethics can be effectively applied and implemented in order to make a positive difference” (2019, p. 185). Nonetheless, as highlighted again by Morley et al. (2019), “[t]he gap between principles and practice is large” (p. 7), since efforts in data ethics do not specify to practitioners where and how exactly the principles should be implemented. This problem also hampers codes of conduct—shaped in a principled way—in the computational sciences, with the result that they are ineffective in practice (McNamara et al. 2018). When one attempts to apply those principles in specific contexts, what emerges is that much of the macroethical work on data ethics “has been completed in the abstract, independent of concrete cases” (Kitto and Knight 2019, p. 2856).
Similar voices of concern come from Hagendorff (2020), who claims that “[u]ltimately, it is a major problem to deduce concrete technological implementations from the very abstract ethical values and principles”. Madaio et al. (2020) add that “the abstract nature of AI ethics principles [including data ethics] makes them difficult for practitioners to operationalize” (p. 1). On a related note, Vakkuri et al. (2020) claim that “[d]evelopers struggle to implement abstract ethical guidelines into the development process” (p. 1). The problem of ‘deducing concrete technological implementations from principles’ or ‘operationalizing principles’ has two parts. First, principles are not rules, which are precise and neat. As Zwolinski and Schmidtz say, “[w]here rules function in our reasoning like trump cards, principles function like weights” (2013, p. 222). They can be weighed one against the other, in the sense that “principles can weigh against X without categorically ruling out X” (p. 222), and “[q]uestions of weight and priority must be assessed in specific contexts” (Beauchamp 2015, p. 406). Yet people expect principles to be like rules. The second part of the problem is that those principles can be understood in radically different, sometimes mutually exclusive, ways. This creates confusion about which version of the principles we should apply (Binns 2018). These issues have motivated new proposals aimed at ‘embedding ethics’ in the practice of data science (Grosz et al. 2019; McLennan et al. 2020). The idea behind these proposals is that we should find ways to move ethics closer to the actual practice of data science, so that data scientists will be able to see which parts of their job have ethical relevance.
A final set of issues associated with macroethics that exacerbates the applicability problem relates to its scope. The focus on general principles means that it rarely engages with the diversity of roles that individuals play within the digital landscape (e.g., data producer, data engineer, data analyst, machine learning engineer, general user). The diversity of the digital landscape itself makes it difficult to translate macroethical concerns into rules (the “how”) that apply “across the board” to daily individual activities. Similarly, it does not respond to recent sociotechnical scholarship on digital landscapes. It is therefore ill placed to address questions of landscape boundaries, such as whether the landscape includes the data, the technical infrastructure, the companies operating online, the online communities, etc. Related questions concern whether the digital landscape is solely located online, or whether it extends to the physical world through its interconnectedness with sociotechnical landscapes.
The gap between principles and concrete technological implementations has consequences for the teachability of macroethics to students and for the training of professionals. If there is no connection between individual technical choices and ethical relevance (i.e., if ethics is not embedded in the actual practice of data science), then it is difficult to deliver modules that show the relevance of ethics for the tasks of data scientists. The difficulties of teaching macroethics are even more evident when considered from the point of view of its strong links to other dominant pedagogical strategies within bioethics, namely biomedical ethics and Responsible Conduct of Research (RCR), which seem to suffer from analogous problems. Indeed, data management, data sharing and responsible online behavior are often incorporated into RCR teaching in universities across the globe.
When considered in light of the problems of developing an individual ethics that accounts for daily actions (as outlined in Sects. 2.1, 2.2), it is unsurprising that this bioethicization/RCRization can be viewed as problematic. Biomedical ethics, in particular, has been heavily criticized for its reliance on extreme and unrealistic moral dilemmas and famous controversies where the application of principles is more straightforward. This has led to concerns that the full spectrum of ethical nuances encountered in the medical profession is unlikely to be fully addressed. Komesaroff (1995), for example, suggested that the structure of prevalent bioethical discourse constrains the way topics are taught, most notably in the form of a dilemma. Ethical issues are positioned within a demarcated theoretical field that postulates choices from a range of pre-established possibilities, with clearly attractive and unattractive connotations. This, in turn, restricts the scope of its subjects, by emphasizing topics more prone to be expressed in the form of “extreme dilemmas”, such as euthanasia, autonomy and paternalism. Truog et al. (2015) emphasized that most educators largely rely on a case-based method for teaching ethics, and that these case studies tend “to focus on extreme or unusual situations [and] controversies that generate media attention” (p. 11). This focus is not helpful in educating medical students to identify other subtle and highly contextualized ethical issues. It is precisely this lack of contextual guidance that Komesaroff laments when he suggests that medical ethics ignores the subtle nature of doctor–patient interaction, its social context, and the ethical issues underlying this ongoing negotiation. Multimodal communication, such as the choice of words, inflections and gestures, has ethical relevance in shaping the doctor–patient relation but is largely ignored by most bioethics training (Komesaroff 1995).
RCR training has been plagued by similar problems (Chen 2020). There are growing concerns that the vocational nature of RCR has been replaced by educational approaches that foster rule following, compliance and the avoidance of recognized misbehaviors rather than aspiring to excellence.
Data ethics modules do not yet have a precise identity, but a list of courses discussing ethical issues related to AI and data science (see Footnote 3) shows that many courses are shaped along the lines of the characteristics of bioethics/RCR that we outlined above. Teaching courses in this way reflects a macroethical approach that simply imports into pedagogy the same issues of applicability outlined above (McNamara et al. 2018; Madaio et al. 2020; Vakkuri et al. 2020).
However, if data ethics has yet to find its identity in terms of pedagogical strategies, we want to prevent it from inheriting all these problems. In what follows, we will focus on the issues around teachability, especially in the context of teaching data ethics to students in data science, by proposing a new approach based on the integration of microethics within a virtue ethics framework. Integrating microethics and virtue ethics, we argue, provides solid foundations for embedding ethics in the practice and teaching of data science.
The need for a new approach
While macroethics provides an important perspective on the “big picture” of digital evolution, it struggles to address the questions that affect individuals in their daily activities. What is needed instead is a way of fostering mindfulness, social responsibility and care that directly relates to the daily engagement of individuals with the digital landscape. In the rest of this paper, we develop such an approach by focusing on data scientists, and in particular on students in data science curricula. However, our approach can be extended to data science professionals or researchers with minor adjustments. We use “data scientist” broadly to refer to any individual with a level of computing/programming expertise whose daily activities involve data analysis or processing. These data scientists work in a wide range of disciplines and institutions, and make use of a plethora of different data types. Nonetheless, they are united by the scope and focus of their daily actions and the types of computational tools they use.
Before proceeding, it is important to note that our criticism of macroethics relates to its use as the sole means of ethics instruction. There is undoubted value in using macroethics case studies as a means of outlining the ethics of the emerging digital landscape. Nonetheless, as a means of teaching responsible daily research conduct to data scientists, we believe that macroethics needs to be blended with another approach that highlights the ethical import of daily actions.