The SRDS is taught in a 2-week block, and little formal teaching time (3.5 h) is allocated to ethics in the extremely full curriculum. Whereas more traditional ethics courses take place over a semester or year with considerable student engagement time (Friedman and Kahn 1994), the course designers had to find ways to teach open and responsible science citizenship in a short space of time. In particular, they had to embed ethics in the general flow of the course so that it was not a “stand alone” subject. Similarly, as the rest of the modules are highly practice-oriented, the staff had to ensure that the ethics instruction fitted into a highly technical curriculum in a way that nonetheless encourages internalisation and assimilation by the students.
Of course, the difficulty of integrating meaningful teaching into crowded curricula is a common occurrence in ethics education. Indeed, the perception of “losing technical content to ethics instruction” is a common concern that ethics educators have to navigate (Miller 1988, p. 38). Similarly, high student numbers and limited time mean that there is often little space for more creative pedagogical tools, such as extensive group discussion, role playing, group work, projects or any of the other tools commonly promoted for engaged ethics pedagogy (Baker et al. 2013). Moreover, as there is often a lack of expertise in ethics teaching amongst computing faculty, there are concerns among teaching staff about taking on ethics teaching. In particular, potential educators are worried that a lack of experience will lead to an imposition of moral codes rather than robust ethics discussions (Miller 1988).
As a result, teaching ethics to science students often becomes a balancing act of content, depth, style and focus. Common responses to this balancing act involve stand-alone ethics lectures detailing key ethical principles and/or case studies relating some ethical crisis. The limitations of this approach are evident. In particular, it is important to question how much the “stand-alone” style of ethics instruction enables students to internalize ethical norms and enact them in their daily practices (De Schrijver and Maesschalck 2013). It is often questioned whether an ethical “light touch” really leads to ethically competent researchers. Instead, detractors suggest that this educational approach educates solely for ethical awareness or compliance. Moreover, it is possible that the use of case studies (particularly those that do not closely reflect the working conditions and activities of the students) further hinders the process of internalization by making ethics appear as “something that happens to other people”. The challenge was therefore to find ways to address all these issues, and to weave ethics awareness throughout the curriculum.
As far as possible, the course designers wanted to avoid a stand-alone ethics lecture that provided a high-level introduction to ethical concepts without any contextualization. This approach to teaching, as discussed above, was felt to be unproductive and to stop students from making connections between ethics and their daily practices. The course designers were convinced that what is needed is a combination of broader ethical principles and contextual case studies that enables students to see how the ethical principles translate into daily practice. It was also important that these discussions include all key areas of data ethics: provenance, design of infrastructures and practice.
The course designers determined that ethics need to be embedded at the core of the SRDS curriculum. Students need to see how ethics permeate all aspects of data science practice, from their use of programming tools to their authorship practices and research data management. In collaboration with both Sarah Jones, the research data management instructor, and Gail Clement, the open authorship instructor, the design team capitalized on the allocated teaching time within the curriculum to maximise the exposure that the students had to open and responsible science citizenship (Table 1).
Table 1 Course breakdown for teaching open and responsible (data) science citizenship
Lecture 1: Introducing Key Concepts
In all the summer schools, the majority of students have no prior exposure to formal ethics instruction. The first lecture of each SRDS therefore addresses the key concepts of open and responsible data science citizenship. This 1.5 h time slot consists of a lecture introducing key concepts such as Open Science/Data, Responsible Conduct of Research and the course designers’ concept of open and responsible data science citizenship. The lecture is followed by a series of exercises during which students are asked to note issues that they feel represent good and bad practice in relation to research data. Students first note their own perceptions, then discuss synergies in small groups. This is followed by a group discussion.
Week 1: Lectures
During week 1 students are taught a range of technical modules, including shell, R, GitHub, and SQL. As will be discussed below, each of these modules involves some ethics instruction. Week 1 also includes modules on research data management and open authorship. These modules are strongly rooted in both Open Science and Responsible Conduct of Research. Furthermore, these modules introduce specific data ethics issues relating to the subject area. For example, research data management includes discussions on FAIR data (Wilkinson et al. 2016), while open authorship includes discussions on predatory journals, article processing charges, and channels of data dissemination.
Lecture 2: Contextualizing Ethics
Week 1 ends with a 2-h ethics lecture. This lecture has two key objectives: to enable and prompt students to begin to think about how to implement open and responsible science citizenship within their own research context, and to think about the broader responsibilities associated with data science expertise.
Persuading students to problematize how they would implement open and responsible science citizenship within their own research institution is linked to the underlying virtue ethics tradition informing the SRDS curriculum. As the curriculum advocates for practice-based ethics, it is vital that students start making connections between the ethics instruction they are receiving and their daily research practices. Moreover, as the students of the SRDS are from LMICs, it is likely that many of them will experience challenging circumstances in their home research environments. Enabling students to discuss potential challenges to openness and responsibility is thus an important way of normalizing future problems, of highlighting potential solutions, and of ensuring that students feel comfortable raising these challenges with their peers and future collaborators. This ensures the longevity of the SRDS instruction and avoids students becoming disheartened and disengaged.
The discussions on challenges and solutions make use of the grid presented in “Appendix 1”. This is given as a handout to students, who are encouraged to fill it in during and after their discussions. As is evident from the design, the object of the grid is to get students to think about their challenges and practices through the research life cycle. The second column of “Appendix 1” lists some of the tools that are introduced during the SRDS, column 3 highlights some of the ethical issues that were discussed, and columns 4 and 5 require the students to fill in issues relating to their own context. Columns 2 and 3 are intentionally incomplete, requiring the student groups to adapt them as they see fit.
Once the students complete their grids, the class engages in a group discussion about the challenges of implementing open and responsible data science citizenship within one’s home institution, and problematizes ways in which these challenges can be overcome. Common issues discussed include institutional cultures, such as promotion criteria, incentivization and cultural specificities; institutional support, such as facilities, resources and institutional cultures; resources, such as time, money and infrastructures; copyright and ownership; and general concerns, such as being scooped and not having time for research.
By getting students to talk through the problems and possible solutions, the staff hope to dispel some of the misconceptions about Open Data: that it should be easy, that other people do not have problems, that if someone cannot get it right it is their own fault. The staff encourage students to see that their peers, and even the instructors, experience the same problems and that the most effective way of dealing with them is to ask for help. Students need to identify how the multifarious tools that they have learned during the SRDS can lead to proactive problem-solving actions.
At the end of the class the staff encourages students to form support networks that they can tap into once they return home. In particular, students who would likely experience similar problems at their home institutions are encouraged to connect so as to share best-practice experiences and ideas. Having this support is something the staff views as essential for stable and persistent ethics among outgoing students.
The second half of this session involves a more formal lecture introducing some of the broader topics of data ethics, such as infraethics and algorithmic biases. This sets the scene for the ethics exercises relating to the modules offered in the second week such as data visualization, information security, recommender systems, machine learning, and research computational infrastructures.
Modular Ethics Exercises
As mentioned above, the SRDS curriculum is modular, and students learn key data science tools in discrete work packages. This approach follows the Carpentries format, which is modular and incremental (Teal et al. 2015). In order to ensure that the ethics content from the formal ethics lectures is linked to the technical content, the course designers created small 15-min “ethics prompts” to accompany each module (see Fig. 2). The ethics prompts are administered via a range of different modalities, including writing answers on post-it notes, live voting and mind-mapping. Students are expected to complete an ethics-related question and participate in a short discussion at the end of each module (see “Appendix 2”). These ethics prompts are specifically related to the content of the module completed, while linked to the broader ethical issues introduced in the lectures.
The prompts are intended to link the concept of open and responsible science citizenship to the data tools being taught. Engaging students in a short amount of ethical reflection that relates to the tool they have just learned is a good way of highlighting ethical issues, responsibilities and considerations that are part of daily data science practice. The prompts also provide the opportunity to extend the ethics discussions to some topics that could not be addressed in the formal instruction. The responses to each prompt are collated by the organizers, and a summary of the class participation for each prompt is displayed on boards in the communal area for the duration of the course. Students often visit these boards during the break times, showing a good level of follow-up on the exercises.