Static results: digital ethics is truly multidisciplinary
Figure 1 shows the term map of the field of digital ethics that was constructed using the methodology discussed in the previous section. The visualization shows 2000 key terms extracted from the titles and abstracts of the publications in our dataset. The size of a term indicates the number of publications in which the term occurs in the title or abstract: the larger the term, the more publications contain it. The colour of a term indicates the cluster to which the term belongs. The horizontal and vertical axes have no special meaning. Instead, it is the distances between the terms that are important. In general, the smaller the distance between two terms, the stronger the relation between them, as measured by co-occurrences. Lines indicate the strongest co-occurrence relations between terms. To avoid overlapping labels, only a subset of all labels is visible. The term map can be explored interactively here: https://goo.gl/hkBAWi. The software has zoom, scroll, and search functionality to facilitate a detailed exploration of the term map. It provides different views, allowing one to focus either on the map’s global structure or on its more detailed properties.
In the term map in Fig. 1, four clusters of closely related terms can be identified. Each cluster is indicated in a different colour. Our interpretation of these clusters is as follows:
- The Law and Governance cluster, visible in blue in Figs. 1 and 2, contains terms like ‘law’, ‘right’, ‘freedom’ and ‘justice’. This cluster represents publications from the fields of philosophy of law, jurisprudence and moral philosophy.
- The Medical Ethics cluster, visible in green in Figs. 1 and 3, contains terms such as ‘autonomy’, ‘informed consent’, ‘care’, ‘participant’ and ‘dignity’. This cluster mostly represents publications in medicine, healthcare and biomedical ethics.
- The Business Ethics cluster, visible in yellow in Fig. 1, contains terms such as ‘customer’, ‘perception’, ‘influence’, ‘vendor’, ‘purchase’ and ‘intention’. This cluster mostly represents publications from the social sciences, predominantly economics, business studies and marketing.
- The Data and Information Security cluster, visible in red in Figs. 1 and 4, contains terms such as ‘security’, ‘protocol’, ‘application’, ‘network’ and ‘technique’. This cluster represents publications that discuss data and information security, mostly from the field of computer science. These publications often discuss technical and security challenges and the means to overcome problems related to data ethics.
We might have expected to find clusters centred around particular ethical terms, such as autonomy, fairness or freedom. However, the automated clustering results in clusters that correspond closely to specific academic fields: law, medicine and computer science. It shows that the strongest connections between terms originate from the fact that digital ethics is spread out over different disciplines. We see, for example, that autonomy and dignity are dominant in medicine, freedom is prominent in law and security in computer science.
We furthermore notice that there is a significant gap between these fields. Because the distance between terms indicates the strength of their relation, it is noteworthy that technical and juridical terms never appear side by side. The term map is instead divided into two halves, with the ethical/juridical terms on the left and the technical terms on the right. The clusters form around different fields and are rather dispersed, which is an indication that different values are discussed in different disciplines, rather than all values across all disciplines.
However, while this is an indication in that direction, the conclusion cannot readily be accepted. The clustering technique used will always put any term in only one cluster. So while it shows us where a term dominates, it does not show whether and to what extent a term is also present within the domain of another cluster. For example, the fact that security is in the Data and Information Security cluster does not mean that we can conclude that security is unimportant in other domains.
To address this, lines are displayed in the term map to visually indicate the most frequently co-occurring terms. In Fig. 1, the 500 pairs of terms with the highest co-occurrence are presented in this way. The top 25 are listed in Table 3. By looking at the co-occurrences in this way, we can find out whether terms that are part of one cluster also co-occur with terms from another cluster. We gained a better understanding of the occurrence of values in different fields by looking at the position of the values “security”, “autonomy”, and “dignity”. In what follows, the number of occurrences or co-occurrences within the dataset is given in brackets.
Table 3 Top 25 most occurring and co-occurring terms
Security (2240) is the most frequently occurring term of all. It is located in the Data and Information Security cluster and is indeed very dominant within the computer science literature. However, it also has a high co-occurrence with law (103), which is in the Law and Governance cluster, and with both care (88) and participant (118), which are in the Medical Ethics cluster. This shows that security is also prevalent in the other domains.
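The check described here, counting how often terms co-occur and then looking at pairs that span two clusters, can be illustrated with a minimal sketch. It is not the software used to build the term map. The toy publications, the cluster labels and the function names `count_cooccurrences` and `cross_cluster_pairs` are assumptions made purely for illustration, and the counts bear no relation to the figures reported in the text.

```python
from collections import Counter
from itertools import combinations


def count_cooccurrences(publications):
    """Count in how many publications each term and each term pair occurs."""
    term_counts = Counter()
    pair_counts = Counter()
    for terms in publications:
        # Each term counts once per publication; sorting gives every pair
        # a single canonical (alphabetical) order.
        unique_terms = sorted(set(terms))
        term_counts.update(unique_terms)
        pair_counts.update(combinations(unique_terms, 2))
    return term_counts, pair_counts


def cross_cluster_pairs(pair_counts, cluster_of, top_k):
    """Return the top_k most frequent pairs whose terms lie in different clusters."""
    crossing = [
        (pair, n)
        for pair, n in pair_counts.most_common()
        if cluster_of[pair[0]] != cluster_of[pair[1]]
    ]
    return crossing[:top_k]


# Toy data (hypothetical): each publication is the list of terms found in
# its title and abstract.
toy_publications = [
    ["security", "law", "network"],
    ["security", "network", "protocol"],
    ["autonomy", "dignity", "care"],
    ["security", "law"],
    ["autonomy", "care", "security"],
]

# Hypothetical cluster assignment: every term belongs to exactly one cluster.
toy_cluster_of = {
    "security": "data_security",
    "network": "data_security",
    "protocol": "data_security",
    "law": "law_governance",
    "autonomy": "medical_ethics",
    "dignity": "medical_ethics",
    "care": "medical_ethics",
}

term_counts, pair_counts = count_cooccurrences(toy_publications)
top_crossing = cross_cluster_pairs(pair_counts, toy_cluster_of, top_k=3)

print(term_counts.most_common(3))
print(top_crossing)
```

In this toy example "security" is assigned to a single cluster, yet its most frequent cross-cluster partner is "law", which mirrors the kind of observation made above.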
Autonomy (682) and dignity (241) are positioned close to each other in the Medical Ethics cluster. Autonomy is also the term with the highest co-occurrence with dignity (110). Autonomy itself also has a high co-occurrence with informed consent (162), care (127), decision (126) and right (116). Autonomy thus has strong connections with other terms in the Medical Ethics cluster as well as with the Law and Governance cluster.
By looking at the locations of the different values in the term map and their relation with other terms, we can conclude that different values are being used in the different fields. To give some examples: Security is an important value in all clusters, but dominates in the Data and Information Security cluster, while autonomy is most prevalent in the context of Medical Ethics and in Law and Governance, but is almost absent in the Data and Information Security literature.
The meaning of the ethical terms found also depends on the context in which they are used. Autonomy, for instance, refers in the medical field to the individual’s capability to make decisions about the use of their own data. There are many discussions on the autonomy of the choice to have personal data in biobanks and on the conditions under which data can be shared for medical research, and there is a discourse on the autonomy of health care professionals. In computer science, however, autonomy is often used to describe a property of a technological system, namely that the system acts or makes decisions without the involvement of any human.
Dynamic results: shift towards technical issues
Figure 5 shows a time trend overlay visualization of the term map of the field of digital ethics. The colour of a term indicates the average year of publication of the publications in which the term occurs. The closer the colour of a term is to blue, the older the publications in which the term occurs, and the closer the colour of a term is to red, the more recent the publications in which the term occurs. It shows that the terms on the right (computer science) side of the figure are used more often in recent publications. What is striking about this image is that the emphasis in scientific research is shifting away from ethical and juridical terms such as dignity, autonomy, freedom, and informed consent, towards more technical issues, such as encryption, dataset, efficiency, and better performance.
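The quantity behind the overlay colours, the average publication year per term, can be sketched in a few lines. The records below are invented toy data and the function name `average_publication_year` is an assumption for illustration. The sketch only shows the averaging step, not how the visualization software maps the result to a colour scale.

```python
from collections import defaultdict


def average_publication_year(dated_publications):
    """Map each term to the mean publication year of the publications containing it."""
    years_by_term = defaultdict(list)
    for year, terms in dated_publications:
        # A term counts once per publication, however often it appears in it.
        for term in set(terms):
            years_by_term[term].append(year)
    return {term: sum(years) / len(years) for term, years in years_by_term.items()}


# Toy data (hypothetical): (publication year, terms in title and abstract).
toy_dated_publications = [
    (2003, ["autonomy", "informed consent"]),
    (2005, ["autonomy", "dignity"]),
    (2008, ["security", "autonomy"]),
    (2014, ["encryption", "dataset"]),
    (2016, ["encryption", "security"]),
]

avg_year = average_publication_year(toy_dated_publications)

# Terms with a lower average year would be drawn closer to blue,
# terms with a higher average year closer to red.
for term, year in sorted(avg_year.items(), key=lambda item: item[1]):
    print(f"{term}: {year:.1f}")
```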
An initial explanation of this shift towards technical issues can be given by looking at the development of the field of digital ethics over time. Overall, the analysis shows that there is an increase in scholarly work on questions of digital ethics. As Fig. 6 demonstrates, in the first years of our analysis, between 2000 and 2002, there were between 100 and 200 publications on digital ethics per year. In 2016, the last complete year in our analysis, the number of publications was almost 1200. Overall, we see an approximately exponential increase in the number of publications over time.
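To give a feel for what an approximately exponential increase of this size implies, the following sketch computes the constant annual growth rate that would connect a starting count to an end count. The starting value of 150 publications in 2001 is an assumed midpoint of the reported range of 100 to 200, and 1200 is a rounded stand-in for "almost 1200". Neither is an exact figure from the dataset, and the function names are hypothetical.

```python
import math


def implied_annual_growth(start_count, end_count, years):
    """Constant yearly growth rate that turns start_count into end_count."""
    return (end_count / start_count) ** (1 / years) - 1


def doubling_time(annual_rate):
    """Number of years needed to double at a constant annual growth rate."""
    return math.log(2) / math.log(1 + annual_rate)


# ASSUMPTIONS: 150 is an assumed midpoint of the 100-200 range reported for
# 2000-2002, placed in 2001; 1200 rounds "almost 1200" for 2016.
start_count = 150
end_count = 1200
years = 2016 - 2001

rate = implied_annual_growth(start_count, end_count, years)
years_to_double = doubling_time(rate)

print(f"{rate:.1%} per year, doubling every {years_to_double:.1f} years")
```

Under these assumed values the output volume would double roughly every five years. A different choice of starting value within the reported range would shift this figure.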
When we zoom in on the development in the different scientific fields in Fig. 7, a slightly different picture emerges (see Footnote 5). In the early years, the dataset shows that biomedical and social sciences dominate the scholarly work on digital ethics. Both fields show a marked growth in publications on digital ethics. The field of computer science research starts out at a very low number of publications in the early years, but shows a much faster increase in the number of publications compared to the other fields. So, in 2016, many more publications on digital ethics are from this field than from any other field. The shift from ethical/juridical to technical issues would thus be explained as an expression of the growth of the number of publications in computer science (see Footnote 6).
It might also be that the growth of publications on digital ethics is an effect of the growth in scientific publishing in general. This growth, although very hard to determine exactly, is estimated to be around 8–9% per year in recent years (Bornmann and Mutz 2015). Similarly, the relative growth of digital ethics in computer science could be an effect of the fast growth of that field in general. In order to check for this, we also looked at the normalized growth in the number of publications.
Doing so reveals the following: while the number of scientific publications in general grew by a factor of 2, the number of publications on digital ethics grew by a factor of 10. So, if we adjust for the general growth of scientific publications, we see that the number of publications in digital ethics grew 5 times faster than the number of publications in general.
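The normalization is a simple ratio of growth factors, as the following minimal sketch shows. The function names are hypothetical. The factors 10 and 2 are those stated in the text, and the underlying publication counts are not reproduced here.

```python
def growth_factor(start_count, end_count):
    """Factor by which a publication count grew between two points in time."""
    return end_count / start_count


def normalized_growth(topic_factor, baseline_factor):
    """Growth of a topic relative to the growth of the baseline literature."""
    return topic_factor / baseline_factor


# Factors as stated in the text: digital ethics grew by a factor of 10,
# scientific publishing in general by a factor of 2.
digital_ethics_factor = 10
all_science_factor = 2

relative_growth = normalized_growth(digital_ethics_factor, all_science_factor)
print(relative_growth)
```

The same ratio can be formed per field by dividing the growth factor of digital ethics publications within a field by the growth factor of that field as a whole.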
The data also shows that digital ethics in computer science has increased by a factor of 9.5 relative to the growth of computer science in general. This supports the thesis that computer science is increasingly the locus of questions concerning digital ethics. It is also borne out by Fig. 8, which shows the relative shares of the different fields and reveals a marked growth in the share of computer science.