From dignity to security protocols: a scientometric analysis of digital ethics

Our lives are increasingly intertwined with the digital realm, and with new technology, new ethical problems emerge. The academic field that addresses these problems, which we tentatively call ‘digital ethics’, can be an important intellectual resource for policy making and regulation. This is why it is important to understand how the new ethical challenges of a digital society are being met by academic research. We have undertaken a scientometric analysis to arrive at a better understanding of the nature, scope and dynamics of the field of digital ethics. Our approach shows how the field of digital ethics is distributed over various academic disciplines. By first having experts select a collection of keywords central to digital ethics, we have generated a dataset of articles discussing these issues. This approach allows us to generate a scientometric visualisation of the field of digital ethics without being constrained by any preconceived definitions of academic disciplines. We found, first of all, that the number of publications pertaining to digital ethics is increasing exponentially. Furthermore, whereas one might expect digital ethics to be a species of ethics, we found that the various questions pertaining to digital ethics are predominantly being discussed in computer science, law and biomedical science. It is in these fields, more than in the independent field of ethics, that ethical discourse is being developed around concrete and often technical issues. Moreover, it appears that some important ethical values are very prominent in one field (e.g., autonomy in medical science), while being almost absent in others.
We conclude that to get a thorough understanding of, and grip on, all the hard ethical questions of a digital society, ethicists, policy makers and legal scholars will need to familiarize themselves with the concrete and practical work that is being done across a range of different scientific fields to deal with these questions.


Introduction
In a time of rapid development in the field of digital technologies, data protection is becoming an increasing priority for citizens. In response to this, new legislation is being adopted that incorporates core values such as privacy, autonomy and integrity. The General Data Protection Regulation (2016/679) (GDPR), for example, which was adopted on 27 April 2016 and will apply from 25 May 2018, will require governments and businesses to drastically change their relationship with personal data. Yet whether these core values will actually be safeguarded depends on our ability to effectively translate these abstract notions into sound principles, adequate concepts and concrete data protection practices.

The academic community could play an important role in finding solutions to these and other ethical problems associated with the digital revolution. We want to know how the academic community is addressing these ethical questions, and how it can bridge the gap between core values and the application of these values in practice. In order to arrive at a better understanding, we aim to find out where and by whom the collection of issues that we call 'digital ethics' is being investigated, and we will pursue the following sub-questions: (1) Is the number of publications on digital ethics growing over time? (2) Which values are being discussed, and who is involved in these discussions? Related to this question, and just as relevant for understanding the current landscape of academic research on digital ethics, is a third: (3) Which values are not being discussed, and who is not part of these discussions?
To answer these research questions, we have used scientometric methods, that is, the application of quantitative methods to (scientific) corpora of texts. In a first phase, we collected in a dataset the academic publications, within a given timespan, that revolve around ethical questions in the digital realm. In a second phase, we mapped the co-occurrence relations of key terms used in these publications using a software tool called VOSviewer, to offer an indication of the priorities and interests in different scientific areas and to show how these have developed over time.
Scientometric methods are often used to assess the state of a research field by mapping and visualising it. Scientometric network analysis is used to determine the proliferation of certain research topics over time, to study the occurrence of certain research topics in different research fields and to identify topical clusters within a field (e.g., Groot et al. 2015; Rizzi et al. 2014; Rodrigues et al. 2014). In the domain of digital ethics, a scientometric analysis has previously been undertaken by Heersmink et al. (2011). They constructed their dataset based on all publications that appeared in a prespecified number of journals that are part of the field of digital ethics. Yet it appears that many of the important contributions to digital ethics are being made outside of the academic discipline of ethics. We have therefore created a dataset that includes publications that deal with topics that are relevant for digital ethics and data protection, irrespective of the academic field in which the publication is published. This allows us to analyse the current discussions on digital ethics with a much broader scope, to include discussions that are taking place outside of the journals that traditionally and specifically deal with digital ethics, and to identify in which fields these discussions predominantly take place.
We will start in "Data and method" by giving a description of the scientometric method we use to delineate the academic literature that deals with digital ethics. In "Results", we will present our findings. In the first part of this section, we present an overview of the different sub-fields that seem to be present in the field of digital ethics. In the second part of this section, we show the changes that occurred in the field over time. In "Discussion", we discuss the implications of our findings. In the last section we will summarise and present ideas for future research.

Constructing a dataset of publications on digital ethics
The first step in our study was the construction of a representative dataset of scientific publications on digital ethics. To delineate the field of digital ethics, we took as our point of departure the term "digital ethics" as it is used by the European Data Protection Supervisor (EDPS), namely to indicate those reflections and analyses regarding ethical concerns that arise in the wake of digital technological expansion, especially those revolving around privacy and data protection, broadly conceived (Buttarelli 2015). The field is closely related to Computer and Information Ethics as described by Bynum (2016), which refers to the branch of applied ethics that studies and analyses the social and ethical impacts of ICT (Information and Communication Technology).
To construct our dataset of publications on digital ethics, we considered all publications that are indexed in Clarivate Analytics' Web of Science (WoS) database, appearing between 2000 and 2016, and that are of the document type "article" or "review". The selection of publications from the WoS database was done in two steps. In the first step, we selected all publications from two journals that are specifically focused on digital ethics, namely Ethics and Information Technology (EIT) and Information, Communication & Society (ICS). This step yielded 717 publications. In the second step, publications were selected based on terms occurring in their title, abstract, and author keywords. These search terms are related to the digital realm and to ethics, such as privacy, big data, and informed consent. Table 1 lists all the search terms that we used. Each search term was given a score between 1 and 5. The more specific to digital ethics a search term is, the higher the corresponding score. Publications were selected only if the cumulative score of all search terms occurring in the title, abstract and author keywords was at least 10. Search terms and the corresponding scores were selected by the authors of this article in collaboration with representatives of the EDPS. The term-based search in the second step yielded 7314 publications in addition to the 717 publications selected in the first step. In this way, we obtained an overall selection of 8031 (= 717 + 7314) publications. The publications appeared in 1906 different journals. Table 2 provides an overview of the 10 journals with the largest number of publications in the dataset.
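The score-based selection in the second step can be illustrated with a minimal sketch. The terms and scores below are hypothetical placeholders (the actual list is in Table 1), and counting each term at most once per publication is an assumption, since the text does not say how repeated occurrences are treated.

```python
import re

# Hypothetical search terms with scores between 1 and 5.
# The real terms and scores are those listed in Table 1.
TERM_SCORES = {
    "privacy": 3,
    "big data": 4,
    "informed consent": 5,
    "data protection": 5,
}
THRESHOLD = 10  # minimum cumulative score stated in the text


def cumulative_score(text, term_scores=TERM_SCORES):
    """Sum the scores of all search terms that occur at least once in text."""
    lowered = text.lower()
    total = 0
    for term, score in term_scores.items():
        # Whole-word match; each term counts once (an assumption).
        if re.search(r"\b" + re.escape(term) + r"\b", lowered):
            total += score
    return total


def is_selected(title, abstract, keywords, threshold=THRESHOLD):
    """Apply the threshold to title, abstract and author keywords combined."""
    combined = " ".join([title, abstract, " ".join(keywords)])
    return cumulative_score(combined) >= threshold
```

With these placeholder scores, a publication mentioning only "privacy" would fall below the threshold, while one that combines "privacy", "big data" and "informed consent" would be selected.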
In order to construct and validate our publication dataset, we consulted a group of seven academics in the field of digital ethics from Leiden University, Delft University of Technology and Princeton University. Special attention was paid to so-called 'false positives' (publications that are included in the dataset while not dealing with digital ethics) and 'false negatives' (publications that deal with digital ethics but are not present in the dataset). In our final dataset of 8031 publications, well over 80% of the publications were considered to be related to digital ethics, while most of the publications that the experts deemed important in the field were included.

Constructing a term map based on the collected dataset
Based on the 8031 publications collected in the previous step, the next step in our study was the construction of a "term map". The idea of this term map is to provide a visual representation of the field of digital ethics by showing the most relevant terms occurring in the titles and abstracts of the publications in our dataset. We constructed the term map using the VOSviewer software tool (Van Eck and Waltman). This was done as follows. First, we identified relevant terms in the titles and abstracts by using the term identification algorithm that is implemented in VOSviewer (Van Eck and Waltman 2011). The algorithm has three main steps. In the first step, all noun phrases are identified using natural language processing techniques and plural noun phrases are converted into singular ones. In the second step, infrequently occurring noun phrases are excluded. In the third step, very general, irrelevant noun phrases like "result", "conclusion" or "paper" are excluded. These noun phrases appear in many scientific publications and are therefore less informative.
The first step of the automatic term identification algorithm identified 125,961 noun phrases in the 8031 publications in our dataset. The second step excluded all noun phrases occurring in fewer than 13 publications. This resulted in a set of 2730 noun phrases, of which the third step of the algorithm selected the 2000 most relevant noun phrases. Based on our set of 2000 relevant terms, we then used the VOSviewer software to determine the co-occurrence frequency for each pair of terms. Two terms co-occur when they both occur in the title or abstract of the same publication. Based on the co-occurrence frequencies, the VOSviewer software constructed a term map using a layout and clustering technique.
The layout technique is responsible for positioning the terms in the term map in such a way that the distance between any pair of terms provides an approximate indication of the relatedness of the terms as measured by co-occurrences. The layout technique has attraction and repulsion parameters that allow for some degree of customization in the way terms are positioned in a term map. We used a value of 1 for the attraction parameter and a value of 0 for the repulsion parameter. These values yielded the most satisfactory layout. The clustering technique is responsible for producing a clustering of the terms in the term map by assigning frequently co-occurring terms to the same cluster. Colours are used to indicate the clustering of terms. Terms that belong to the same cluster have the same colour. The clustering technique has a resolution parameter that determines the level of granularity of the clustering that is obtained. We used the default value of 1 for this parameter.
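The frequency filter and the co-occurrence counting described above can be sketched as follows. This is an illustrative reimplementation under simplifying assumptions, not VOSviewer's own code: the terms per publication are taken as given, so noun phrase detection and the relevance ranking of the third step are left out, and the function name is hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(docs_terms, min_docs=13):
    """Count, for each pair of frequent terms, the publications containing both.

    docs_terms: one iterable of terms per publication (from title and abstract).
    min_docs:   terms occurring in fewer publications than this are dropped.
    """
    # A term counts once per publication, however often it is repeated.
    doc_sets = [set(terms) for terms in docs_terms]
    doc_freq = Counter(term for terms in doc_sets for term in terms)
    kept = {term for term, n in doc_freq.items() if n >= min_docs}

    pairs = Counter()
    for terms in doc_sets:
        # Sorting gives every pair a single canonical key (a, b) with a < b.
        for a, b in combinations(sorted(terms & kept), 2):
            pairs[(a, b)] += 1
    return pairs
```

On a toy corpus of three publications with `min_docs=2`, for example `[["privacy", "security"], ["privacy", "security", "law"], ["law", "privacy"]]`, the pair `("privacy", "security")` is counted twice and `("law", "security")` once. The resulting counts are the kind of input that a layout and clustering technique takes as a measure of relatedness.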

Strengths and weaknesses of the applied methodology
The methodology applied in this study has certain strengths and weaknesses. By applying an automated search through a large set of scientific publications, it becomes possible to get an overview of the literature that is wide in scope. The rigidity of the search approach that is used helps to get an objective overview. These characteristics are more difficult to accomplish in a common literature review. Furthermore, the method directs us to relevant publications in diverse fields that are not necessarily the specialization of the researchers doing the analysis.
Although this is very useful for a broad and complex topic like digital ethics that is multidisciplinary in nature, these advantages come with some limitations, which we have tried to minimize. The applied methodology does not provide detailed insights into the content of individual publications. We resolved this by looking at the abstract or full text of specific publications. Also, some of the decisions on parameter values for the algorithms used are quite arbitrary. Examples are the threshold value of 10 for the inclusion of publications in the dataset and the choice of a clustering that results in 4 clusters rather than 2 or 8. We therefore checked a range of different values and found that our main conclusions are robust with regard to these choices.
Furthermore, we only searched for publications in the WoS database. At this moment it is one of the most comprehensive databases available with good data quality (Mingers and Leydesdorff 2015). While it is known that this database has a relative lack of publications in the humanities, important ethics journals that regularly cover topics relevant to digital ethics, such as "Ethics and Information Technology", "Information, Communication & Society", "Science & Engineering Ethics" and "Science, Technology & Human Values", are included. Our method includes English language publications and academic articles only (and not academic books, newspaper articles, OECD reports, etc.). So, we do not claim to have a database of all publications on digital ethics, but we do have a representative selection, and we have validated this by discussing the representativeness of our database with specialists in the field.

Results
Static results: digital ethics is truly multidisciplinary
Figure 1 shows the term map of the field of digital ethics that was constructed using the methodology discussed in the previous section. The visualization shows 2000 key terms extracted from the titles and abstracts of the publications in our dataset. The size of a term indicates the number of publications in which the term occurs: the larger the size of a term, the larger the number of publications in which the term occurs in the title or abstract. The colour of a term indicates the cluster to which the term belongs. The horizontal and vertical axes have no special meaning. Instead, it is the distances between the terms that are important. In general, the smaller the distance between two terms, the stronger the relation between the terms, as measured by co-occurrences. Lines are used to indicate the strongest co-occurrence relations between terms. To avoid overlapping labels, only a subset of all labels is visible. The term map can be explored interactively here: https://goo.gl/hkBAWi. The software has zoom, scroll, and search functionality to facilitate a detailed exploration of the term map. It provides different views, allowing one to focus either on the map's global structure or on its more detailed properties.
In the term map in Fig. 1, four clusters of closely related terms can be identified. Each cluster is indicated in a different colour. Our interpretation of these clusters is as follows:
• [...] Fig. 1, contains terms such as 'customer', 'perception', 'influence', 'vendor', 'purchase' and 'intention'. This cluster represents mostly publications from the field of social science, predominantly economics, business studies and marketing.
• The Data and Information Security cluster, visible in red in Figs. 1 and 4, contains terms such as 'security', 'protocol', 'application', 'network' and 'technique'. This cluster represents publications that discuss data and information security, mostly from the field of computer science. These publications often discuss the technical and security challenges and the means to overcome problems related to data ethics.
We might have expected to find clusters centred around particular ethical terms, such as autonomy, fairness or freedom. However, the automated clustering results in clusters that correspond closely to specific academic fields: law, medicine and computer science. It shows that the strongest connections between terms originate from the fact that digital ethics is spread out over different disciplines. We see, for example, that autonomy and dignity are dominant in medicine, freedom is prominent in law and security in computer science.
We furthermore notice that there is a significant gap between these fields. As the distance between terms indicates their relations, it is noteworthy that technical and juridical terms never appear side by side. The term map is instead divided in two halves, with the left being the ethical/juridical and the right being the technical. The clusters form around different fields and are rather dispersed, which is an indication that different values are discussed in different disciplines, rather than all values across all disciplines.
However, while this is an indication in that direction, the conclusion cannot readily be accepted. The clustering technique used will always put any term in only one cluster. So while it shows us where a term dominates, it does not show whether and to what extent a term is also present within the domain of another cluster. For example, the fact that security is in the Data and Information Security cluster does not mean that we can conclude that security is unimportant in other domains.
To solve this, lines are displayed in the term map to visually indicate the most frequently co-occurring terms. In Fig. 1, the 500 pairs of terms with the highest co-occurrence are presented in this way. The top 25 are listed in Table 3. By looking at the co-occurrences in this way we can find out whether terms that are part of one cluster also co-occur with terms from another cluster. We gained a better understanding of the occurrences of values in different fields by looking at the position of the values "security", "autonomy", and "dignity". The numbers of occurrences or co-occurrences within the dataset are displayed between brackets. Security (2240) is the most frequently occurring term of all. It is located in the Data and Information Security cluster and is indeed very dominant within the computer science literature. However, it also has a high co-occurrence with terms such as law (103), which is in the Law and Governance cluster, and with both care (88) and participant (118), which are in the Medical Ethics cluster, showing that it is also prevalent in the other domains.
Autonomy (682) and dignity (241) are positioned close to each other in the Medical Ethics cluster. Autonomy is also the term with the highest co-occurrence with dignity (110). Autonomy itself also has a high co-occurrence with informed consent (162), care (127), decision (126) and right (116). Autonomy thus has strong connections with other terms in the Medical Ethics cluster as well as with the Law and Governance cluster.
By looking at the locations of the different values in the term map and their relation with other terms, we can conclude that different values are being used in the different fields. To give some examples: Security is an important value in all clusters, but dominates in the Data and Information Security cluster, while autonomy is most prevalent in the context of Medical Ethics and in Law and Governance, but is almost absent in the Data and Information Security literature.
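The cross-cluster check described above can be made concrete with a small sketch. It uses only the cluster memberships and co-occurrence counts quoted in the text; the data structures and the function name are illustrative assumptions and not part of VOSviewer.

```python
# Cluster membership as stated in the text for a handful of terms.
CLUSTER = {
    "security": "Data and Information Security",
    "law": "Law and Governance",
    "care": "Medical Ethics",
    "participant": "Medical Ethics",
    "autonomy": "Medical Ethics",
    "dignity": "Medical Ethics",
}

# Co-occurrence counts quoted in the text (number of publications).
COOCCURRENCE = {
    frozenset({"security", "law"}): 103,
    frozenset({"security", "care"}): 88,
    frozenset({"security", "participant"}): 118,
    frozenset({"autonomy", "dignity"}): 110,
    frozenset({"autonomy", "care"}): 127,
}


def cross_cluster_links(term):
    """Return (other term, count) pairs linking `term` to a different cluster."""
    links = []
    for pair, count in COOCCURRENCE.items():
        if term in pair:
            (other,) = pair - {term}
            if CLUSTER[other] != CLUSTER[term]:
                links.append((other, count))
    # Strongest links first.
    return sorted(links, key=lambda item: -item[1])
```

For "security", this returns links to participant, law and care, all outside its own cluster. For "autonomy", the two pairs included here (dignity and care) stay within the Medical Ethics cluster, so the list is empty. The cluster of terms such as "right" is not stated in the text, so they are left out of the sketch.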
The meaning of the ethical terms found also depends on the context in which they are used. Autonomy, for instance, refers in the medical field to the individual's capability to make decisions regarding the use of their data by themselves. There are many discussions on the autonomy of choice to have personal data in biobanks and on the conditions under which data can be shared for medical research, and there is a discourse on the autonomy of health care professionals. In computer science, however, autonomy is often used for describing a property of a technological system, often referring to the property of a system that acts or makes decisions without the involvement of any human.

Dynamic results: shift towards technical issues
Figure 5 shows a time trend overlay visualization of the term map of the field of digital ethics. The colour of a term indicates the average year of publication of the publications in which the term occurs. The closer the colour of a term is to blue, the older the publications in which the term occurs, and the closer the colour of a term is to red, the more recent the publications in which the term occurs. It shows that the terms on the right (computer science) side of the figure are used more in recent publications. What is striking about this image is that the emphasis in scientific research is shifting away from ethical and juridical terms such as dignity, autonomy, freedom, and informed consent, to more technical issues, such as encryption, dataset, efficiency, and better performance.
An initial explanation of this shift towards technical issues can be given by looking at the development of the field of digital ethics over time. Overall, the analysis shows that there is an increase of scholarly work on questions of digital ethics. As Fig. 6 demonstrates, in the first years of our analysis, between 2000 and 2002, there were between 100 and 200 publications on digital ethics per year. In 2016, the last complete year in our analysis, the number of publications was almost 1200. Overall, we see an approximately exponential increase in the number of publications over time.
Zooming in and looking at the development in the different scientific fields in Fig. 7, a slightly different picture emerges. In the early years, the dataset shows that biomedical and social sciences dominate the scholarly work on digital ethics. Both fields show a marked growth in publications on digital ethics. The field of computer science research starts out at a very low number of publications in the early years, but shows a much faster increase in the number of publications compared to the other fields. So, in 2016, many more publications on digital ethics are from this field than from any other field. The shift from ethical/juridical to technical issues would thus be explained as an expression of the growth of the number of publications in computer science. 6 It might also be that the growth of publications on digital ethics is an effect of the growth in scientific publishing in general. This growth, although very hard to determine exactly, is estimated to be around 8-9% per year in recent years (Bornmann and Mutz 2015). Similarly, the relative growth of digital ethics in computer science could be an effect of the fast growth of that field in general. In order to check for this, we also looked at the normalized growth in the number of publications.
Doing so reveals the following: while the number of scientific publications in general grew by a factor of 2, the number of publications on digital ethics grew by a factor of 10. So, if we adjust for the general growth of scientific publications, we see that the number of publications on digital ethics grew 5 times faster than the number of publications in general.
The data also show that digital ethics in computer science has increased by a factor of 9.5 relative to the growth of computer science in general. This supports the thesis that computer science is increasingly the locus of questions concerning digital ethics. It is borne out by Fig. 8, which shows the relative shares of the different fields and a marked growth in the share of computer science.
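The normalisation used above amounts to dividing one growth factor by another. The sketch below shows that arithmetic, together with a helper that converts a growth factor into an equivalent constant annual rate. The function names are illustrative, and the 16-year span (2000 to 2016) used in the usage note is an assumption about the period over which the quoted factors were measured.

```python
def normalized_growth(topic_factor, baseline_factor):
    """Growth of a topic relative to the growth of its baseline field."""
    if baseline_factor <= 0:
        raise ValueError("baseline_factor must be positive")
    return topic_factor / baseline_factor


def annual_growth_rate(factor, years):
    """Constant yearly rate that compounds to `factor` over `years` years."""
    if years <= 0:
        raise ValueError("years must be positive")
    return factor ** (1 / years) - 1
```

With the figures from the text, `normalized_growth(10, 2)` gives 5.0, matching the statement that digital ethics grew 5 times faster than scientific publishing in general. Under the assumed 16-year span, `annual_growth_rate(10, 16)` expresses the tenfold increase as a yearly rate, which is one way to read the approximately exponential trend.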

Discussion
From the static and dynamic analyses of the term map, we derive some general findings which we will discuss in this section. The findings are a result of an analysis at the level provided to us by the scientometric method, which by nature is at a level of compounded statistics and word counts. Ultimately, our findings have to correspond with what is going on at the level of the individual publications. In analysing our material, we have at all times switched back and forth between looking at the term map and going into the (abstracts of) publications in our dataset, in order to corroborate the findings at the term map level with the set of underlying publications. In the discussion of our findings we will therefore refer to some of the publications in the dataset.
We start out by noting that some topics were found to be scarcely present in the dataset. 'Power' and 'economics', for example, seem to us central notions for a proper understanding of the ways digital technologies are shaping our world. While the influence of economic thinking has grown in many fields, it seems to not yet have fully caught up in the domain of digital ethics. Another noteworthy absence is that of the Edward Snowden NSA affair. This is a marked difference from the public debate, in which whistle-blowers are hotly debated. The same could be said of concepts such as the 'filter bubble', which has become very influential over the past few years in the public debate. It seems that in the scientific realm the issue is barely discussed, or at least not in colloquial terms.
Fig. 8 Percentage of publications per main field
6 Another cause for the marked increase in the number of publications from the field of computer science may be the decision of many publishing venues in this field to start demanding that an ethics section be included in publications. These ethics sections do not by themselves cause publications to be included in our dataset, as our method only looks at the abstracts of publications. The decision to mandate ethics sections should be seen as an effect of a rising concern for ethical issues in that field. The added focus on ethical issues created by the decision may also have driven more computer scientists to do research that focuses on questions of digital ethics.

Digital ethics is being discussed across different scientific disciplines
As discussed before, the term map shows distinct clusters of terms. The separation of the clusters indicates that diverse ethical aspects of the digital domain are being discussed across different scientific disciplines. A closer look at the publications in the dataset corresponds with this image. Computer scientists are discussing technical issues for safeguarding privacy and security, legal scholars are discussing the right to be forgotten and fundamental differences between the European and US legal frameworks, and medical specialists discuss patient autonomy and informed consent in the medical domain. All these discussions are part of the field of digital ethics. So if we ask what is being discussed in digital ethics and who is doing the discussing, we need to take account of work being done in all these disciplines.
We should note that some values are discussed widely in one field, while scarcely discussed in another. To give an example, we see that the terms dignity, autonomy and informed consent are most used in the field of medical ethics and much less in computer science. Perhaps this is no surprise, as the field of medical ethics has a much longer history of dealing with privacy and related issues than computer science. If we want to understand the notions of autonomy and dignity in their relationship to the digital, it may be that medical science is best equipped to help us. Some other values, like privacy and security, are discussed across all disciplines.
Yet it is also no simple matter to carry a term over from one scientific discipline to another. Dignity, for instance, is a term that is rarely used in computer science publications, and even when it is used in computer science it is often used in relation to questions regarding healthcare or in very abstract discussions. We think this can be explained by the fact that a term from a legal/ethical human rights discourse cannot simply be carried over to another discipline. A concept such as dignity can have a different meaning in another discipline like computer science.
We found that different disciplines are talking about various questions of digital ethics, but it still remains to be seen to what extent they are talking with each other. When a computer scientist, a medical scientist, a lawyer and an ethicist are researching privacy issues, are they talking about the same thing? The gap between the different clusters suggests that they are not. In order to give an example of this, let us have a closer look at some of the publications in the dataset.
In Hajian et al. (2015) and many other publications in the dataset, the important question of discrimination and privacy preservation in data-mining applications is discussed. The analysis discusses different data sanitization methods that result in a certain level of k-anonymity. The analysis is deeply technical and the definitions of privacy and discrimination are technologically defined. This is typical of the way that privacy is being discussed in computer science. But if we look at the work of academics who self-identify as ethicists, published in the EIT and ICS journals, we see that these technologically defined interpretations of ethical considerations are hardly discussed at all. So Helen Nissenbaum's (2001) call to action to ethicists "to pay painstaking attention to cases, one at the time from the bottom up" seems as relevant now as it was in 2001.

Mind the gap
This is an important point, especially at a time when there seems to be a political will to actively work towards solutions that can help society reap the benefits of increased use of personal data, while at the same time protecting important human values. Achieving this goal will be hard as long as there remains a gap between the abstract ethical concepts developed by ethicists on the one hand and the practical implementations developed in the applied sciences on the other. There are developments in the fields of ethics and computer science that indicate a move in what we consider to be a promising direction. In ethics, 'value-sensitive design' (Van den Hoven 2013; Friedman et al. 2013) is gaining traction and calling for a closer integration of value requirements into design processes. In computer science, the growth of design methods such as agile software development signals a closer integration of different stakeholders, and their values, into the development of digital systems. Still, it seems to us that the gap is not yet receiving the attention that it deserves. As long as core values such as human dignity are not translated, applied and specified at a concrete level where they can be used as functional requirements for the systems that are being built, we cannot expect them to become part of these systems in any meaningful way.

Further research
There is ongoing scientometric research revolving around the question of the interdisciplinarity of certain scientific fields, i.e., to what extent these fields feature cooperation between different scientific disciplines. Our results seem to indicate that (applied) ethics is still rather far removed from the applied sciences on this subject. Newly developed scientometric measures such as the integration score (Porter and Rafols 2009) could be used to measure to what extent digital ethics is developing as an interdisciplinary research field.
We have found that, in some instances, topics that become very important in the public debate, such as the filter bubble and the Edward Snowden revelations, do not spill over proportionally to the academic realm. A similar scientometric analysis could be done on other types of texts, such as newspaper articles, to get a comparable view of writings on digital ethics outside of academia.

Conclusion
The concerns regarding digital ethics and the protection of core values manifest themselves not only in the use of ethical terms, but more significantly and increasingly in terms of technical measures for guaranteeing and realizing fundamental moral considerations, such as the privacy of individuals by means of encryption and access.
An increasing part of the ethical discussion has migrated to specialised fields of computer science, health and life sciences and law. It is in these branches of scholarship that abstract ethical values materialize and are meaningfully applied and transformed into ethical praxis. At the same time, we see that certain ethical concerns are under-represented in certain fields (such as discussions on human dignity and discrimination in computer science). If we believe in the importance of these core values, it is necessary to find out why these and other moral concepts are missing in certain disciplinary fields. Politicians, policy makers, regulators and ethicists who aim for a comprehensive and balanced view need to be aware of the features of the digital ethics terrain that our cartography has mapped out.