Ethics in Health Data Science

New technologies offer great opportunities to improve and expand the provision of health information and services worldwide. Digital health interventions (WHO 2018) include those designed for individuals, such as personalized health information delivered to their mobile phones; for health care providers, such as decision support tools; and for health systems, such as the digitization of health records. The other chapters outline the scope and potential for digital advances to impact global health outcomes. This chapter focuses on the responsibilities that accompany the adoption of these technologies. Specifically, we examine the ethical considerations in leveraging technology for global health, with a focus on resource-poor regions. Our paramount ethical consideration is putting the needs of the community and end users at the center of the approach. With the concerned community as the starting point, all other ethical considerations follow, from safeguarding the rights of those impacted, which includes data privacy, security, and consent, to assessing unintended consequences.

• Establish a critical understanding about the importance of including the community and end user needs throughout the development process for digital health interventions.

Why Do We Need to Talk About Ethics?
An unsystematic literature review on data science and global health reveals how little the topic of ethics is covered. Furthermore, a scoping review of the ethics of big data health research found significant ethical lapses in the areas of privacy, confidentiality, informed consent, fairness, justice, trust and data ownership (Ienca et al. 2018). This is a concern, given that the global health sector is awash with digital solutions that have often, at best, failed to be adopted by the intended users, to be scaled by national health systems, or to deliver measurable outcomes. At worst, they have ignored the medical profession's principle to do no harm. Even in one of the best hospitals, poorly designed tech solutions can cause mistakes, in one example leading to an overdosing of patients (Wachter 2015). The widespread penetration of mobile phones and the influence of social media, along with techno-solutionist mindsets, have led to pressure on governmental and nongovernmental organizations to experiment with new technologies. Many lack the appropriate in-house skills to deliver digitally driven solutions and look to tech partners for support. The result is global health actors designing and developing digital initiatives without the appropriate training and experience in ethics as it relates to data science and global health. The burgeoning field of digital health has expanded the number of people and institutions involved in creating global health solutions, such as tech startups selling their proprietary solutions or data analytics companies offering new skills to the public health domain.
There is much promise in the momentum around the tech sector building global health solutions, but there is often a lack of the healthcare domain and localization expertise required to develop solutions that will serve and not harm a community. Software engineers and data scientists are trained to build technology and interrogate data sets, but are not trained in anthropology, social science, history, public health, and other fields that would help with the design of a global health solution. The result can be tools that exacerbate inequalities, such as tech that can only detect Alzheimer's (Fraser et al. 2016) in native English speakers.
The lack of consistently applied global standards around ethical concerns in digital health has long been a concern; however, there are some encouraging recent developments. In April 2019, the World Health Organization (WHO 2019) released its first guideline on digital health interventions and created a Department of Digital Health to support its role in assessing emerging digital technologies and helping member states regulate them. These WHO initiatives follow others, such as the Principles of Donor Alignment for Digital Health, which emphasize that donors should align their funding to national health strategies, and the broader Principles for Digital Development, which aim to set a standard for how to use technology in the development context. These principles are not compulsory; rather, they are meant to offer guidance to help practitioners succeed in applying digital technologies to health and development programs. These global standard-setting initiatives are laudable and necessary. However, it will take time for the guidelines and principles to become embedded in actual thinking and practice.

Data Privacy and Protection
The proliferation of digital initiatives in global health is accompanied by data. All digital health activities collect data, which exist on one or more platforms, from government servers and mobile networks to social media. Safeguarding user privacy must be an essential part of any intervention. There are principles for this as well, such as the United Nations (UN) Global Pulse Data Privacy and Data Protection Principles and the European Union's (EU) General Data Protection Regulation (GDPR), both dating from 2018 (GDPR was adopted in 2016 and has applied since May 2018). In contrast to UN principles and other global guidelines, GDPR is compulsory, with significant fines levied on those found to be in non-compliance. GDPR is a welcome piece of legislation that gives individuals more control over their personal data; however, it is designed to protect the personal data and privacy of EU citizens for transactions that occur within EU member states, and therefore is limited in geographic scope. It is noteworthy that GDPR is increasingly being seen as a gold standard for other countries to follow. It mandates that the platform or content provider must always be transparent when dealing with personal data and provide people with details about how their data is processed. This means providers must state who they are, what personal data they are collecting, what they will do with it and why, how long the data will be kept, and who it will be shared with. Data must be used only for the purpose for which it was collected; if it is used for a new purpose, including previously unstated research inquiries, additional permissions may need to be gathered.
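The transparency and purpose requirements described above can be made concrete in a few lines of code. The following is a minimal sketch, not a compliance tool: the ProcessingNotice class, its field names, and the example organization and purposes are all hypothetical and are not drawn from GDPR or from any library. It records what people were told at the point of collection and refuses any use that was not stated up front or that falls outside the retention period.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class ProcessingNotice:
    """What people were told when their data was collected (hypothetical schema)."""
    controller: str               # who is collecting the data
    data_items: Tuple[str, ...]   # what personal data is collected
    purposes: FrozenSet[str]      # what it will be used for and why
    retention_days: int           # how long it will be kept
    shared_with: Tuple[str, ...]  # who it will be shared with
    collected_on: date            # when the data was collected

    def permits(self, purpose: str, on: date) -> bool:
        """True only if the purpose was stated up front and retention has not lapsed."""
        expires = self.collected_on + timedelta(days=self.retention_days)
        return purpose in self.purposes and on <= expires


# Illustrative values only; none of these names come from a real program.
notice = ProcessingNotice(
    controller="Example Health NGO",
    data_items=("phone_number", "clinic_visit_date"),
    purposes=frozenset({"appointment_reminders"}),
    retention_days=365,
    shared_with=("district_health_office",),
    collected_on=date(2019, 1, 1),
)

# A stated purpose within the retention period is allowed.
print(notice.permits("appointment_reminders", date(2019, 6, 1)))  # True
# A new research purpose was never stated, so new permission is needed.
print(notice.permits("behavioural_research", date(2019, 6, 1)))   # False
# The same stated purpose is refused once the retention period has passed.
print(notice.permits("appointment_reminders", date(2020, 6, 1)))  # False
```

In practice such a record would sit alongside legal review and a data management plan; the point of the sketch is only that a new purpose has to be checked against what was originally disclosed.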
Those engaged in providing digital health services are bound by the regulations of the country where the activity is happening, though the breadth and enforcement of such regulations vary considerably from country to country. The State of Digital Health 2019 reports that 18 of the 22 countries reviewed have laws relating to data security (storage, transmission, use) and data protection (governing ownership, access and sharing of individually identifiable digital health data); however, only four countries consistently enforce the data security law, and only two countries consistently enforce the privacy law (Mechael and Edelman 2019). The report also found that the majority of countries lack protocols for regulating and certifying digital health devices and services. Only four of the 22 countries reported having approved protocols and policies "governing the clinical and patient care use of connected medical devices and digital health services (e.g. telemedicine, applications), particularly in relation to safety, data integrity, and quality of care." Global health is no stranger to rigorous data collection protocols. It is standard practice for biomedical research to undergo stringent review by institutional review boards (IRBs) to protect the welfare of human subjects participating in research activities. Yet a great deal of research and data collection in global health does not fall under a process like this, because it is not biomedical in nature (e.g. research around health attitudes and behaviors) or is not under the auspices of an academic or similar organization where IRB review is embedded into practice. Examples include work conducted by governmental and non-governmental organizations and private companies, which have varying degrees of institutional research protocols and data protection standards, ranging from the highest standard to none at all.
There are many global health interventions that use social media or mobile devices as platforms to reach and engage populations for health promotion initiatives (see mhealthknowledge.org). How many of these initiatives account for the fact that they have little or no control over what data these platforms are collecting about their users and for what purposes? The vast majority of social media platforms rely on a business model where they are free to use and, in exchange, collect data on their users, which they monetize. This monetization mostly comes in the form of selling advertising space personalized to the user or selling data to third parties. Most users will be largely unaware of the depth of data that is being extracted about them. The mobile health field itself is huge, and there is still too little attention paid to ethical matters such as how mobile network operators store and share data. Efforts and promises to anonymize data are not immune to data leaks, hacks, or government-mandated requests to hand over data. This leaves people, including vulnerable populations, at risk of having their personal details exposed to unknown third parties. For example, in places where conditions such as HIV are stigmatized, this could lead to public shame, discrimination, denial of services and violence.
In the United States, concerns about unfettered data collection abound. For example, most health and wellness apps do not fall under the regulatory authority of the Food and Drug Administration (FDA). Apps require FDA approval if they are considered medical devices, such as those that monitor, analyze, diagnose or treat specific medical conditions (FDA 2015). This means that the majority of apps, such as those considered lifestyle, diet or fitness trackers, are not regulated. They do, however, collect data about their users.
More guidance and tools are needed to help global health professionals understand how platforms, as well as partners such as governments and research agencies, use data. Unfortunately, matters of data protection cannot be solved by one-off training, because regulations, company policies, and data capture practices change constantly. One example is the announcement in early 2019 by Facebook of its plans to merge its messaging applications Messenger, Instagram and WhatsApp, and to introduce end-to-end encryption to all of them. These plans raise new security and privacy concerns about the information people around the world share within these platforms. Specific guidance is needed on negotiating contractual arrangements with platform providers and technology and research partners, conducting privacy impact assessments, and creating operational tools such as data management plans and information asset registers. It is critical for digital health developers to invest in formal protocols and staff expertise in order to avoid risks to health institutions and the people they serve.

Consent, Clarity and Consequences
This brings us to the concept of informed consent and clarity. The Data Science and Ethics e-book (Patil and Mason 2018) tells us that users need to have an agreement about what data is being collected and how it is being used. In order for them to consent, they need clarity on what they are consenting to. People need to have the right to consent to the data collected about them and the experiments performed on them. The request for consent also needs to be clear, not hidden in some terms somewhere, or presented in a place where people simply provide a signature because they need the care. The FRIES framework, which stands for freely given, reversible, informed, enthusiastic and specific, is an example of a high standard of consent. Taking a justice-oriented design approach is especially important for resource-poor regions that have a history of exploitation by external actors and that involve data subjects who may face limitations due to language translation, literacy or socio-cultural context.
Yet are traditional clinical research protocols fit for purpose in a digital age? Ienca et al. suggest that informed consent and other ethical requirements may be ill suited for big data research, pointing to the example of obtaining publicly available data on social media. This is pointedly relevant given the proliferation of health misinformation on social network platforms (Gyenes and Mina 2018) and private apps, and the use of private chat apps by medical personnel to communicate with patients (Benedictis et al. 2019). Most people who post personal health stories and opinions on social media will do so without knowing that they could be the subject of future research.
Data science is about collecting and using data to generate insights. These insights are then acted upon, and the resulting decisions affect people's lives. It is therefore essential that global health actors of all kinds consider the consequences of the digital health tools that they build. Whose data is being collected and what decisions are being made based on this data? Machine learning algorithms can model the progression of cancerous tumors. Doctors then interpret the output and make treatment decisions. Are the recommendations skewed towards a particular sub-population because of the dataset that was used to train the system? If a research hospital used a certain dataset for understanding tumors, is the resulting model tuned to a particular community, and will tumors in patients from other communities be misdiagnosed?
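One simple way to begin answering the question of skew is to report a model's performance separately for each community rather than as a single overall figure. The sketch below is illustrative only: the function name sensitivity_by_group, the community labels, and the toy records are made up for this example and do not come from any study or dataset. It computes sensitivity, the share of true cases the model correctly flags, for each group.

```python
from collections import defaultdict


def sensitivity_by_group(records):
    """Return {group: share of true positive cases the model flagged}.

    Each record is (group, true_label, predicted_label), where 1 means
    the condition is present. Groups with no true cases are omitted.
    """
    caught = defaultdict(int)     # true cases the model flagged, per group
    positives = defaultdict(int)  # all true cases, per group
    for group, truth, predicted in records:
        if truth == 1:
            positives[group] += 1
            if predicted == 1:
                caught[group] += 1
    return {group: caught[group] / positives[group] for group in positives}


# Made-up toy data: (community, true label, model prediction).
records = [
    ("community_a", 1, 1), ("community_a", 1, 1),
    ("community_a", 1, 1), ("community_a", 1, 0),
    ("community_a", 0, 0),
    ("community_b", 1, 1), ("community_b", 1, 0),
    ("community_b", 1, 0), ("community_b", 1, 0),
    ("community_b", 0, 0),
]

rates = sensitivity_by_group(records)
for group, rate in sorted(rates.items()):
    print(f"{group}: sensitivity {rate:.2f}")
```

In this toy data the model finds three of four true cases in one community but only one of four in the other, a gap that an overall figure of 50% would hide. A real audit would also need adequate sample sizes per group and a decision, made with the affected communities, about which groups to compare.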
Ministries of Health aim to achieve health for all, but the data on which they base policies and programs often do not account for those most marginalized. For example, people with intellectual disabilities have been found to be left out of censuses and public surveys, and they have poorer health status as a result (Special Olympics Health 2018). We now know that many cars have been designed using crash test dummies built to the body sizes of men. Safety features were therefore designed for the typical male body, potentially resulting in more harm to non-male bodies in a car crash. Teams need to conduct assessments of potential impacts and pay special attention to issues of equity and exclusion.

Putting the Community at the Center
To mitigate some of the potential harm when using information technology and machine learning in global digital health, we must put the community, not just the code and data, at the center of the development cycle. It is not enough to build technology first, and then deploy it to see how it can help a community. We first need to have a deep understanding of the community. This means involving the concerned individuals in the design process in a meaningful way, right from the beginning. Merely designing on behalf of the community can lead to digital health services that propagate the very problems the intervention is trying to solve and perpetuate bias in data, algorithms, models and analysis. These concerns are exacerbated by the nature of the global health and development sector, which too often involves external actors who design digital outputs without a firm understanding of the needs of the community they are intended to serve.
The aforementioned Principles for Digital Development put designing for the user at the top of their list. Everything needs to be grounded in the specific community and context. These principles, and others, refer to human-centered design approaches guiding the development of the technology. Human-centered design starts with the people we are designing for, and ends with solutions that fit the needs of those people within their communities, responding to a strong understanding of what shapes their decisions and behavior and what is relevant for their health system context. Developers of digital health interventions need to deeply understand how providers, patients, caretakers, administrators, and all others in the health ecosystem interact with the technology. This will also help technology developers understand which groups of people are missing from the design of the system and anticipate unintended consequences.
A major challenge to this ethical consideration is that there is a fundamental lack of accountability to the people global health actors seek to serve. Private foundations, for example, that fund billions of dollars in global health interventions are institutionally accountable only to their boards of directors, and set and enforce their own ethical standards. Similarly, UN agencies are beholden to their member states, NGOs to their boards, and private research agencies and tech companies to their owners and investors. Considering the impact, intended or unintended, of an intervention on a target population is best practice, but major global health actors have no official accountability to that population. The exception is the government of the country where the data collection or digital health intervention is being implemented. In a democracy, such governments are accountable to their citizens. This is why, in most cases, it is recommended to work alongside the national government when introducing digital health initiatives (see Pepper et al. 2019 for an example); doing so also brings other benefits, such as enabling interoperability and priority alignment. Increasingly, since the 2005 Paris Declaration, nation-states are dictating how donor funds are used, guided by government priorities.
Certainly, many institutions aim for accountability in their practice. For example, USAID's recently published Considerations for Using Data Responsibly (USAID 2019) puts the 'data subjects', the people from whom data are collected, at the top of its list of those to whom it is responsible, followed by USAID itself and the broader development community. Ultimately, the onus of applying ethical standards in global health data science is on the institution carrying out the activity.
No intervention should be explored without proper consideration of ethics: specifically, understanding the impact any intervention can have on the individuals and communities interacting with these technologies. This growing field requires coordinated, interdisciplinary teams, blending skills of data science, public health and policy, working together to do no harm and safeguard those most vulnerable.