Introduction

With the ever-increasing focus on artificial intelligence, the exponential growth of data generated by private, public, and government sources (IDC & Statista, 2023), and the increase in the number of data-generating devices, such as smart devices, wearables, and the sensors seen and used all around us every day (Greaton, 2019), understanding data governance, privacy, and ethics surrounding the availability and use of data is more important than ever. Every day, new data breaches affecting individuals and organizations become known. Every day, new uses for data are envisioned, and the ethical considerations around the collection, storage, and use of these data are yet to be clearly thought out. The need for data governance to protect privacy and ensure the ethical use of data is clear, and there are many challenges in defining and implementing effective data governance policies. These policies must, in turn, be implemented through legal and regulatory frameworks to be effective and enforceable. In this chapter, we will examine these topics and more related to data governance, privacy, and ethics.

Looking at data governance, privacy, and ethics from a data science point of view, understanding all three is critically important. Governance is the purview of organizations, which must both ensure and insure their data in all aspects. Privacy is the purview of the individual with respect to their data and the data about them. Ethics is the purview of both individuals and organizations. Data privacy and ethics need to be integrated into data governance and become a real part of the organization’s being. It is important that data science policies, practices, organizations, and people promote the moral behavior needed to protect privacy and guarantee the ethical use of data.

The Importance of Data Governance, Privacy, and Ethics in Today’s World

Much has been written about data governance, privacy, and ethics over the past several years. In fact, those terms have been used interchangeably in the popular press. However, they are not interchangeable and beginning with definitions of each will be helpful.

Data Governance. “Data governance is the specification of decision rights and an accountability framework to ensure the appropriate behavior in the valuation, creation, consumption and control of data and analytics.” (Gartner, 2023)

Data Privacy. “Data privacy is focused on the use and governance of personal data—things like putting policies in place to ensure that consumers’ personal information is being collected, shared and used in appropriate ways.” (IAPP, 2023)

Data Ethics. “…the norms of behavior that promote appropriate judgments and accountability when acquiring, managing, or using data, with the goals of protecting civil liberties, minimizing risks to individuals and society, and maximizing the public good.” (U.S. General Services Administration, n.d.)

While they are distinct, they are related, and together the three provide “good hygiene” for data stewardship.

Data governance is rarely seen by the public or outside organizations. In fact, many do not even know it exists and, as a result, even organizations and those in organizations who should be designing and implementing data governance solutions have either yet to start or are at the very beginning of their journey. For example, a systematic literature review of the role of ethics in big data (Roche & Jamal, 2021) briefly touches on data governance in the context of data ethics: “The question of using data ethically is being retrospectively applied to big data already in use, and is often considered alongside other data issues such as data governance, cyber security and data privacy.”

The COVID-19 pandemic is widely recognized as an international crisis, and in “crisis mode” decisions are often expedient: the questions of data governance, data privacy, and data ethics shift from “are we doing the right thing?” to “are we compliant?” As Yallop and Aliasghar (2020), drawing on Yallop and Seraphin (2020), observe, “…data governance frameworks need to expand from ‘solely compliance-based frameworks to inclusion of privacy and ethics solutions for an equitable and ethical exchange of data and information.’”

At this point, it would be reasonable to ask whether these issues are addressed in the General Data Protection Regulation (GDPR). In fact, they are not addressed explicitly. The GDPR has several sections on processing personal data and on the responsibilities of those who “control” data, but it does not specifically address data governance. There are emerging internationally recognized guidelines for data governance, such as ISO/IEC 38500 (the international standard for corporate governance of IT) and ISO/IEC TS 38505-3:2021—Information technology—Governance of data—Part 3: Guidelines for data classification. There are even certifications available (see the guide to top data governance certifications in 2023 at thedatagovernor.info) from the Data Management Association (DAMA), the Association for Information and Image Management (AIIM), the Project Management Institute (PMI), and the Data Governance Institute (DGI). Many of these are new and not yet widely subscribed, though this can be expected to change over the next 5–10 years as awareness of their importance grows.

For a concrete example of the what, how, and why of data governance, consider a multinational corporation that collects data on customers who purchase its varied products in varied locations. Given language differences, currency differences, and the different practices within its locations, there are numerous occasions for inconsistency in databases. Processing of financial data may be compromised by inconsistency in noting currency values: if pounds and dollars are confused, for instance, analysts will inevitably draw misleading economic conclusions. If there are different names for the same products, owing to simple language differences from one store location to another, attempts to draw insights about those products will be more difficult. Attempts to aggregate data may also be complicated by varied data collection methods. If one store collects information about customers or transactions in a particular kind of format, while another store collects slightly different information in still another format, it will be difficult to combine that data and use it as evidence for better business decision-making. What data governance is partially about is “deciding on how to decide” about concerns like these in the collection and use of data.
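To make the currency and naming problems concrete, the following sketch shows the kind of normalization step a governance policy might standardize before any cross-location aggregation. It is a minimal illustration in Python; the exchange rates, product mappings, and field names are hypothetical stand-ins, not part of any real system.

```python
from dataclasses import dataclass

# Illustrative exchange rates to a common reporting currency (USD);
# a real system would pull these from an authoritative, dated source.
RATES_TO_USD = {"USD": 1.00, "GBP": 1.27, "EUR": 1.09}

# Hypothetical mapping from local product names to one canonical name.
CANONICAL_PRODUCT = {"trainers": "sneakers", "sneakers": "sneakers"}

@dataclass
class Transaction:
    amount: float
    currency: str   # ISO 4217 code, e.g., "GBP"
    product: str    # local product name

def normalize(tx: Transaction) -> Transaction:
    """Convert to the reporting currency and canonical product name
    so records from different locations can be aggregated safely."""
    if tx.currency not in RATES_TO_USD:
        raise ValueError(f"Unknown currency: {tx.currency}")
    usd_amount = tx.amount * RATES_TO_USD[tx.currency]
    product = CANONICAL_PRODUCT.get(tx.product.lower(), tx.product.lower())
    return Transaction(round(usd_amount, 2), "USD", product)

# Without this step, summing a 10.00 GBP sale and a 10.00 USD sale
# as "20.00" is exactly the pounds-vs-dollars confusion described above.
print(normalize(Transaction(10.00, "GBP", "Trainers")))
```

The design point is that the conversion and naming rules live in one governed place, rather than in each location’s own “roll-your-own” logic.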

But, of course, data governance is not only about the efficient business use and construction of databases. It is also about the oversight of the ethical dimensions of data use. What are examples of these dimensions, and why is data governance concerned with them? Consider, again, the multinational corporation. Its analysts have mined its exceptionally large database and reached various conclusions about its customers that are not readily evident from basic customer information. Suppose the analysts have discovered, based on purchase information, that it is possible to predict the credit score of customers to a high degree of accuracy. Clearly, this predictive ability raises questions about whether it is ethically permissible to make these predictions, use them for marketing purposes, share them with other businesses, and especially sell the predictive information to other businesses. Customers may not even have consented to the collection of the data on which the inference depends, much less to the collection and storage of that credit information. Note also that there isn’t merely the loss of control over sensitive information at stake. The customers also stand to lose autonomy over their decision-making and how they conduct their personal relationships. The potential for others to know their credit score, not only financial institutions but also their friends, will alter the customers’ range of behavioral options and the landscape of those relationships. These changes are of self-evident moral significance. They are the reasons why data governance is not only about rules that affect the economic or structural properties of data gathering and analysis.

The how of data governance is the set of rules and accepted practices that oversee that gathering and analysis. In the context of ethics, these rules and practices are “data privacy.” Extending the example, the corporation will develop policies that balance the concerns of the relevant stakeholders. If there were some way that such credit score prediction could be simultaneously profitable for the corporation and beneficial to their customers—if, for instance, they were better able to offer useful financial services to customers by predicting those scores—then rules would spring up about how to store the credit data safely, whether and how it can be shared with others, how to explain to customers what exact information is recorded, how it is used, and perhaps also whether they would like to opt out of its collection.

The Impact of Data Breaches on Individuals and Organizations

Data breaches were happening long before the advent of computers and data stores. For example, simply reading or copying the carbon copies of credit and debit card slips was common early on. The impact on individuals was originally limited in the United States by law to $50 per misuse of the credit or debit card by someone else. Eventually, for competitive reasons, even the $50 was waived if the individual reported the breach in a timely manner. That, however, does not reduce the impact on the credit or debit card issuer or on the organization from whom goods and/or services were purchased.

The National Association of Attorneys General (National Association of Attorneys General, n.d.) has defined a data breach as

…the unlawful and unauthorized acquisition of personal information that compromises security, confidentiality, or integrity of personal information. What is considered personal information depends on state law but typically includes an individual’s first name (or initial) and last name plus one or more of the following:

  • Social Security Number

  • Driver’s license number or state-issued ID card number

  • Account number, credit, or debit card number, combined with any security code, access code, PIN or password needed to access an account

Additional categories may include:

  • Medical history or health information

  • Biometric information

  • Email address and password

  • Tax ID number.

The “what” is important, and so is understanding the “how.” A recent security foundation report (IFF Lab, n.d.) identified seven major causes of data breaches:

  1. Human Error. This includes sending sensitive information to an incorrect email address, leaving a computer or smart device unattended or unlocked, or leaving paperwork with confidential information open and available to others. It is one of the more common causes of data breaches.

  2. Physical Theft/Loss of Device. These incidents are generally the result of either negligence or a well-planned malicious act by others.

  3. Phishing. Malicious links are placed on a website or in an email, and users who fall prey provide sensitive information to the attacker.

  4. Stolen/Weak Credentials. Too many users have very simple passwords that are either easy to guess or easy to “crack” (see the sketch following this list).

  5. Application/Operating System Vulnerabilities. Many users run pirated software, which oftentimes has vulnerabilities that enable hackers to capture sensitive information. Out-of-date browsers, applications, and operating systems likewise provide opportunities through vulnerabilities fixed in later, updated releases.

  6. Malicious Cyber Attacks. These are among the most damaging to individuals and organizations and include denial-of-service (DoS) attacks and the use of ransomware.

  7. Social Engineering. This is the use of deception, relying on psychology rather than software programming, to convince individuals to provide confidential or personal information. Oftentimes, these schemes are designed to entice someone to provide data in exchange for some exciting reward or other offer.
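As a concrete illustration of item 4 above, the sketch below applies two common heuristics for flagging weak passwords: membership in a list of frequently used passwords and a rough entropy estimate. This is a minimal Python sketch; the thresholds and the tiny blocklist are illustrative assumptions, and production systems instead check candidates against large corpora of breached passwords.

```python
import math
import string

# A tiny, illustrative blocklist; real systems check against large
# breached-password corpora rather than a handful of entries.
COMMON_PASSWORDS = {"password", "123456", "qwerty", "letmein"}

def estimate_entropy_bits(password: str) -> float:
    """Rough entropy estimate: length times log2 of the character pool."""
    pool = 0
    if any(c in string.ascii_lowercase for c in password): pool += 26
    if any(c in string.ascii_uppercase for c in password): pool += 26
    if any(c in string.digits for c in password): pool += 10
    if any(c in string.punctuation for c in password): pool += len(string.punctuation)
    return len(password) * math.log2(pool) if pool else 0.0

def is_weak(password: str) -> bool:
    # Illustrative thresholds: under 12 characters or under ~60 bits.
    return (password.lower() in COMMON_PASSWORDS
            or len(password) < 12
            or estimate_entropy_bits(password) < 60)

for pw in ["letmein", "Tr0ub4dor&3", "correct horse battery staple"]:
    print(pw, "-> weak" if is_weak(pw) else "-> acceptable")
```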

Now that we have looked at the “how,” it is time to look at the effects.

Recently, IBM Security (IBM Security, 2022) studied 550 organizations impacted by data breaches from March 2021 through March 2022, across 17 countries and 17 different industries, and conducted interviews with 3,600 individuals from organizations impacted by data breaches to understand their costs. The impacts are significant:

  • Organizations studied have had more than one data breach: 83%

  • Organizations’ breaches led to increases in prices passed on to customers: 60%

  • Breaches occurred because of a compromise at a business partner: 19%

  • Breaches that were cloud-based: 45%

  • Average cost of a data breach: $4.35M (USD)

  • Average cost of a data breach in the United States: $9.44M (USD)

  • Average cost savings, fully deployed security AI & automation: $3.05M (USD)

  • Average cost of ransomware attack, excluding cost of the ransom: $4.54M (USD)

  • Frequency of breaches caused by stolen or compromised credentials: 19%

  • Average difference in cost with remote vs. local work: $1.00M (USD)

  • Consecutive years in which the healthcare industry had the highest cost of a breach: 12

A recent example is a data breach at Johns Hopkins (Higher Ed Dive, 2023), where it is alleged that “the health system failed to safeguard patients’ health information and provided insufficient details about stolen data….” The breach occurred through a third party in a file transfer by a ransomware group and is believed to have affected tens to hundreds of thousands of individuals. In the same week, HCA Healthcare (Health Care Dive, 2023) reported a data breach involving personal information about approximately 11 million patients (across hospitals and physicians’ offices) in 20 states. The same article referenced federal reports identifying 385 million patient records exposed through data breaches from 2010 through 2022.

This is the “why” for caring about breaches at an organizational level. But why should one care about such breaches from an individual standpoint? To begin with the obvious, data breaches will cause emotional distress in the individuals whose personal information has been illegitimately accessed. The distress itself is intrinsically morally significant, but it also points to other harms that cause those feelings. For instance, stolen information affects one’s dignity. Depending on the type of information, a person’s reputation can be irrevocably tarnished. If the issue is, once again, about credit scores, one’s stature could easily be diminished in the eyes of those who access those numbers. The same would be true of a social media hack: information pried off these sites could easily affect reputational standing. If a user’s direct messages became known, they could contain information that person would never say to others—off-color jokes, vulgar ideas, vulgar language. If the hack was about health information, the consequences for dignity could run even deeper. Some health conditions could lead to stigmatization and subordination. If it were known that some carried sexually transmitted infections, or were diagnosed with a mental health disorder, they could be treated in such a way that they lose access to resources and opportunities because they are viewed as unworthy of them.

Another source of distress is the loss of freedom. Economically, data breaches can have large effects on individual freedom. If malicious actors obtain access to financial accounts, they can steal funds that leave the owners with far fewer economic options. Likewise, if health care information is breached, thieves might purchase prescription drugs in ways that limit the account owner’s ability to care for themselves and others. The impacts on freedom can also extend beyond the economic realm. Suppose an academic institution suffers a data breach involving its records about applicants. Fraudsters can then muddy these records and cause issues about the status of applicants. Now the applicants’ ability to gain admission, which might represent important lifelong goals, is in jeopardy. Suppose a music studio suffers a data breach that makes available hours of work by its artists. If the work is plagiarized, artists’ dreams of living a particular kind of life, in particular kinds of places, and having a large cultural impact may be at an end.

Finally, there is the worry that one cannot live one’s conception of the good life without privacy guarantees. When data breaches occur, they can affect relationships. Friendships, for instance, are built on intimacy, trust, and the willingness to confide information that would never be available to others. Data breaches could shatter the ability to maintain these bonds. In the health care situation, imagine that information about a mental health disorder becomes publicly available. The friend of the person who has the disorder might not want others to know that they support their friend with the disorder. This could cause a distancing in that relationship that undermines it altogether. Since relationships of these types relate to people’s conceptions of what makes lives worth living, the distress of being a victim of a data breach is partly about this harm, too.

Of course, there are also legal reasons why organizations should offer better data governance. Those whose data is stolen can bring civil cases against the organizations responsible and be awarded monetary compensation for the damages they suffer. There are also criminal penalties associated with data breaches that can result in sizeable fines and even imprisonment. In the context of health care information, the Health Insurance Portability and Accountability Act (HIPAA) covers the laws and penalties associated with data breaches involving health information. Note that many organizations deal with such information and are not in the health care industry. IT or HR personnel, with no malevolent intentions, might have access to such information and would benefit from better data governance that automates HIPAA compliance.

The Role of Data Governance in Protecting Privacy and Ensuring Ethical Use of Data

Protecting privacy and ensuring the ethical use of data is a significant role for data governance. Proper design and use of data governance frameworks include an understanding of data origin, how the data has been and/or is being used, and the trustworthiness of the data. Data governance also plays a role in optimizing the value and usefulness of data while at the same time protecting privacy and ensuring ethical use.

Data governance is also important for ensuring that the appropriate privacy laws are understood and that organizations are in compliance. Privacy laws are put into place to establish the expectations organizations must follow and the consequences of non-compliance, willful negligence, and breaches, including financial and other responsibilities. These laws also inform the privacy policies organizations publish in their documentation and, typically, on their websites.

Data governance plays a role in establishing how data can be used ethically. This includes defining transparency in data collection, data storage, and what constitutes ethical use. Data governance frameworks include checks and balances to ensure the guidelines and controls for ethical use of data are followed. “Data ethics is at the top of the CEO agenda, as negligence may result in severe consequences such as reputational loss or business shutdown. To create an effective policy, companies need a formal program to ensure standards are upheld and evaluated regularly” (Janiszewska-Kiewra et al., 2020).

Figure 5.1 illustrates the “what” and “how” of data governance (adapted from Caserta, n.d.).

Fig. 5.1
Operationalizing data governance, a cycle of six processes: (1) establish why, (2) establish initial roles, (3) define data domains, (4) document data flows, (5) establish policies and standards, and (6) establish data controls (Adapted by authors from Caserta, n.d.)

Lack of adherence to a data governance process—or the complete lack of a data governance process—can have disastrous consequences for any organization and the individuals involved regardless of how they are involved.

Three examples of recent breakdowns or lack of adequate governance related to data breaches or exposure are illustrative:

  1. SolarWinds: Third-Party Infiltration

  2. UpGuard: Misconfigured Software

  3. Securitas: Misconfigured Data Access

In the SolarWinds case (Cyolo, 2020), a foreign state-backed hacker group was able to infiltrate the SolarWinds Orion Platform with malware. This is a software platform used by many Fortune 500 companies, the US government, and non-governmental organizations (NGOs) to monitor their IT systems. The proper use of data governance would have included (and subsequently does include) internal and external authentication of devices in any and all situations where they access systems, applications, and key assets. This is referred to as “zero trust”: as the term implies, nothing is trusted by default and constant verification of identity is required. In this specific case, the network structure and assets would not have been visible to the malware. Referring to Fig. 5.1, SolarWinds failed in operationalizing data governance in the areas of establishing initial roles (step 2), documenting data flows (step 4), establishing policies and standards (step 5), and establishing data controls (step 6).

In the UpGuard case (Fung, 2021), the security firm UpGuard discovered that major corporations, federal and state governments, and other organizations (47+) were affected by a misconfigured setting in Microsoft Power Apps, resulting in millions of pieces of personally identifiable data being exposed to the public internet for months. Organizations affected included American Airlines, the Maryland Department of Health, the New York Metropolitan Transportation Authority, J. B. Hunt, the State of Indiana government, Ford Motor Company, and Microsoft itself. The 38+ million records exposed included employee information, COVID-19 vaccination and other related data, Social Security numbers, phone numbers, dates of birth, demographic information, addresses, and various employee events and memberships. This case is an example of where checks and balances in understanding default security settings for software are needed. Referring to Fig. 5.1, the affected organizations failed in operationalizing data governance in the areas of establishing initial roles (step 2), defining data domains (step 3), documenting data flows (step 4), establishing policies and standards (step 5), and establishing data controls (step 6).

In the Securitas case (Henriquez, 2022; Safety Detectives, 2023), 1.5 million files containing information on Securitas employees and airport employees in the Latin American aviation industry were accessed. The information breached included photos of ID cards, full names and pictures of employees, occupations and national ID numbers, the cameras used, the GPS locations of the photos, and the times and dates of the photos. Photos also included data on Securitas clients, airport employees, and other businesses. Misconfigured cloud data storage access allowed a breach of more than 3 TB (terabytes) of data in more than 1 million files. This could result in serious threats to airports, passengers, airlines, and airport personnel. As in the previous example, checks and balances in understanding default security settings for software are needed; higher security and little or no access by default should be the norm. Referring to Fig. 5.1, Securitas failed in operationalizing data governance in the areas of establishing initial roles (step 2), defining data domains (step 3), documenting data flows (step 4), establishing policies and standards (step 5), and establishing data controls (step 6).
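Both misconfiguration cases above come down to overly permissive default access settings that no one audited. As a minimal sketch of what such an audit control could look like, the Python example below scans cloud storage for publicly readable buckets; it assumes an AWS S3 environment and the boto3 library purely for illustration (the actual incidents involved Microsoft Power Apps and a different cloud store).

```python
import boto3  # AWS SDK for Python; illustrative choice of platform

# Grantee URIs that indicate public access in S3 bucket ACLs.
PUBLIC_GRANTEES = {
    "http://acs.amazonaws.com/groups/global/AllUsers",
    "http://acs.amazonaws.com/groups/global/AuthenticatedUsers",
}

def find_public_buckets() -> list[str]:
    """Flag buckets whose ACLs grant access to everyone. A governance
    control (Fig. 5.1, step 6) would run checks like this on a schedule
    rather than trusting vendor defaults."""
    s3 = boto3.client("s3")
    public = []
    for bucket in s3.list_buckets()["Buckets"]:
        acl = s3.get_bucket_acl(Bucket=bucket["Name"])
        for grant in acl["Grants"]:
            if grant["Grantee"].get("URI") in PUBLIC_GRANTEES:
                public.append(bucket["Name"])
                break
    return public

if __name__ == "__main__":
    for name in find_public_buckets():
        print(f"ALERT: bucket '{name}' is publicly accessible")
```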

An example of data ethics violations is Cambridge Analytica’s access to and mining of Facebook data (Criddle, 2020; Federal Trade Commission, 2019a). Facebook (or, as it is now known, “Meta”) was sued by the Federal Trade Commission (FTC) for not protecting users’ personal data after the records of 87 million Facebook users were used for advertising during the US Presidential elections. “Facebook, Inc. [was ordered to] pay a record-breaking $5 billion penalty, and submit to new restrictions and a modified corporate structure that will hold the company accountable for the decisions it makes about its users’ privacy, to settle Federal Trade Commission charges that the company violated a 2012 FTC order by deceiving users about their ability to control the privacy of their personal information” (FTC, 2019a). The FTC went on to say that Facebook had a sustained history of using deceptive disclosures and settings to cause users to be lax in their privacy settings, thus making information available to Facebook and third-party applications, and that Facebook knew these data were being used inappropriately. The FTC also, separately, acted against Cambridge Analytica for its harvesting of data (FTC, 2019b).

In the case of Facebook, the FTC imposed new, corporate-level mechanisms to ensure privacy protections. It established an independent privacy committee, composed of members of Facebook’s board of directors, who could only be removed from the board by a supermajority vote. The purpose was to strip CEO Mark Zuckerberg of total control over decisions that affect the privacy of the users of Facebook’s various subcompanies (e.g., Instagram, WhatsApp, Oculus VR). The orders also require the appointment of compliance officers who are answerable only to that privacy committee and who must submit quarterly certifications demonstrating compliance with FTC privacy rules. Finally, the FTC also enhanced the powers of third-party assessors who, independently of the foregoing measures, test and verify Facebook’s privacy policies and who serve only at the direction of the FTC. So, while the previous examples involve systems-level, or software-level, governance structures, there are corporation-level data governance policies that can also provide further privacy safeguards. The Facebook example illustrates why it is important to establish initial roles (who heads what committees and who answers to whom) and how this establishment partly constitutes what it means for “data governance” to ensure privacy controls. Referring to Fig. 5.1, Facebook failed in all areas of operationalizing data governance: establishing why (step 1), establishing initial roles (step 2), defining data domains (step 3), documenting data flows (step 4), establishing policies and standards (step 5), and establishing data controls (step 6).

The Challenges of Implementing Effective Data Governance Policies

Even when there is consensus on the need for data governance, privacy, and ethical use of data, there are still many challenges ahead. There are challenges in identifying the data, in dealing with the people who see themselves as owning the data, in the lack of agreement on who should lead data governance, and in understanding the difference between managing and controlling the data. The greatest challenges are the lack of commitment by those who believe they own the data and the lack of executive sponsorship to ensure governance becomes a reality.

In most companies, data have been created over time by many people, many departments, many divisions, and so on, and the resulting proliferation of data has produced duplication, inconsistencies, uneven quality, and many “roll-your-own” (RYO) “applications” with untold numbers of interdependencies. Alongside this, there is considerable selective knowledge of the data, the processes and transformations, and the meaning of results. Arriving at a collaborative agreement to implement effective data governance policies may well be seen by many as losing control of their data and applications—forgetting, of course, that the organization owns the data and the applications, not the individual. Acceptance of the need, and even the requirement, to create effective data governance can come from agreement that “(t)he primary goal of any data governance program is to deliver against the prioritized business objectives and unlock the value of your data across your organization” (IBM, 2022). From the start, data governance must keep business objectives in mind while creating realistic plans and defining measurable outcomes. These business objectives are far more than solely profit-oriented; they also recognize the custodianship responsibilities that come with data. Once started, data governance is a journey, not a destination. To be successful, it must be implemented incrementally and iteratively, with short-term successes in the direction of the long-term goals. Success requires strong executive-level support, cross-functional collaboration, and visible, demonstrable results that positively affect the company, employees, customers, and more.

Data governance must include ethical considerations, which we will delve into next.

The Ethical Considerations Surrounding the Collection, Storage, and Use of Personal Data

Many writers agree about specific harms having to do with privacy—like dignity and freedom, mentioned previously. But the more general work on the philosophy of privacy is much like any other area of philosophy: there is little agreement on what privacy is or even what should fall under the scope of privacy (Auxier et al., 2019; DeCew, 2018). A handy, if controversial, way of thinking about privacy is to divide it into the following four fields. First, privacy seems to be violated whenever your physical security is involuntarily threatened. This is why assault is harmful beyond the physical injuries involved. It is also why being the recipient of unwanted medical procedures would intuitively violate privacy. Second, privacy seems to be violated whenever some intimate, or personal, location is invaded in an unauthorized fashion. This is the sense in which a burglar who enters a house violates privacy, quite apart from the harms of any damage or theft of property. Third, privacy seems to be violated when the autonomy to make intimate, or personal, decisions is interfered with. This sense of privacy is closely aligned with laws about privacy. Many understand abortion laws, for instance, to be a matter of privacy. This is also the reason the recent Dobbs decision was immediately related to laws about same-sex marriage, access to contraception, and interracial marriage: in all cases there is the threat of interference with the freedom to make intimate decisions about how one’s life goes (Goldhill, 2022). Fourth and finally, privacy seems to be violated when control over access to intimate, or personal, information is lost. This is, of course, the sense in which hacks of online databases raise privacy violation concerns, or the sense in which HIPAA seems like sensible privacy protection for health information.

It is tempting to think that data privacy would concern only the last, informational sense of privacy. But it is important to note that data privacy is at least tangentially connected to each of the other senses of privacy, too. Leaks of personal data from some data holder could include home addresses, making it possible for those who illicitly access that information to invade actual locations or, since it would be possible to pinpoint an individual, to threaten their physical security. Data privacy is much more closely related to the third sense of privacy. Much of the legislation in the world (see the next section) is about automated information processing that could treat data subjects unfairly or discriminatorily. Unfairness and discrimination are important because they could limit people’s freedom to make significant personal decisions. Financial institutions, for instance, could use automated means for deciding on loan offers. If the algorithm that makes the decision is trained on biased data, its output could reproduce that bias and have an evident impact on people’s lives (Heaven, 2021; Klein, 2020).

Specifically owing to technological innovations involving data collection and processing, the ethical literature on privacy has changed significantly. In the past, privacy was seen as a matter of individuals being protected from the intrusion of society; it was seen as an individualistic good. This is related to the sense of privacy as resistance to interference with autonomy. Whether or not one wants to have an abortion, for instance, is seen as a personal, individual decision which ought not be subject to societal or governmental pressures. But, in the current information age, many theorists have emphasized the societal goods that are fostered by privacy protections (Roessler & Mokrosinska, 2015). Even further, some theorists point out that today’s technological advances render obsolete the old, individual-against-society understanding of privacy (Nissenbaum, 2010). That is, not only are there social goods promoted by privacy, but some privacy harms are also collective harms.

To explain the former point, consider the simple fact that democracies protect privacy in voting. Without one’s vote being expressed in privacy, it is not clear that a democracy could exist. Applying pressure to one’s vote effectively abandons the principle that political power rests with the conscience of the people. People are subject to many social pressures, from family, friends, business associates, and others. Any of these pressure points could sway voting decisions and effectively undermine democratic political participation. To the extent that democratic forms of government are socially worthwhile, privacy protections in the voting booth function as an extremely important social good. Privacy is not merely a trump card to play against society’s wellbeing; it is a shield for that very wellbeing, too. In big data contexts, there are similar worries. The collection and processing of information about the voting public allows microtargeted political persuasion (Dizikes, 2023). This is different from being in the voting booth watching someone vote, but it is a way of snooping on voters’ behaviors and using the information gained to pressure their political decision-making. In the same way that privacy in the booth is an important social good, so, too, is data privacy.

To explain the latter point, consider that today’s technology for data collection and analysis, particularly given how integrated so much of it is across public and private entities (think of consumer data, for instance), has created an environment where privacy protections exist only at the collective level. Where analysis of a small minority of users’ personal data makes it possible to draw reliable inferences about data values of the majority of users, the lack of privacy concerns for the minority automatically destabilizes the privacy concerns of the majority. Social media users provide an obvious example of the community-based nature of privacy’s value. If some are willing to share enough information, much can be learned about others who are not willing.

From a data governance perspective, there is much to consider. Though the mere collection of data may seem harmless enough, there are ethical worries that arise. The loss of control over sensitive information is unsettling for people in its own right. Think of losing a diary, even if it has thorough security mechanisms protecting it. Or think of how the knowledge that some others are tracking one’s usage of the internet would be repressive, even if that information were never used for other objectives. It is, thus, quite common for data brokers to require the consent of persons before their data is gathered. There is further discussion in the section below, but many data companies also release details about exactly how personal data will be used and for what purposes. Consent and transparency are usually understood to be important barriers for data collection, and proper data governance would, where other business and societal needs do not clash, look to obtain that consent and be transparent to data subjects about what happens with their information.

More obviously, the storage of data is an important feature of data governance. Exacting security measures must be in place, and be routinely tested, to ensure a much greater range of harms does not occur. There are the same informational considerations as with collection; simply knowing that one’s information is stored in a location one cannot control is worrying. But the disclosure of that information to the wrong parties, or its being illicitly accessed, raises many other ethical concerns. There is, of course, the repressive aspect of others accessing personal data. But leaked location data could lead to trespassing harms and, as noted earlier, even to physical harms. Think of a hack of an online dating website: users’ information being stolen could easily lead to stalking and physical confrontations.

Finally, the use of data is another key area of data governance. For still other ethical reasons, there must be guardrails in place to guarantee that data is used in appropriate ways. Consider, again, the possibility of automated decision-making. If the data processor does not fully understand the nature of the automated decision, they will not be able to assess the quality of the output. For financial information, or crime/security matters, such an inability could have severe consequences for data subjects. Requiring transparency in the use of data is thus also ethically important. Many data holders also consider selling (parts of) their databases to other entities. This creates many of the same ethical concerns that have been raised about informational, decisional, and physical forms of privacy, to say nothing of the physical harms that could come from personal information coming to be owned by the wrong party. Good data governance can avoid these problems by, for instance, informing data subjects about automation or third-party sharing of their information, or by allowing those subjects to opt out of various uses of their personal data.
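One way to operationalize the consent, transparency, and opt-out safeguards just described is to gate every downstream use of personal data on recorded, purpose-specific consent. The Python sketch below is a minimal illustration; the purpose labels, record structure, and field names are hypothetical.

```python
from dataclasses import dataclass, field

# Purposes a data subject can consent to; hypothetical labels.
PURPOSES = {"analytics", "marketing", "third_party_sharing", "automated_decisions"}

@dataclass
class SubjectRecord:
    subject_id: str
    data: dict
    consented_purposes: set = field(default_factory=set)

def usable_for(records: list[SubjectRecord], purpose: str) -> list[SubjectRecord]:
    """Return only records whose subjects consented to this purpose.
    Records without consent never reach the downstream process."""
    if purpose not in PURPOSES:
        raise ValueError(f"Unrecognized purpose: {purpose}")
    return [r for r in records if purpose in r.consented_purposes]

records = [
    SubjectRecord("alice", {"zip": "21218"}, {"analytics"}),
    SubjectRecord("bob", {"zip": "90210"}, {"analytics", "third_party_sharing"}),
]

# Only Bob's record may be shared with a third party; Alice opted out.
print([r.subject_id for r in usable_for(records, "third_party_sharing")])
```

The design point is that opting out is enforced structurally, in the data flow itself, rather than by policy documents alone.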

The Legal and Regulatory Frameworks Governing Data Privacy and Ethics

There are, of course, no universal regulatory frameworks that cover data privacy or mandate data governance. But there are nation-to-nation regulations and, within the United States (US), state-to-state regulations.

The most famous piece of law, mentioned above, is the European Union’s General Data Protection Regulation (GDPR). It offers significant protections for data subjects, including rights to transparency, access, rectification, erasure, and even the right to object to certain forms of processing. Outside of compliance with the rights of data subjects, the GDPR also requires data “controllers” and “processors” to protect their data, pseudonymize it, keep records of processing, and even to appoint Data Protection Officers whose job is to ensure compliance with GDPR instructions.

From the perspective of privacy, the rights of data subjects in the GDPR cover plenty of ground. Many understand privacy to concern a “right to be forgotten.” That is, they maintain that an important part of privacy is the ability to have one’s past (decisions, actions, events) remain overlooked. The GDPR specifically includes Article 17 (Art. 17 GDPR, n.d.), which states the right of data subjects to demand that controllers erase their personal data under certain conditions. The GDPR requires a high level of openness about the data being collected and how it is processed. Article 15 (Art. 15 GDPR, n.d.) states that data subjects have a right of access: to know the content of the data, the purposes of its processing, who it has been or will be shared with, how long it will be stored, and whether any automated processing of that data will occur, with access even to meaningful information about how the automation works and what conclusions are hoped to be drawn from it. As discussed, many see a dimension of privacy as covering personal autonomy—whether one has the freedom to make personal decisions affecting their pursuit of the good life. This right of access supports that dimension of privacy. European data subjects can ensure, for instance, that no automated processing can affect their ability to obtain credit, which could otherwise negatively impact personal decision-making about how those subjects want their lives to go.

From the perspective of data governance, the GDPR also guarantees that some measures will be taken. Article 25 (Art. 25 GDPR, n.d.) mentions pseudonymization: controllers must have the technological capacity to transform personal data into a form that cannot be attributed to a specific person without additional, separately kept information. This rule, then, builds in a decision about how data can be processed; it automatically requires a form of data governance. Similarly, Article 37 (Art. 37 GDPR, n.d.) requires the appointment of a Data Protection Officer (DPO). This officer can be drawn from the relevant company’s staff or be contracted from an external firm, but they are effectively a compliance officer who must have both IT and legal competences. This person advises the data processors on the obligations the GDPR imposes and monitors the processing operations for compliance. The existence of this position, then, is a manifestation of data governance: the DPO enforces structures that regulate the flow and use of data within the company to preserve the privacy of data subjects.
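As a rough illustration of what the pseudonymization requirement can amount to in practice, the sketch below replaces a direct identifier with a keyed hash. This is a minimal sketch under stated assumptions, not the GDPR’s prescribed method: the secret key stands in for the “additional information” that must be kept separately, and a real deployment would add key management and rotation.

```python
import hmac
import hashlib

# The secret key plays the role of the separately kept "additional
# information": without it, pseudonyms cannot be linked back to a person;
# with it, the controller can recompute the mapping when lawfully needed.
SECRET_KEY = b"load-from-a-secrets-manager-not-source-code"

def pseudonymize(identifier: str) -> str:
    """Replace a direct identifier (name, email, national ID) with a
    stable pseudonym. The same input always yields the same output,
    so records can still be joined and analyzed."""
    return hmac.new(SECRET_KEY, identifier.encode(), hashlib.sha256).hexdigest()

record = {"email": "jane.doe@example.com", "purchase": "laptop", "amount": 999.00}
safe_record = {**record, "email": pseudonymize(record["email"])}
print(safe_record)  # analysis-ready, but not attributable without the key
```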

Many other nations have laws modeled after the GDPR: Japan, South Korea, and Brazil, for instance. The United Kingdom, in the aftermath of Brexit, also adopted an essentially identical piece of legislation. The United States, on the other hand, has no such national legal code.

There are specific US federal-level codes, however. One is the Privacy Act of 1974 (Office of Privacy & Civil Liberties, 2020), which covers disclosure of personal data held by federal agencies. It has exceptions similar to the GDPR’s, and US citizens are even entitled to access that information. But this Act concerns only information held by federal entities, not by any (private) data holder or processor in the United States.

Also in the United States, there is the Gramm-Leach-Bliley Act (GLBA) (Federal Trade Commission, n.d.), which, while having a broader regulatory scope than privacy preservation, requires financial institutions that hold personal data to explain to their customers how that data is being used and to provide opt-out information. It also includes a Safeguards Rule that requires those institutions to have (at least) written plans for protecting nonpublic, personal data about their customers. These plans, then, function as a kind of mechanism for installing data governance within the companies covered by the Act. Again, the focus here is narrowly on financial information.

There is also HIPAA, which, among other things, imposes rules on the disclosure of personal health data gathered by healthcare providers and businesses. As with the other Acts, HIPAA concerns only this specific realm of data.

Finally, the most general federal-level law is the Children’s Online Privacy Protection Act (COPPA) (Federal Trade Commission, n.d.), which, as the name suggests, covers the collection of personal data of those under 13 years of age. The existence of COPPA is largely the reason many social media companies will not allow people under 13 to have accounts. Instagram, for example, had toyed with the notion of allowing children to have them but, as of this writing, still requires users to be at least 13.

There are individual US states that have adopted data privacy legislation. The California Privacy Rights Act (State of California, 2023) functions much like the GDPR, with the exception of health and financial information, which is already federally covered by HIPAA and the GLBA. It is considered the “strongest” piece of data privacy law in the United States, partially because it is the only one that allows citizens to sue companies for privacy violations. The Act applies, unsurprisingly, only to residents of the state. Virginia instituted its Consumer Data Protection Act (Office of the Attorney General, 2023) at the beginning of 2023. The Colorado Privacy Act (Colorado Attorney General, n.d.) and the Connecticut Data Privacy Act (The Connecticut Privacy Act, n.d.) followed in July of 2023. Utah’s Consumer Privacy Act (DataGuidance, n.d.) will take effect at the end of 2023. More states are sure to follow until there is US federal legislation.

Looking to the Future

The trifecta of data governance, data privacy, and data ethics has finally reached the consciousness of nearly everyone. Erroneous reports sent to consumers based on bad data, data breaches, and horrendous misuses of data have been in the news weekly, if not daily. One of the few things we know we can count on (besides death and taxes) is the continual growth in the amount of data collected and used (though much more is collected than used). And with the abundance of data, and the fact that implementations of the trifecta are at a very early stage, there is much more to come, of both the good and the bad, before we can or should feel comfortable.

Europe’s actions with the GDPR are far ahead of the rest of the world. In the United States, individual states are developing their own regulations, which means it will not be long before the Federal Government steps in to provide the necessary consistency. Even with all the safeguards and best intentions, there is no practical means by which guarantees can be enforced because, at heart, proper governance, respect for privacy, and ethical use are in the hands of human beings. Mistakes, malfeasance, maliciousness, and ignorance are our worst enemies. The best we can hope for is to have the mechanisms, training, and oversight to optimize the business value of data while minimizing the things that can go wrong. The alternative, no data, is not realistic. Therefore, we need to focus on ensuring that the best possible data governance is put into place, enforced, audited, managed, and evolved to ensure the data privacy and ethical use of data we should all expect.