Key Take-Aways

  • Mixed methods research. This study employs a mixed-methods research design that includes interviews with, and a survey of, business data ethics managers and the think tanks, attorneys, and consultants who advise them.

  • Five main areas of inquiry. The interview protocol and survey instrument focused on five main areas of inquiry: (1) the risks that business use of big data can create; (2) why businesses seek to mitigate the risks of big data when the law does not yet require them to do so; (3) how businesses go about managing these risks, including their use of substantive frameworks, management structures and processes, and technological solutions; (4) business attempts to use advanced analytics and AI for the social good; and (5) the broader regulatory and legal environment within which data ethics management operates.

  • Snowball sampling. The researchers used a snowball sampling method to identify and interview 23 subjects. They distributed the survey through five industry-oriented trade associations and think tanks and received 51 responses, 24 of which were fully completed.

  • Skewed sample. The survey sample skewed towards larger organizations, and focused on companies in the information technology, financial services, communications, industrial, and healthcare sectors.

  • State of the art, not best practice. The research focused on why businesses were pursuing data ethics management, and what they were doing in this regard. It did not evaluate these activities and so does not identify best practices.

We employ a mixed-methods research design, including in-depth interviews with, and a survey of, business data ethics managers and the think tanks, attorneys, and consultants who advise them. The interview component served as an open-ended way to map the terrain of the contestation around big data ethics and to inform the construction of a meaningful survey instrument. The survey component sought to synthesize insights from the interviews and so to assess more systematically business uses of big data, the risks they create, and the specific policies and processes intended to address those risks. We treat the research components as complementary, collectively contributing unique dimensions to our empirical investigation of business practices for addressing the risks of advanced analytics and AI. Targeting higher-level executives for our interviews and survey gives us the view of business practices from the top, but precludes us from assessing the coupling between high-level policies and the actual daily work of engineers and employees on the ground (Waldman 2018). Likewise, our survey sampling methodology, discussed further below, lends itself to selection bias. The Ohio State University Institutional Review Board deemed the research exempt from further review.

2.1 Interviews

For the interviews, we used a purposive sampling method in which we leveraged the research team’s social networks to identify individuals prominently engaged in managing the risks that business use of advanced analytics and AI can create, and the professionals (lawyers, consultants, think tanks, thought leaders) who advise them (Singleton and Straits 2010). We then snowball sampled by asking interviewees to identify additional individuals who were highly knowledgeable about the business practice of big data ethics management or were actively grappling with big data ethics in their positions. The snowball method facilitated access to new interviewees. This proved to be important as reaching high-level managers is particularly challenging (Biernacki and Waldorf 1981; Cycyota and Harrison 2002).

The interview protocol was developed to probe respondents broadly about big data ethics in the business context. The protocol had five major sections: (1) risks of big data; (2) motivations and goals of mitigating the risks of big data; (3) management processes, substantive frameworks, and technological solutions for mitigating these risks; (4) whether businesses seek to use advanced analytics and AI for the social good in ways that do not directly impact the bottom line; and (5) the broader regulatory and legal environment within which data ethics management takes place. The protocol was adjusted to account for differences between the types of organizations for which the interviewees worked: businesses that use advanced analytics and AI, law firms, think tanks, and consulting firms. When interviewing representatives of business organizations that used advanced analytics and AI, we probed respondents on the structure, design, and perceived efficacy of internal processes. Interviews were conducted primarily over the phone (with one interview conducted in person) from September 2017 to March 2019. Interviews ranged from 60 to 160 min with an average of 75 min. We transcribed the interviews to facilitate coding and analysis, which involved descriptive coding according to the sections of the interview protocol, followed by close readings to identify prevalent themes across interviews.

Overall, we interviewed 23 respondents. The industries represented in the interview sample include telecommunications, information technology, social media, pharmaceuticals, and insurance. Both publicly traded and private companies are represented in the sample. The interviewees' titles included, at various levels of seniority: Privacy Officer, Data Ethics Officer, Counsel, Public Policy Executive, Compliance Executive, and Partner.

2.2 Survey

We paired the interview study with an online survey that we designed and conducted using the Qualtrics platform. As with the interview component, we targeted higher-level management. Accessing this type of population with large-scale probability sampling methods is notoriously difficult (Cycyota and Harrison 2002). Survey research of corporate management has indicated that an important way to increase response rates is to have the survey delivered through legitimated or trusted organizations. As a result, we opted for a convenience sampling approach that leveraged the social networks of corporations through membership in industry trade associations and industry-funded think tanks.

Specifically, five industry-oriented trade organizations and think tanks engaged with issues of data and privacy sent our survey to their member companies. Because this targeted sample selected into membership in organizations engaged with privacy and data accountability, our survey results likely provide a more optimistic view of current corporate practices than a larger or more random sample would have offered. We provided the think tanks and trade organizations with an email script and survey links for their members, and made sure that companies that belonged to more than one of these organizations received only one survey link. The think tanks and trade organizations agreed to send reminder emails one week after the initial survey was sent. Data were collected from November 2019 to January 2020.

In total, our survey was sent to 246 companies. We received a total of 51 responses, 24 of them fully completed, yielding a response rate of approximately 20% for all surveys and approximately 10% for fully completed surveys. This response rate is fairly consistent with other surveys of corporate managers (Cycyota and Harrison 2006). Given our targeted sampling strategy and the exploratory nature of the study, we are unable to make strong claims that generalize beyond our sample. However, we can identify cleavages of variation and associations that will serve as an important entry point for future research. In particular, our findings from this targeted survey provide evidence that a much larger sampling of corporate big data ethics is necessary and would likely yield valuable insights for scholars, policymakers, and business organizations.
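The response rates above follow directly from the reported counts. As a minimal sketch of that arithmetic (the function and variable names are illustrative and not part of the study's materials):

```python
# Counts as reported in the text; names are illustrative only.
SURVEYS_SENT = 246
RESPONSES_RECEIVED = 51
RESPONSES_COMPLETE = 24


def response_rate(responses: int, sent: int) -> float:
    """Return responses as a fraction of the surveys sent."""
    if sent <= 0:
        raise ValueError("sent must be positive")
    return responses / sent


overall_rate = response_rate(RESPONSES_RECEIVED, SURVEYS_SENT)
complete_rate = response_rate(RESPONSES_COMPLETE, SURVEYS_SENT)

print(f"All responses:   {overall_rate:.1%}")
print(f"Fully completed: {complete_rate:.1%}")
```

The two rates come out to roughly 20.7% and 9.8%, which the text rounds to 20% and 10%.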

We derive our survey results, presented below, from our “core sample” of 31 respondents who answered our survey question about the policies their company has in place to address the risks of advanced analytics and AI (22 of these respondents fully completed the survey). Figures 2.1 and 2.2 display the variation in company size in our sample by number of employees and revenue, respectively. We expected that our sample would be composed of larger companies on average, given the membership of the organizations through which we developed our sample, and this indeed turned out to be the case. The largest proportion of the sample, approximately 30%, has more than 40,000 employees or more than $15 billion in revenue. Although the sample skews towards large companies, almost 50% of it has fewer than 10,000 employees. With respect to industry, Table 2.1 shows that most of our corporate respondents worked for information technology companies, with the remainder working for financial services, communications, industrial, and healthcare companies; healthcare makes up the smallest proportion of the core sample.

Fig. 2.1
A bar graph plots the core sample percentage versus the number of employees. All values are estimated. Fewer than 250, 3%. 250 to 499, 9.9%. 500 to 999, 7%. 1000 to 9999, 29%. 10000 to 19999, 7%. 20000 to 29999, 3%. 30000 to 39999, 9.9%. More than 40000, 33%.

Distribution of sample company size according to total number of employees
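The size claims in the text can be checked against the bar heights in Fig. 2.1. The sketch below assumes the estimated percentages given in the figure description; these are approximate readings from the chart, not the underlying survey data, and they need not sum to exactly 100. The names are illustrative.

```python
# Estimated shares (%) of the core sample by number of employees,
# as read from Fig. 2.1. Approximate values, not raw survey data.
EMPLOYEE_BINS = [
    ("Fewer than 250", 3.0),
    ("250 to 499", 9.9),
    ("500 to 999", 7.0),
    ("1,000 to 9,999", 29.0),
    ("10,000 to 19,999", 7.0),
    ("20,000 to 29,999", 3.0),
    ("30,000 to 39,999", 9.9),
    ("More than 40,000", 33.0),
]

# Bins that together cover companies with fewer than 10,000 employees.
UNDER_10K = ["Fewer than 250", "250 to 499", "500 to 999", "1,000 to 9,999"]


def cumulative_share(bins, labels):
    """Sum the estimated shares of the bins whose labels are listed."""
    wanted = set(labels)
    return sum(share for label, share in bins if label in wanted)


share_under_10k = cumulative_share(EMPLOYEE_BINS, UNDER_10K)
largest_label, largest_share = max(EMPLOYEE_BINS, key=lambda b: b[1])
total = sum(share for _, share in EMPLOYEE_BINS)

print(f"Under 10,000 employees: {share_under_10k:.1f}%")
print(f"Largest bin: {largest_label} ({largest_share:.1f}%)")
print(f"Sum of estimated shares: {total:.1f}%")
```

On these estimates, the bins below 10,000 employees add up to just under half of the core sample, and the largest single bin is the one above 40,000 employees.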

Fig. 2.2
A bar graph plots the core sample percentage versus revenue. All values are estimated. Less than 10 million, 7%. 10 to 49 million, 3.5%. 50 to 99 million, 7%. 100 to 499 million, 14%. 1 to 4.99 billion, 20.5%. 5 to 14.99 billion, 14%. 15 billion or more, 34.5%.

Distribution of sample company size according to 2018 revenue

Table 2.1 Survey respondent industry

In subsequent chapters, we will draw from both our survey and interviews to illustrate the array of different concerns, rationales, policies, and systems that are central to data ethics management in these types of corporations.