Introduction

More and more parts of our lives take place online and are digitally stored, as the use of online services has become an integral part of everyday life – be it to search for information, keep up to date, stay in touch with friends and family, work, monitor our health, pass the time, and much more. In the process, every user creates a large amount of data. The advent of the Internet of Things and Big Data offers manifold benefits for consumers, businesses, and society. Health care, mobility, production, and education, to name but a few, can profit considerably from the significant data-driven gain in knowledge. Personal data have become the “new oil” of the digital economy, as they drive innovation, create knowledge, and enable effectiveness and efficiency in many societally relevant fields (Spiekermann et al. 2015).

At the same time, according to some authors, we have reached the end of privacy (Enserink and Chin 2018). The sheer availability of data invites abusive use and malpractice. Consumers worry that they have lost control over their information and report being highly concerned about their informational privacy (European Commission 2015). In addition, the existing data markets are not transparent: users are not fully aware of what happens to their data and cannot control data use (Spiekermann and Novotny 2015). A more or less legitimate trade in data has evolved in a “shadow market” (Conger et al. 2013).

This conflict between users’ rights to and desires for privacy, on the one hand, and market and business demands for data, on the other hand, is still unresolved. One vision that promises a win-win situation for both consumers and businesses is the privacy-preserving data market (Gkatzelis et al. 2015; Matzutt et al. 2017; Spiekermann et al. 2015): a market that provides privacy protection and informational self-determination for consumers and data for business and research, in which data are handled transparently and in accordance with customers’ consent. Such a market is open to all of business and research and viable enough to break up existing monopolies. When customers share their information willingly and it becomes accessible with low market entry barriers, higher legal certainty, high transparency for all parties involved, and good data quality, the foundation for societally relevant innovations is laid.

In privacy-preserving data markets, users shall actively and voluntarily share their data. The willingness to share data depends on many factors, e.g., privacy concerns, benefits, type of information, and culture (Hallam and Zanella 2017; Markos et al. 2017; Trepte et al. 2017). This willingness to share has been studied extensively (Smith et al. 2011), and some studies were conducted within the context of data markets and included privacy protection in data sharing (Roeber et al. 2015; Ziefle et al. 2016). But we do not yet have the necessary insights into what informational self-determination and privacy mean to users in such data markets. How can users’ desires for privacy be satisfied while they share data in the decontextualized online world? And how do users form decisions regarding data sharing in such a new environment? Because of this knowledge gap, we empirically analyze users’ notions of and desires for privacy protection in a privacy-preserving data market.

We use a threefold mixed-method approach to identify and understand users’ notions of privacy in data sharing in detail, as well as to quantify users’ wishes and observe trade-offs in data sharing decisions. We start with a qualitative assessment of users’ understanding of and wishes for privacy, in which underlying mental models, attitudes, and visions about privacy, data sharing, and the conditions for it are identified. Based on the findings of this first study, users’ motives, barriers, and conditions for privacy in data sharing are then measured and quantified in a second study (survey). The third step is a choice-based conjoint study, a method well suited to determining the importance of, and trade-offs between, different attributes in realistic decision scenarios that combine conflicting factors. The participants are asked to decide in different simulated scenarios whether they would be willing to share data, in order to understand which factors are decisive in users’ data sharing decisions. This triangulation of empirical methods enables us to combine the benefits of each method and thus to answer our research question holistically.

The following section gives an overview of the theoretical approach and of related work on users’ online privacy behaviors and attitudes before each study’s methodology and results are presented.

Related work: Online privacy and data sharing from the user perspective

As Nissenbaum describes, researchers can agree on one fact: privacy is messy, complex, and hard to define (Nissenbaum 2010). Nonetheless, the next section approximates a description of online privacy from a user perspective and briefly outlines related work regarding users’ attitudes and public perceptions of online privacy and data sharing.

Online privacy and data sharing

In one seminal work, Altman defines privacy as “the selective control of access to the self” (Altman 1976, p. 8), which requires an active and dynamic regulatory process. As such, the desired state is not to keep everything secret and withdraw completely from people but to find the right balance between withdrawal and disclosure in the given context. For that, the control over “when, how, and to what extent information about [oneself] is communicated to others” is decisive (Westin 1967, p. 7). The concept of informational self-determination, first established by the German Bundesverfassungsgericht (Federal Constitutional Court) as a basic constitutional right in 1983, describes exactly this active and cognizant empowerment of users (Hornung and Schnabel 2009).

But in the online world, users feel they have lost control (European Commission 2015). Disclosed information is persistently available over space and time and is searchable, shareable, and replicable (Palen and Dourish 2003; Taddicken 2014). From early childhood on, we learn that we can manage our privacy by means of physical space and visible environments: we close doors and curtains, lock doors, and hold conversations away from unwanted listeners. In the digital world, we can no longer rely on these natural and intuitive mechanisms. The desired state of privacy, and with it the desired sharing of information, is defined by context and audience (Nissenbaum 2010). Online, contexts collapse and audiences become heterogeneous (Marwick and Boyd 2011; Taddicken 2014). Even malicious acts targeting our privacy become more harmful and far-reaching. To make matters worse, users might not know about and might not be able to realistically assess harmful situations, as the danger of malpractice is not necessarily visible. Many internet users perceive that they have lost control over their information and are highly concerned about their online privacy (European Commission 2015; Li 2011). 58% of Europeans see no alternative to providing personal information to obtain products or services (ibid.). Active privacy protection, which could alleviate these concerns, is perceived as too complex or not feasible, and users become resigned (Hoffmann et al. 2016). On the other hand, protection is neglected because of a feeling of having “nothing to hide” (Prettyman et al. 2015) or that “no-one is interested in my data” (Schomakers et al. 2018).

This feeling of being overpowered and the loss of control experienced by users could be one explanation for the so-called privacy paradox. A number of studies have found a mismatch between users’ privacy attitudes and their online behavior: users are concerned about their online privacy, but they still share private information without hesitation and do not protect their privacy accordingly (Gerber et al. 2018; Kokolakis 2015). But research has also yielded opposing results: users are willing to pay extra for privacy, and concerns do trigger protective responses (Baruh et al. 2017; Blank et al. 2014; Egelman and Felt 2012; Lutz and Strathoff 2013). The privacy calculus theory postulates that users weigh the anticipated risks to their privacy against the perceived benefits of data disclosure (Dinev and Hart 2006). But users cannot foresee all of the consequences and risks of disclosure. On the one hand, humans’ cognitive capacity for processing complexity is limited, and decisions are therefore based on heuristics and biases formed by experience (Acquisti et al. 2015). On the other hand, even if we could process the complexity of the information, we do not have full knowledge of what may happen to our data now and in the future (Baruh and Popescu 2017; Spiekermann and Novotny 2015).

In the vision of a privacy-preserving data market, users are provided with full informational self-determination. In contrast to data repositories maintained by digital companies that collect information about their customers, and to existing third-party data markets, privacy-preserving data markets not only inform users about proper data handling; the users themselves actively distribute their data to the market, which gives them full control over the information flow and their privacy (Matzutt et al. 2017). This also adheres to the demands of the new European General Data Protection Regulation (GDPR), which came into force in 2018. This paper aims to understand which privacy protective measures and which control possibilities users desire in order to maintain their privacy when sharing data.

Privacy protection in data sharing

To interact with complex technical systems, people employ mental models. Mental models are cognitive representations of how technical systems or interfaces might work, including a person’s beliefs as well as cognitive and affective expectations about their functions (Gentner and Stevens 2014). Mental models are not necessarily correct representations of how the systems actually function, but they still guide users and help them understand and interact with complex and abstract issues and systems (Asgharpour et al. 2007; Morgan et al. 2002).

Mental models concerning perceived risks in the online environment and online privacy are often transferred from the physical world. Five predominant mental models of users regarding online threats have been identified in previous research (Camp 2009): physical security (e.g., a break-in at your home), medical security (e.g., a computer virus), criminal behavior (e.g., identity theft, vandalism), warfare (threats have fast response times and huge potential losses), and economic failure (financial losses). Especially physical security – meaning security from physical entrance into your private space(s) and belongings – is used to describe the risks of the online environment (Asgharpour et al. 2007). Security is perceived as a barrier; therefore, items that lock someone out or prevent access to information are associated with privacy and security, e.g., locks, keys, and shields (Dourish et al. 2003; Motti and Caine 2016).

These real-world concepts that users draw on to make sense of the virtual world – the quintessential idea of mental models (Asgharpour et al. 2007; Coopamootoo and Groß 2014) – restrict access and data sharing completely. So how can these concepts be translated and applied to the context of data sharing in a privacy-preserving data market? One possibility for protecting users’ privacy in data markets is the anonymization of the shared data. The level of anonymity can be described and achieved in different ways. One such concept is k-anonymity: if a person has k-anonymity within a data set, this person is not distinguishable from at least k-1 other individuals in that data set. Thus, for k > 1, the person cannot be uniquely identified.
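To make this concrete, the following minimal sketch (illustrative only; the column names and values are hypothetical and not taken from the study) determines the k-anonymity of a small tabular data set, where k is the size of the smallest group of records that share identical quasi-identifiers:

```python
# Illustrative sketch of k-anonymity (hypothetical data, not from the study).
import pandas as pd

records = pd.DataFrame({
    "zip_code": ["52062", "52062", "52062", "52064", "52064"],
    "age_band": ["20-29", "20-29", "20-29", "30-39", "30-39"],
    "income":   [32000, 41000, 38000, 55000, 60000],   # sensitive attribute
})
quasi_identifiers = ["zip_code", "age_band"]

def k_anonymity(df: pd.DataFrame, qi: list) -> int:
    """Return k: the size of the smallest group of records sharing
    identical values on all quasi-identifiers."""
    return int(df.groupby(qi).size().min())

print(k_anonymity(records, quasi_identifiers))
# -> 2: every person is indistinguishable from at least one other record
```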

In real-world decisions, conflicting aspects often have to be considered and weighed against each other – just as benefits and barriers are weighed against each other according to the privacy calculus. These trade-offs between positive and negative aspects reveal the relevance of each factor for the individual. To study them empirically, conjoint studies can be conducted. Prior conjoint studies have included both different and similar factors and show partly contradictory results. For online social networks (OSNs), free-of-charge access (compared to small payments) was shown to have the greatest impact on users’ data sharing decisions, whereas privacy control mechanisms for the visibility of posts to other users were less important (Krasnova et al. 2009). For data sharing with organizations, Roeber et al. (2015) found that monetary compensation was the second most important factor, while how the data are used by the organization (including anonymized use as an option) was again less important; here, the most important factor for users’ decision to share data was the data type. For sharing medical data, two anonymization techniques, k-anonymity and differential privacy, were compared in their impact on sharing decisions by Calero Valdez and Ziefle (2019). Regardless of the anonymization method, anonymization was the most important factor for the participants in that study, and the type of benefit for data sharing was less important. These conflicting results can be attributed to several reasons: for one, the studies used different samples as well as different cultural and contextual settings for data sharing; for another, the selection and exact operationalization of the factors, as well as the number and selection of levels, strongly influence the results of conjoint studies. The difference between a reward ranging from 0 to 50€ in the study by Roeber et al. (2015) and an unspecified type of benefit (global, financial, or personal) in Calero Valdez and Ziefle (2019) is substantial and makes these studies hardly comparable.

Still, qualitative and quantitative research approaches to understanding privacy in data sharing have not yet been brought together. A holistic approach is needed to reveal users’ notions about anonymization, their understanding of this abstract concept, and their wishes for anonymization levels, as well as to show how users attribute relevance to privacy protection in trade-off with other important factors. Prior conjoint studies have used specific contexts (e.g., OSNs, medical data sharing) and different operationalizations of privacy in data sharing (e.g., privacy controls for the visibility of posts in OSNs, binary levels of anonymized data compared to data linked to the name). To understand privacy in data sharing in privacy-preserving data markets, research within this specific context that considers users’ mental models of privacy protection is needed.

Material and methods: Empirical approach

In order to understand users’ notions of privacy in data sharing, we undertook a three-step empirical approach. In a first, exploratory step, qualitative interviews and focus groups were used to uncover mental models and attitudes regarding privacy in data sharing and conditions for a user-centered privacy-preserving data market. Based on these findings, an online questionnaire was developed to quantify users’ motives, barriers, and conditions as well as their willingness to provide data, using a larger sample of German internet users. In a third step, experiments were conducted to model users’ data sharing choices, in which the trade-offs between privacy protection, type of data, data receiver, and benefit of data sharing are identified. Figure 1 illustrates our approach.

Fig. 1
figure 1

Three-tier research approach

Study I: Exploring users’ understanding of privacy in data sharing

The aim of Study I was to examine users’ notions regarding privacy in data sharing and to identify motives, barriers, and conditions for the use of user-centered privacy-preserving data markets. A qualitative approach was chosen, as users’ preferences for such data markets are not yet known and, in particular, mental models regarding privacy usually do not cover the case of sharing data.

The research process of study I

In total, 7 interviews and 6 focus groups (guided group discussions) were conducted. All followed a similar interview guideline (cf. Figure 2) but placed different foci on the topics. After an introduction and a warm-up with discussions about privacy in the online environment, the scenario of a user-centered, privacy-preserving data market was introduced. The participants were asked what privacy protection means to them when sharing data. Three focus groups concentrated on this topic and included further exercises to elicit mental models about privacy in more detail (preliminary results of these are published in Schomakers et al. 2018). In all focus groups, the participants were not only asked to talk about their notions of privacy in such a data market but also to draw visualizations and controls for privacy in data sharing. The second topic dealt with motives for using a privacy-preserving data market and barriers which hinder usage. Additionally, conditions (technical, social, economic, and more) were discussed.

Fig. 2
figure 2

Method and sample of study I

The interviews and focus groups were audiotaped and transcribed verbatim before categories were derived from the data following a conventional content analysis (cf. Hsieh and Shannon 2005).

37 people between the ages of 15 and 60 years (M = 32.3, SD = 13.6) participated; 43% were women. The participants were recruited with the goal of including people of differing levels of privacy knowledge and digital skills, ages, and genders.

Results of study I

The reporting of the results is structured as follows: first, the ideas and concepts regarding privacy protection are presented before attitudes about and conditions for data sharing are described.

Concepts of privacy protection

According to the users’ reports, privacy protection in general is not a well-defined concept: it appears multi-layered and contextualized, and participants differ heavily in their understanding of the topic. “Exhausting,” “complex,” even “impossible,” but still very “important” were the attributes used for privacy protection in general. Privacy protection is perceived as multi-layered and contextualized because there are various parties responsible for privacy protection, different perceived threats, and, thus, different options for protection. The threats mentioned by the participants are above all criminal behavior and illicit use of data, but also the collection of data itself and the resulting targeted advertising. The demand for more controllability and self-determination is clear. Moreover, some participants saw a threat to society and democracy, e.g., because of manipulation possibilities and filter bubbles which may influence public opinion and elections.

Analyzing the perceived threats of participating in the online world is a key approach to understanding mental models of online privacy protection, as some participants defined privacy protection explicitly as the absence of negative consequences. One conceptual threat was implicitly addressed very often throughout the discussions and interviews: identification. Participants oppose the notion that others can “form a picture of themselves” (a German idiom meaning to form an opinion about a person), that “apps can identify yourself,” that “data are traced back to yourself,” and that “data are combined to individual profiles.” Central is the identity, the “me,” that participants want to protect. Their wording and statements suggest that they see identification as the key problem, and not being identified as the one mechanism that can protect them online. Focusing on the idea of sharing data while preserving privacy, control over what is shared with whom, for what benefit, and for which purpose is one central element. The other is guaranteeing anonymity (see the following quote).

“The most important thing for me would be anonymity” (male participant, 29 years old)

Notions of privacy protection are mostly ‘binary’: privacy is either protected or not. This matches most learned, real-world concepts of privacy protection, e.g., using locks, shutting doors, or building a fence. Still, the idea of gradual anonymity can be learned. Here, the concept of k-anonymity can be easily understood and applied to the idea of privacy-preserving data markets.

Motives and barriers for sharing data in a privacy-preserving data market

The motives discussed for using such a data market were mostly benefits for oneself or for others. Benefits for oneself include not only gratification (e.g., monetary rewards) but also gaining an overview of one’s data and online accounts. Benefits for others are seen especially in the use of the shared data for science and medicine; users are more willing to share data for such a benefit. The feeling of being in control when the data market puts informational self-determination into practice is another decisive driver.

“If the data would only be used for medical purposes, for improvements in science or so, then I would be more willing to provide data.” (female participant, 51 years old)

Perceived barriers essentially comprise three topics: privacy concerns and missing trust in the security of such a data market, moral concerns, and not seeing a personal benefit in the system. Privacy concerns were quite prominent in the discussions and range from fears that the provider of the data market is dishonest or that the security of the data market is insufficient to fears that the data receivers abuse the data, e.g., by using it for purposes other than those consented to. Some participants stated that they would never use such a data market because they morally disapprove of ‘selling’ data. Others saw “no benefit at all” in the concept (male participant, 26 years old).

“I don’t trust it. Even if the provider would be honest, how will he guarantee that the data buyers use it only for the stated purposes, or that no hackers can get access.” (female participant, 21 years old)

Conditions for data sharing

The participants addressed multiple factors that influence their willingness to provide data. Of great importance were the data receiver and the purpose of the collection. Here, the benefit for the self or for society, e.g., for medical studies, is one important aspect. Also, when the participants can understand why the data are useful to the receiver, they seem to be more willing to provide them. One major point of discussion was what types of data are shared and how sensitive they are perceived to be. Of additional importance for the users are how the data are collected and stored, how long they are stored, the options for having personal data deleted, and transparency. For privacy protection, the participants demanded – besides informational self-determination and transparency – anonymity and protection within the data market application (e.g., password protection, facial recognition).

All in all, the participants’ attitudes towards user-centered privacy-preserving data markets were quite diverse. Some participants showed very high privacy concerns in general and regarding such data markets, while others were more relaxed. Also, some participants saw moral barriers against trading data, whereas others focused more on the personal benefit.

Key Findings of Study I

  • A key element of online privacy and privacy in data sharing is the protection of the identity, and thus, anonymity.

  • For privacy in data sharing, it is important to control:

    • what types of data are shared (and as how sensitive these are perceived),

    • with whom (and how much the person is trusted),

    • for what purposes,

    • and for which benefit.

  • Barriers against the use of privacy-preserving data markets are foremost privacy concerns, as well as moral concerns against the ‘trading’ with personal data.

  • Users are diverse in their attitudes; they especially differ in the strength of their privacy concerns.

Study II: Quantifying users’ conditions for data sharing

The second study aimed at confirming and quantifying the motives, barriers, and conditions for data sharing identified in Study I. N = 800 German participants completed an online questionnaire in which they were introduced to the scenario of a privacy-preserving data market. In Study I, diverging preferences towards such data markets were observed between people with lower and higher privacy concerns. Therefore, in Study II, we also analyze differences between groups of users with lower and higher privacy concerns.

The research process of study II

The online questionnaire consisted of three main parts (cf. Fig. 3). The third and main part of the questionnaire started with a scenario of a privacy-preserving data market: a newly developed data market enables participants to share their data with interested parties and to receive rewards in return. This application values the users’ privacy: data are shared anonymously, the users can control exactly what information is shared with whom, and they have full transparency. One of four scenarios was randomly assigned to each participant. The scenarios differed only in the context and type of data: smart home data, customer service data, medical data for clinical trials, and data about online behavior for market research. After the introduction to the scenario, the participants indicated their willingness to share exemplary data types and their agreement with benefits, barriers, and conditions for the use of the data market. Beforehand, demographic factors and online attitudes and behaviors were assessed (items for privacy concern based on Xu et al. 2008, trust based on McKnight et al. 2002, and privacy self-efficacy based on Schomakers et al. 2019a). All items were measured on symmetric 6-point Likert scales ranging from 1 (‘I do not agree’) to 6 (‘I fully agree’). The reliability of the scales was confirmed using Cronbach’s alpha (the criterion of α > .7 was met by all scales).
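As an illustration of this reliability check, the following minimal sketch computes Cronbach’s alpha for a multi-item Likert scale; the response matrix is invented for demonstration and is not the study’s data:

```python
# Illustrative sketch: Cronbach's alpha for a multi-item Likert scale
# (hypothetical responses, not the study's data).
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """items: (n_respondents, n_items) matrix of scale responses."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    sum_item_var = items.var(axis=0, ddof=1).sum()   # sum of item variances
    total_var = items.sum(axis=1).var(ddof=1)        # variance of sum scores
    return k / (k - 1) * (1 - sum_item_var / total_var)

# Six hypothetical respondents answering a four-item privacy-concern scale (1-6)
responses = np.array([
    [5, 5, 6, 5],
    [4, 4, 5, 4],
    [2, 3, 2, 2],
    [6, 5, 6, 6],
    [3, 3, 4, 3],
    [5, 6, 5, 5],
])
print(round(cronbach_alpha(responses), 2))  # acceptance criterion: alpha > .70
```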

Fig. 3
figure 3

Description of the online questionnaire of study II

For the analysis of group differences, a median split on privacy concern was used. In this way, a group of 429 participants with high privacy concerns (M = 4.79, SD = 0.59, min = 4.00, max = 6.00) and a group of 371 participants with low privacy concerns (M = 3.3, SD = 0.51, min = 1.29, max = 3.86) were compared. The data were analyzed using multivariate procedures. Analyses of variance (Pillai’s Trace) were calculated to study differences between the groups. The significance level was set at 5%.
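The following sketch outlines this type of analysis on simulated data (the variable names and values are assumptions for illustration, not the original analysis script): a median split on the privacy-concern score, followed by a MANOVA whose test output includes Pillai’s Trace.

```python
# Illustrative sketch of the analysis procedure (simulated data, assumed
# variable names): median split on privacy concern, then a MANOVA with
# Pillai's Trace via statsmodels.
import numpy as np
import pandas as pd
from statsmodels.multivariate.manova import MANOVA

rng = np.random.default_rng(seed=1)
n = 800
df = pd.DataFrame({
    "privacy_concern": rng.uniform(1, 6, n),        # scale mean per participant
    "share_age":       rng.integers(1, 7, n),       # willingness items (1-6)
    "share_location":  rng.integers(1, 7, n),
    "share_income":    rng.integers(1, 7, n),
})

# Median split into low- vs. high-concern groups
df["concern_group"] = np.where(
    df["privacy_concern"] > df["privacy_concern"].median(), "high", "low")

# Multivariate analysis of variance; the test table includes Pillai's Trace
manova = MANOVA.from_formula(
    "share_age + share_location + share_income ~ concern_group", data=df)
print(manova.mv_test())
```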

The questionnaire was distributed online via an independent market research company. N = 800 Germans who are regular internet users participated. Details of the demographic characteristics of the sample and of each group are given in Table 1.

Table 1 Demographic characteristics of the sample and the two concern groups

Results of study II

The first part of the analysis examines whether the data types differ in how willing the participants are to share them in the presented scenario. Figure 4 illustrates the mean willingness to provide data, which differs significantly between the data types (F(9, 791) = 84.2, p < .001, partial η2 = .49). Age is the only data type that the participants are, on average, willing to provide (M = 3.84). Location data (M = 2.53) and income level (M = 2.54) are the most strongly rejected. Activity data, real-time data, consumer habits, and even zip codes, shopping interests, and preferences for technology are also viewed critically (scores below the midpoint of the scale).

Fig. 4
figure 4

Mean willingness to provide data in a user-centered privacy-preserving data market in comparison between the groups of low and high privacy concern (n = 800)

The willingness to provide data differs significantly between users with low privacy concerns and users with high privacy concerns (F(10,789) = 5.10, p < .001, partial η2 = .06): users with lower privacy concerns are more willing to provide data than users with higher privacy concerns.

Motives and barriers

Secondly, the motives and barriers regarding the use of a privacy-preserving data market are studied. Agreement with the motives is, all in all, rather neutral (around the scale midpoint of 3.5). Users mostly see benefits in the informational self-determination, the privacy protection, and the rewards for data provision (cf. Fig. 5). Trust in data protection and a relaxed attitude towards data collection receive rather low agreement. Decisive barriers to using a user-centered privacy-preserving data market are, above all, concerns about privacy and data security (cf. Fig. 6). Additionally, moral arguments and not seeing personal benefits are perceived barriers.

Fig. 5
figure 5

Mean agreement to the motives of using a user-centered privacy-preserving data market: “I would use the data market because...” (n = 800)

Fig. 6
figure 6

Mean agreement to the barriers of a user-centered privacy-preserving data market: “I would not use the data market because...” (n = 800)

Users with lower privacy concerns see significantly more motives to use the privacy-preserving data market than do users with higher privacy concerns (F(10,789) = 9.24, p < .001, partial η2 = .11). In particular, trust in the data protection of such a data market is higher among those with lower privacy concerns. Additionally, those with lower privacy concerns perceive fewer barriers (F(15,784) = 12.7, p < .001, partial η2 = .2). These differences hold for all barriers and all motives.

Conditions

The mean importance ratings of the conditions are depicted in Fig. 7. All conditions are rated between important and very important, with data security (M = 5.4, SD = 0.9), simple ways of deleting data (M = 5.34, SD = 1.03), and anonymity (M = 5.33, SD = 1.07) being the most important. Sufficient rewards (M = 4.33, SD = 1.53) are the least important. Users with high privacy concerns place more importance on these conditions for privacy than do users with lower privacy concerns (F(10,789) = 18.13, p < .001, partial η2 = .19). The user groups differ in the perceived importance of all conditions with the exception of sufficient rewards.

Fig. 7
figure 7

Importance ratings of the conditions for a user-centered privacy-preserving data market (n = 800)

Key Findings of Study II

  • Significant differences in the willingness to provide data could be observed between the data types ➔ the type of data to be shared is a decisive factor.

  • Informational self-determination and benefits are the most important motives to use a user-centered privacy-preserving data market.

  • Privacy concerns are the most relevant barriers against such a data market, followed by moral concerns.

  • All conditions for data sharing identified in the qualitative approach in Study I were confirmed to be highly important and differences between conditions were small.

  • Users with higher privacy concerns in general are less willing to provide data in a privacy-preserving data market, agree less with motives and more with barriers, and put more emphasis on privacy conditions.

Study III: Modeling users’ trade-offs in data sharing

Study II showed that all barriers and conditions for privacy in data sharing were important when assessed independently. However, in real-world usage scenarios, factors are not independent of each other but might influence and compensate for each other, resulting in a weighing of motives, barriers, and conditions. In order to examine which factors are decisive, conjoint experiments were developed for Study III that reveal the trade-offs between the different factors. As differences between users with higher and lower privacy concerns proved to be relevant, we again look at differences between these groups.

The research process of study III

In reality, decisions are based on different factors in combination, weighed against each other (Luce and Tukey 1964). In the previous studies, privacy protection, type of data, data receiver, and benefit were identified as decisive factors for the decision to share data in a privacy-preserving data market. In Study III, these factors are evaluated jointly in a conjoint experiment to mimic realistic user decisions.

Choice-based conjoint experiments model these complex decision processes: participants need to choose between n scenarios with m different attributes which vary in their levels, taking all attributes into account at the same time. As results of the conjoint analysis, the relative importance of the attributes shows how much each attribute influences the decision, and the part-worth utilities reflect which attribute levels are most and least accepted (Arning 2017).
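To illustrate how these two outputs relate, the following sketch derives relative importance scores from a set of part-worth utilities; the utility values and level labels below are invented for illustration and are not the estimates reported later in this paper:

```python
# Illustrative sketch (invented utilities): an attribute's relative importance
# is its part-worth utility range divided by the sum of all attributes' ranges.
part_worths = {
    "anonymization": {"complete": 60.0, "1 of 5": 10.0, "1 of 2": -20.0, "none": -50.0},
    "data type":     {"social profile": 25.0, "location": 10.0,
                      "medical history": -10.0, "financial account": -25.0},
    "reward":        {"75 EUR": 15.0, "50 EUR": 5.0, "25 EUR": -5.0, "5 EUR": -15.0},
}

utility_ranges = {attr: max(levels.values()) - min(levels.values())
                  for attr, levels in part_worths.items()}
total_range = sum(utility_ranges.values())

for attr, attr_range in utility_ranges.items():
    print(f"{attr}: {100 * attr_range / total_range:.1f}% relative importance")
```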

The attributes and attribute levels

The selection of the attributes for this study was based on the findings of Studies I and II regarding which factors most influence users’ decisions to share data. Privacy protection was operationalized in two ways: on the one hand, the level of anonymization was included as k-anonymity; on the other hand, the type of app security was assessed. The type of app security was included because Study I showed it to be a typical representation of privacy protection in laypersons’ perceptions. Table 2 shows all attributes and attribute levels. Detailed explanations were given to the participants, and the anonymization level was additionally introduced with an illustrative example to clarify k-anonymity. Here, again, the laypersons’ understanding of anonymity was considered: as most participants in Study I showed a notion of ‘complete’ anonymization, this was included as a level in this study. In this way, the two other anonymization levels presented can be compared to this fictitious and unrealistic benchmark, which is nonetheless part of the mental model of many users who are not familiar with technical anonymization capabilities.

Table 2 Attributes and attribute levels of the conjoint experiment

Pictograms were used to improve comprehensibility and ease of use. The participants were given 10 random choice tasks, as a complete design with 1024 (4 × 4 × 4 × 4 × 4) possible combinations of the attribute levels would not have been feasible. Tests of design efficiency confirmed the applicability of this reduction. In each choice task, the participants were presented with n = 3 scenarios – random combinations of the attribute levels – plus a “none” option. No restrictions on level combinations were imposed, and participants were not able to skip tasks.
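The structure of such a randomized design can be sketched as follows. The level labels below are partly assumed (Table 2 holds the exact wording used in the study), but the counts match the design described above: five four-level attributes, 10 tasks, and three random scenarios plus a “none” option per task.

```python
# Illustrative sketch of the randomized choice-task design (level labels partly
# assumed; counts as described in the text: 4^5 = 1024 combinations,
# 10 tasks, 3 scenarios + "none" option per task).
import random
from itertools import product

levels = {
    "anonymization": ["complete", "1 of 5", "1 of 2", "none"],
    "data type": ["social network profile", "location data",
                  "medical history", "financial account data"],
    "data receiver": ["public administration", "NGO",
                      "online company", "financial institution"],
    "reward": ["5 EUR", "25 EUR", "50 EUR", "75 EUR"],       # intermediate levels assumed
    "app security": ["double password", "password", "PIN",   # "PIN" is assumed
                     "facial recognition"],
}

full_design = list(product(*levels.values()))
assert len(full_design) == 1024            # 4 x 4 x 4 x 4 x 4

random.seed(0)
choice_tasks = []
for _ in range(10):                        # 10 choice tasks per participant
    scenarios = random.sample(full_design, k=3)   # 3 random level combinations
    choice_tasks.append({"scenarios": scenarios, "none_option": True})
```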

Questionnaire and sample

In this study, the choices between scenarios were presented within an online questionnaire that started with questions about demographics, usage of and experience with apps, and privacy concerns before the conjoint tasks were introduced (cf. Fig. 8). For the design and analysis of the conjoint experiment, Sawtooth Software was used.

Fig. 8
figure 8

Description of the online questionnaire and conjoint tasks in study III

N = 126 complete sets of answers are analyzed. The participants were recruited online, volunteered to take part, and are rather young on average (M = 27.87 years, SD = 7.9). Gender was quite balanced, with 56% women (cf. Table 3). All participants reported being frequent internet and app users. For the analysis of group differences regarding privacy concerns, a median split was again used, resulting in a group of 54 participants with low privacy concerns (M = 2.8, SD = 0.75, min = 1, max = 3.75) and a group of 72 participants with higher privacy concerns (M = 4.81, SD = 0.64, min = 4, max = 6).

Table 3 Demographic characteristics of the sample and the two groups

Results of Study III

To identify which attribute has the greatest impact on users’ decision to share data, relative importance scores are calculated. Part-worth utilities (zero-centered diffs, calculated using a Hierarchical Bayes multinomial logit model) are analyzed to observe how the attribute levels are accepted in comparison to each other. For the interpretation of part-worth utilities, it is important to bear in mind that they cannot be compared between attributes (Gustafsson et al. 2007). Also, positive utilities do not signify acceptance, nor do negative utilities signify rejection; only the differences between utilities show their relative influence on the sharing decision.

Figure 9 depicts the relative importance scores of the five attributes. The anonymization level has the greatest impact on the sharing decision for both groups (M = 34.8%). Second most important is the type of data (M = 23.1%). Benefit, data receiver, and app security show a similar, smaller influence (M ≈ 14%).

Fig. 9
figure 9

Relative importance scores of the attributes (n = 126)

The concern groups show small but significant differences in their importance attributions (F(4,121) = 3.14, p < .05, partial η2 = .09). The relative importance of data type, data receiver, and app security does not differ. But users with higher privacy concerns attribute more importance to the level of anonymization (M = 38%) than do those with lower privacy concerns (M = 30.6%, F(1,124) = 12.37, p < .01, partial η2 = .09). In turn, those with lower privacy concerns attribute more importance to the monetary benefit (M = 15.2%) than those with higher privacy concerns (M = 12.5%, F(1,124) = 4.86, p < .05, partial η2 = .02).

In Fig. 10, the part-worth utilities for all attribute levels are presented. Not surprisingly, higher privacy protection with ‘complete’ anonymization is most accepted by the participants, whereas no anonymization is least accepted. Users would rather share their social network profile and location data than their medical history, and they are least willing to share their financial account data. Also not surprisingly, the higher the monetary reward, the more willing users are to provide data. As data receivers, public administrations and non-governmental organizations are more accepted than online companies and financial institutions. The most accepted app security option is double password protection, followed by password protection; facial recognition is the least accepted.

Fig. 10
figure 10

Part-worth utilities of all attribute levels (n = 126)

No significant differences in the level evaluations between the groups with low and high privacy concerns can be observed (F(15,110) = 1.71, p > .05).

Key Findings of Study III

  • The level of anonymization has the greatest impact on the decision whether to share data, followed by the type of data.

  • The most accepted scenario is sharing the social network profile with ‘complete’ anonymization with a public administration for 75€, with the application protected by double password protection.

  • The least accepted scenario is sharing financial account data without anonymization with a financial institution for 5€, with the application protected by facial recognition.

  • Users with high privacy concerns attribute more importance to the anonymization level and less importance to the monetary benefit than users with lower privacy concerns.

Discussion and conclusion

In this paper, we empirically analyzed users’ attitudes and preferences for privacy in user-centered privacy-preserving data markets. These markets are one approach to restore users’ online privacy and informational self-determination as well as to build a viable data market for companies and research. Here, users can decide what data they want to provide to whom for what purpose – and their data are anonymized.

Our research was guided by the question of what users want in order to perceive privacy and to have self-determination when sharing data. A three-fold research approach was used that started with a qualitative study identifying mental models, preferences, and relevant conditions. Building on the results, Study II assessed the importance of these motives, barriers, and conditions, as well as the willingness to provide different data types. In a choice-based conjoint experiment, Study III, five attributes (anonymization level, type of data, data receiver, monetary benefit, app security) were weighed against each other in their impact on users’ decisions to share data. Figure 11 summarizes the key findings. The contribution of this paper is its mixed-method approach to the topic of privacy in data sharing, which provides, on the one hand, a deeper understanding of users’ notions of privacy in data sharing and, on the other hand, a quantification of users’ preferences and trade-offs in data sharing decisions.

Fig. 11
figure 11

Key findings of the consecutive research process

Identity and Anonymization

The findings of all three studies indicate that the protection of one’s identity is the core element for privacy and, correspondingly, anonymization is the most important factor that influences users’ decisions to share data. The results are important for the design of privacy-preserving data markets as they show that good anonymization techniques are a core element to ensure users’ perceived privacy.

In Study I, a qualitative approach was used to identify mental models of and preferences for privacy in data sharing. To reach the desired level of privacy – that is, a balance between withdrawal and self-disclosure – we learn to use physical mechanisms in the ‘offline’ world, like fences, doors, and curtains. In the online world, and especially in data sharing contexts, these binary approaches are not fully applicable. Privacy is often misunderstood as complete withdrawal, and the so-called privacy paradox is observed when users decide to share data, e.g., on social media. But privacy is a state that is managed through both withdrawal and self-disclosure; self-disclosure is a precondition for social identity and relationships (Lahlou 2008; Taddicken 2014). Our results show that when users are given informational self-determination and can share data anonymously, they perceive their privacy to be protected.

To quantify attitudes and visions about online privacy, these results and the importance of anonymization were confirmed in a subsequent survey study (Study II) with a large German sample (N = 800 participants). Anonymization is rated as an important condition for privacy in data sharing, but other factors were evaluated as similarly important, especially data security, the possibility of and ways of deleting data, and transparency. However, evaluating single factors in isolation, as is typically done in surveys, does not tell us how users would decide in real-world settings. People need to decide under which circumstances they would be willing to share data and under which conditions they would not do so; thus, they weigh the positive and negative aspects against each other. In order to study these trade-offs, a conjoint experiment (Study III) was developed in which the participants chose between data sharing scenarios that differed in anonymization level, type of data, data receiver, amount of monetary reward, and app security. This approach reveals the trade-offs between the factors and the impact each factor has on the decision to share data. The conjoint analysis demonstrated that the anonymization level was the most decisive factor for the decision to share data. This result is in line with previous research on data sharing in a medical context (Calero Valdez and Ziefle 2019). Other previous results of conjoint experiments, which showed the importance of rewards and costs for data sharing decisions (Krasnova et al. 2009; Roeber et al. 2015), were not supported. These differences could be attributed to different samples and cultural settings as well as different contexts of data sharing. Also, the selection and operationalization of attributes and levels differed.

Still, the applied conjoint approach has some limitations. The number of attributes was already high for conjoint tasks, as participants should not be overtaxed in their ability to validly evaluate complex scenarios. A trade-off between the complexity of the research issue and an economical design was made, although Studies I and II showed that further aspects are important to participants, e.g., deletion of data, storage conditions, and control over the purpose of data collection. With adaptive conjoint approaches, more attributes could be included in future studies. Additionally, the comparatively small and young sample of the conjoint study needs to be taken into account. Empirical research has found that privacy perceptions and behaviors differ between age groups (Miltgen and Peyrat-Guillard 2014; Van den Broeck et al. 2015). Future studies should therefore include a wider age range and also focus on the youngest and oldest users, who have been described as vulnerable groups regarding online privacy (Moscardelli and Divine 2007; Van den Broeck et al. 2015).

The levels of the anonymization attribute were chosen to be comprehensible to participants without background knowledge of anonymization techniques. The level of ‘complete’ anonymization does not exist in reality and was only used for comparison with the k-anonymity levels of k = 5 and k = 2. In this way, we see that there is a large gap in utility between an anonymization level of “1 of 5” and “complete anonymization”, emphasizing that “1 of 5” is not perceived as equal to complete anonymization. This small deception may still have influenced the evaluation of the other anonymization levels presented. Future studies should try to understand users’ risk perceptions in data sharing and how the perception of anonymization levels is distributed.

Further important aspects for user-centered privacy-preserving data markets

The qualitative approach revealed many aspects that are important to users of such data markets. Informational self-determination – meaning, in this context, giving users full control over what data they share with whom for what purpose – and trust in the data market, its data security, and the handling of the data by the data receivers are, besides anonymization, the most important factors from the users’ perspective. The most relevant barriers against using a privacy-preserving data market are privacy concerns and mistrust in the data security of data markets. Moral concerns about trading personal data are also influential. Full transparency and control by the users are needed to build trust. Users want to know what their data are used for and thus consider the benefit beyond the monetary reward.

The willingness to provide data varies between data types, and the type of data is the second most important attribute in the conjoint study. The sensitivity of information plays an important role in risk perceptions and the willingness to share data (Calero Valdez and Ziefle 2019; Milne et al. 2016; Wirth et al. 2019). Users are willing to provide (certain) data when it has a benefit for themselves or society and when trust in the data protection is prevalent. This trade-off between privacy and utility is in line with the privacy calculus theory (Dinev and Hart 2006) and can thus explain the ‘paradoxical’ behavior of internet users who provide data even though they are concerned about their privacy. The operationalization of benefit in this study as a monetary reward between 5€ and 75€ may also have influenced the importance scores. If the benefit had included rewards of 1000€, we would hypothesize higher importance scores for the monetary reward. Also, the value of money differs between participants, and some participants may wish for benefits other than money. All in all, the findings of this study only apply to the investigated attributes and levels.

But the results show that privacy is very complex. The European General Data Protection Regulation (GDPR) applies only to personal data, which are defined as any information relating to an identified or identifiable natural person (cf. Art. 4 (1) GDPR). Thus, no consent by the individual is needed for the use of anonymized data. The results of our study show that anonymization is not the only thing that matters for informational self-determination: even with anonymization, users want to control what data about them are used for which purposes, and they want to be rewarded. Here, privacy-preserving data markets can offer a solution that restores users’ informational self-determination and lets users get a share of the huge profits made with their data.

In contrast to the data monopolies of large online companies that operate as ‘walled gardens’, making anonymized, high-quality user data accessible with low market entry barriers, higher legal certainty for buyers, and transparency for all involved parties offers great opportunities to companies and lays the foundation for innovations. But there are challenges for privacy-preserving data markets. For one, anonymized data may lose value for paying companies and may thus be a poor alternative to existing data repositories. However, customers might prefer companies that accept their boundaries and need for privacy, making the use of a privacy-preserving data market as a source of marketing data a relevant business advantage. For another, to offer a viable and competitive data repository, a minimum amount of user data needs to be accumulated, making the initiation phase critical for such a privacy-preserving data market. This raises the question of whether a privacy-preserving data market – which offers benefits and informational self-determination for users as well as innovation potential for small and medium-sized companies – could or should be provided by a governmental or non-profit organization. Initiatives in a similar direction have been launched in Europe (Poikola et al. 2015; Matzutt et al. 2017). Our results show that users are willing to share data on their own terms when provided with informational self-determination and, particularly, anonymization. This holds especially true for those users with fewer privacy concerns.

Still, it has to be considered that the results of our studies reflect stated preferences and not actual behavior. In privacy research and other contexts, the discrepancy between stated attitudes and actual behavior has often been revealed (Gerber et al. 2018). Thus, the findings of this study indicate that the anonymization level is very important to the participants, that the medical history is shared less willingly than location data, and that public institutions are more accepted as data receivers than companies. But actual behavior might turn out differently: according to the privacy calculus theory, attractive benefits may override privacy concerns. Also, the privacy calculus is influenced by situational and affective variables (Acquisti et al. 2015; Kehr et al. 2015). In our empirical research, regardless of whether it was a focus group, an online questionnaire, or a conjoint study, the participants were primed to the topic of privacy and answered mostly rationally. This may not be the case in real data sharing decisions, where the immediate benefit is more in the foreground. Future studies therefore need to explore how users might be adequately warned or informed about their privacy settings. Here, user-centered recommendation systems might prevent imprudent online behaviors by illustrating the consequences of possible privacy intrusions.

User diversity

Throughout all three studies, differences between groups of participants with lower and higher privacy concerns could be observed. Users with lower privacy concerns in general are more motivated to use privacy-preserving data markets and less concerned for their privacy when doing so. They are probably the first user group to participate in such data markets. This shows how diverse users are in their privacy attitudes and preferences and that different user groups exist and need to be considered in research and implementation (e.g., Schomakers et al. 2019a; Sheehan 2002). Here, only these two groups were studied, but users’ preferences probably vary also depending on other characteristics and attitudes, e.g., experiences, knowledge, or trust (e.g., Riquelme and Román 2014).

Also, the cultural setting needs to be kept in mind: all three studies were conducted in Germany. Historical experiences with dictatorship and surveillance as well as a long tradition of privacy regulation may influence Germans’ privacy attitudes. Empirical studies have shown that Germans perceive more risks than people in other countries and perceive information to be more sensitive (Krasnova and Veltri 2010; Trepte et al. 2017; Schomakers et al. 2019b). A comparison of results with other countries would be insightful.