
1 Introduction

Although there is broad consensus that internet users’ privacy awareness should be increased and encouraged, and that informed consent can support sustainable sources of income, the effects of privacy regulations and privacy awareness on the profits of IT businesses remain unclear. Numerous studies have sought to quantify the value internet users assign to their personal data, but the topic remains controversial [1]. User consent to the processing of personal data is typically requested at the beginning of an interaction; however, the literature suggests that consent may change over time and should therefore be considered an ongoing activity [2] (cf. Sect. 3.2).

Generally speaking, user engagement has become essential for many websites and publishers [3], and in most cases it involves collecting and processing users’ personal data at some level of aggregation. One of the major sources of income for these businesses is behavioral targeting, which has become widely used throughout many web applications. However, the decision to employ behavioral targeting should be based on nuanced consideration: for small publishers whose advertising space is in low demand, traditional advertising may be preferable under certain circumstances [4].

Responsible treatment of the personal data collected is essential for businesses, as the legal requirements and corresponding fines are becoming increasingly restrictive and severe. However, this need for privacy is sometimes conflated with the need for data security, and there is evidence that 38% of organizations address the latter but not the former [5]. Furthermore, in recent years users have become better informed and more aware of their privacy rights.

The contribution of this work is twofold: It (a) follows earlier evidence that privacy concerns are greater in the context of sensitive goods and investigates the effect of the sensitivity of a website’s content on its user engagement rate, and (b) examines the engagement rate over time before and after introduction of the General Data Protection Regulation (GDPR).

2 Privacy

Protection of personal data and the concept of privacy in general is currently one of the most widely discussed topics in the IT sector. At a very abstract level, two fundamental perspectives on privacy can be distinguished:

  • Privacy of the personal sphere as “the right to be let alone”, introduced as early as 1890 by Warren and Brandeis [6]. Their influential work defined privacy as (a) the secrecy of one’s own thoughts, property and actions and (b) the flow towards the individual of such data about others as might affect him or her.

  • Privacy of one’s own personal data as “the right to select what personal information about me is known to what people” [7]. This is the traditional viewpoint of many computer scientists, where the focus is on control over information about oneself, one’s conversations and actions. Unlike the concept of solitude, privacy of personal data is actively determined by the individual.

Both perspectives play an important role in the process of designing and implementing new applications that make use of personal data, and should be considered accordingly. Equally important is the extent to which users are aware of what happens to their personal data (and similar privacy-related issues) and whether they act accordingly. Several studies have shown that people’s intentions and their self-reported privacy preferences are not always reflected in their behavior, as many tend to disclose significantly larger amounts of personal data than their initial assertions would indicate (“Privacy Paradox”) [8, 9].

2.1 Privacy Awareness

As previously mentioned, not only are data protection and privacy important in themselves in this context, but it is also of interest whether users are aware of how their personal data are used. As a generic term, awareness can be defined as an individual’s attention to, perception of or cognition about both physical and non-physical objects. It plays a crucial role in the field of privacy, where privacy awareness comprises attention to, perception of and cognition about [10]:

  • whether others receive or have received personal information about oneself,

  • which personal information is affected,

  • how this information can or could be processed and used, and

  • what amount of information about others flows towards, and might affect, one.

When users are aware of privacy-related information, they are able to come to an informed decision on whether they agree to a specific request. In essence, this can be expressed by a very basic utility function [11]:

$$\begin{aligned} U(X) = Benefit - Cost, \end{aligned}$$

where Benefit is the individual value of a personalized service and Cost a combination of previous privacy invasion, consumer privacy concern, and the perceived importance of privacy policies and information transparency. Other scholars conceived of a similar function by substituting Cost with (privacy) Risks, for which they identified seven dimensions [12]:

  • Physical Risk: Fearing the loss of physical safety, arising from access to personal data

  • Social Risk: Fearing a change in one’s social status

  • Resource-Related Risk: Fearing the loss of resources

  • Psychological Risk: Fearing a negative impact on one’s peace of mind

  • Prosecution-related Risk: Fearing that legal actions will be taken against one

  • Career-related Risk: Fearing negative impacts on one’s career

  • Freedom-related Risk: Fearing a loss of freedom of opinion and behavior.
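The utility calculus above can be sketched in a few lines of Python. Note that the 0-to-1 risk scale, the equal weighting of the seven dimensions, and the example scores are purely illustrative assumptions, not part of the cited models [11, 12].

```python
def utility(benefit, risks, weights=None):
    """U(X) = Benefit - Cost, with Cost modeled as a weighted
    combination of perceived privacy risks (illustrative sketch)."""
    if weights is None:
        # assumption: all risk dimensions weighted equally
        weights = {dim: 1 / len(risks) for dim in risks}
    cost = sum(weights[dim] * risks[dim] for dim in risks)
    return benefit - cost

# Hypothetical pollee: clear perceived benefit, moderate social risk
risks = {"physical": 0.0, "social": 0.4, "resource": 0.1,
         "psychological": 0.2, "prosecution": 0.0,
         "career": 0.1, "freedom": 0.1}
u = utility(benefit=0.7, risks=risks)
```

Under this sketch, a pollee would agree to a request only when U(X) is positive given his or her individual weighting of the risk dimensions.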

Recent literature also shows that the willingness to disclose personal information differs across cultures [13] and that a big data fair collection and use policy could assist in building awareness of data collection and usage [14]. These policies are also encouraged by the legal regulations presented in the next section.

2.2 GDPR and ePrivacy Regulation

The European Union’s General Data Protection Regulation came into effect in 2016, and since May 25, 2018, organizations have been required to comply with it. In contrast to its predecessor – the Data Protection Directive, which required local regulations for implementation – the GDPR is self-executing and legally binding in all EU member states. This promises to eliminate inconsistencies and differing interpretations. The GDPR defines rights for individuals in the EU regarding protection of their personal data and requirements for organizations regarding collection, storage and processing of these data. One of the most important aspects of the GDPR is Privacy by Design, for which the European Union Agency for Network and Information Security defined eight strategies for organizations [15]:

  • Minimize: Restricting the amount of personal data to be processed to a minimum

  • Hide: Hiding personal data and their interrelations from plain view

  • Separate: Processing of personal data in a distributed way whenever possible

  • Aggregate: Processing of personal data at the highest level of aggregation and the lowest level of detail possible

  • Inform: Informing data subjects whenever their personal data are being processed, and thus increasing transparency

  • Control: Providing data subjects with agency over the processing of their personal data

  • Enforce: Ensuring compliance with a privacy policy compatible with legal requirements

  • Demonstrate: Being able to demonstrate this compliance with privacy policies.

Aspects of the technical implementation of Privacy by Design are well researched, while other steps of the Design Science Research Methodology Process Model, such as Design and Development or Demonstration, have hardly been covered in recent literature [16].

The ePrivacy Regulation is meant to replace the current ePrivacy Directive and to particularize and complement the GDPR. At the time of writing, it is still a draft and under discussion, but it aims to protect “fundamental rights and freedoms of natural and legal persons in the provision and use of electronic communications services” (Art. 1(1) ePR-Draft) while ensuring “free movement of electronic communications data and electronic communications services” in the EU (Art. 1(2) ePR-Draft).

The consequences of both regulations remain partly unclear; in particular, the draft of the ePrivacy Regulation leaves many questions open, for instance, in the area of consent for setting cookies [17]. Some studies have identified uncertainties for both businesses and consumers and have therefore suggested standardized privacy labels [18], while others have even claimed negative effects of the GDPR on investment in the European IT sector [19] – clear reasons to research this topic further.

3 User Engagement

This section gives an introduction to the subject of online polls and to user engagement in this field.

3.1 Online Polls

One of the main attributes many modern internet applications have in common is that they rely heavily on engaging their users to provide content, to spread this content, or simply to make their personal data available. These business models are working well [3], as internet users increasingly desire to share their opinions and contribute to website content. The literature shows that this phenomenon can be observed in various forms, for instance, in the comments sections of online newspapers [20] and in social media [21].

However, as investments in online personalization can be very expensive, their justification can be severely undermined if consumers do not value them because of privacy concerns [22]. Including online polls in a website’s content is a way of increasing user engagement at relatively low cost. A typical poll consists of a single question with between two and five answer options. Figure 1 shows such an online poll after the voting process.

Fig. 1. Example online poll

Publishers either have their polls programmed in house or employ commercial systems. In many cases, these systems – like other analytics tools – make use only of polling metadata. However, previous studies have shown that user opinions in the form of poll responses are an untapped source of knowledge that can help publishers to monetize their content [23]. Combining poll responses with polling metadata allows even better profiling. One way of combining these two sources of information is a request for providing more personal information after the poll, as shown in Fig. 4. Requests of this kind are frequently integrated into another form of Benefit (see Sect. 2.1), for example, prize games.

3.2 Engagement Process

O’Brien and Toms [24] defined a process model of engagement consisting of the phases of initiating and sustaining engagement, disengaging, and potentially reengaging during a single interaction. The Point of Engagement with an online poll is typically a news article in which it is included. Reading the article, users see the poll and want to share their opinions or see what others think about the topic. This is the start of the Period of Engagement, which typically ends after seeing the results of the poll (Disengagement). Means of Reengagement include, for example, showing subsequent polls with related content or showing additional offers, such as prize games or other forms that request personal data.

These phases all have attributes that emerge during and affect further interaction, such as aesthetics, novelty, interest, motivation, awareness, interactivity, and challenge. Most of these attributes are also relevant to the polling system under investigation. The definition of engagement used here is therefore “a quality of user experiences with technology that is characterized by challenge, aesthetic and sensory appeal, feedback, novelty, interactivity, perceived control and time, awareness, motivation, interest, and affect” [24].

4 Empirical Study

4.1 Theoretical Model

Our research is based on the model by Awad and Krishnan [11], which examines the effects of consumers’ demographic attributes (gender, education and income) and other factors on the perceived importance of information transparency. These factors are specifically:

  • Previous Online Privacy Invasions: Many users have already experienced various forms of privacy invasion, such as e-mail spam and identity theft. These individuals may place less value on personalization and therefore tend not to accept corresponding requests.

  • Privacy Concerns: A higher level of privacy concern will likely lead to a lower willingness to be profiled online.

  • Importance of Privacy Policies: Consumers who value the privacy policy as an aggregate view are also likely to value specific information transparency features.

In the first step, these factors are aggregated in the perceived importance of information transparency, which in the second step has an effect on both the willingness to be profiled online for personalized services and the willingness to be profiled online for personalized advertising. Figure 2 shows the described basis for our model.

Fig. 2. Basic theoretical model [11]

These findings are in line with other research in the field, suggesting that the effect of user concerns for information privacy on their willingness to transact in an e-commerce setting is mediated by risk perceptions and trust, and that information privacy is more important in the context of well-known than of lesser known merchants [25].

In an earlier study, we extended this model by introducing the variables of Actual Information Transparency and Real-World Setting, and examined their influence on the willingness to be profiled online [26]. Additionally, the question arose of whether the constructs of Privacy Awareness and consequently Information Transparency also have a direct effect on the user engagement rate. This work extends the basic model to include the independent variables of Content Category and Sensitivity and Public Privacy Awareness; the relevant parts of this extension can be seen in Fig. 3. We expected both variables to be inversely correlated with the users’ willingness to be profiled, which is expressed by the dependent variable User Engagement Rate. The reasoning behind this extension is discussed in the next section.

Fig. 3. Research model

4.2 Motivation and Methodology

This empirical study was carried out as part of an ongoing research project in collaboration with an industry partner providing a tool for creating online polls to publishers and other website owners as described in Sect. 3.1. In work published earlier, we sought to make the formerly discrete online polls more intelligent by applying semantic technologies [27] and examined variations of visualization [28]. One of the main findings was that the profiling capabilities of an online polling system can be improved by adding more general knowledge about the pollees and classifying the knowledge coming from the polling responses.

The following analyses are based on a real-world data pool of more than 22M votes in more than 60k polls with more than 65k individual user data sets collected mainly in German-speaking countries. These polls, many of which were supplements to online newspaper articles, were published on a variety of websites. As the polling system investigated is a commercial system utilized by publishers, it is considered a third-party data processor as defined by the GDPR. Users have to accept the terms and conditions of the polling system before they use it for the first time, thereby performing a revocable opt-in.

Fig. 4. Request for personal information

Fig. 5. Consent notification

While the literature shows that opt-outs should remove personal user data from use rather than simply result in termination of sending out marketing messages [2], users in our case study showed little interest in this option. Only 191 people (i.e., 0.002%) out of 7,366,014 unique visitors in the period under consideration clicked on the corresponding link shown in Fig. 5. Further, interest in the terms and conditions of the polling system was also somewhat limited: only 12,513 people (0.15%) clicked on the link leading to the terms (see Figs. 1 and 4).

The first question investigated was how a poll’s content sensitivity affects the user engagement rate. Online polls are often provided as supplements to online newspaper articles, the topics of which are typically also reflected in the polls. In our setting, we defined the user engagement rate as the proportion of users who voluntarily disclose personal data, and expected it to differ between categories of poll content. Since earlier studies had shown that privacy concerns can be greater when dealing with sensitive goods [29], we assumed that this also applies to polls with sensitive content, which is reflected in the first hypothesis:

Hypothesis 1

(H1). The pollees’ willingness to participate in online profiling is lower after answering polls with more sensitive content.

Hypothesis 1 was tested by categorizing polls that included some kind of request for personal data and were published between October 1, 2018, and November 30, 2018. We established a categorization system based on the top level (Tier 1) of the IAB Tech Lab Content Taxonomy, with minor adaptations to better reflect the polling system investigated. We then allocated each poll to one of the resulting 23 categories (see Table 1). Publishers chose a variety of requests, for instance, for e-mail addresses (see Fig. 4), or more complex forms for postal addresses or demographic data. These requests were shown to the user after voting in a poll. We calculated the engagement rate (ER) by dividing the number of data sets actually collected by the number of unique visitors who saw the requests for personal data after voting.
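The ER computation reduces to a simple ratio, as the following sketch illustrates; the visitor and data-set counts used here are invented examples, not figures from the study.

```python
def engagement_rate(data_sets_collected, unique_visitors):
    """ER = data sets collected / unique visitors who saw the request."""
    if unique_visitors <= 0:
        raise ValueError("unique_visitors must be positive")
    return data_sets_collected / unique_visitors

# Hypothetical category: 300 data sets collected from 50,000 unique visitors
er = engagement_rate(300, 50_000)
print(f"ER = {er:.2%}")  # prints "ER = 0.60%"
```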

The next step built on an online questionnaire (October 2018, N = 41) in which we asked internet users to express their views on the sensitivity of the established categories. The median results of this survey were then consolidated to assign to each category a content sensitivity score between 1 (=low) and 5 (=high). We then again calculated a mean engagement rate for each group of categories with the same score. By employing an ordinal regression, we investigated the correlation between content sensitivity and engagement rate.
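The consolidation step described above can be sketched as follows: median survey ratings yield a sensitivity score per category, and categories sharing a score are averaged. Category names, ratings and rates are invented for illustration, and the ordinal regression itself (run with standard statistical software) is omitted.

```python
from statistics import mean, median

def sensitivity_scores(survey_ratings):
    """Median sensitivity rating (1 = low .. 5 = high) per content category."""
    return {cat: median(r) for cat, r in survey_ratings.items()}

def mean_er_per_score(scores, er_by_category):
    """Mean engagement rate for each group of categories with the same score."""
    groups = {}
    for cat, score in scores.items():
        groups.setdefault(score, []).append(er_by_category[cat])
    return {score: mean(vals) for score, vals in sorted(groups.items())}

# Invented survey ratings and engagement rates
ratings = {"Sports": [1, 1, 2], "Careers": [3, 4, 4], "Health": [5, 4, 5]}
ers = {"Sports": 0.0012, "Careers": 0.0045, "Health": 0.0008}
grouped = mean_er_per_score(sensitivity_scores(ratings), ers)
```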

As a second research focus, we investigated the effect of public privacy awareness on users’ personal privacy awareness and consequently on the user engagement rate. We expected the user engagement rate to change over time, especially during the phase of GDPR introduction. Media coverage of privacy topics was exceptionally high during that time and public discussion around data protection very intense. We assumed that a correlation between these factors would emerge. This led to the second hypothesis:

Hypothesis 2

(H2). The pollees’ willingness to participate in online profiling decreases with high public privacy awareness.

4.3 Results

The results presented in Table 1 show each category’s number of unique visitors, number of collected data sets and engagement rate. Furthermore, the statistical evaluation of the online survey regarding perceived sensitivity of content categories is given; the median scores were used to distinguish between content sensitivities. There are clear differences between the categories of polls identified: While in categories such as Education and Family & Parenting the acceptance rate of submitting personal information was more than 0.6%, it did not even reach 0.1% in Health.

Table 1. Engagement Rate (ER) by Poll Category and sensitivity

Table 2 presents the results of aggregating the poll categories according to sensitivity. A significant effect of content sensitivity on the engagement rate cannot be postulated (\(\chi^2 = 93.572\), \(p = 0.223\), Nagelkerke pseudo-\(R^2 = 0.286\)). The highest ER was found for the – generally perceived as sensitive – category of Religion & Spirituality, while the third-lowest rate was observed for the minimally sensitive category of Sports. Thus, we can neither accept Hypothesis 1 nor reject the null hypothesis, as factors other than content sensitivity appear to be more influential. For example, the relatively high ER of Careers can presumably be explained by pollees wishing to be contacted by potential employers.

Table 2. Engagement Rate (ER) by content sensitivity (1 = low, 5 = high)

To test Hypothesis 2, we redefined the engagement rate as the proportion of people voting in a poll, regardless of their willingness to disclose any further personal data, thereby taking into account all polls from January 2017 until November 2018. We compared the phases before and after May 25, 2018, the date on which the GDPR came into force. At the same time, the polling system investigated introduced a cookie consent notification (see Fig. 5) and an option to opt out. Whenever visitors used the polling system for the first time, they had to accept the terms and conditions for their vote to be counted. In contrast to the previous analysis, the ER in this case was calculated by dividing the number of actual votes by the unique views of all polls during the same period; in other words, we calculated the proportion of all users who saw a poll and decided to vote in it. Table 3 summarizes the results.

Table 3. Impact of GDPR introduction on Engagement Rate (ER)

Contrary to our expectations, the introduction of the consent notification and the applicability of the GDPR did not have a negative effect on the engagement rate. Holt’s exponential smoothing shows almost continuous growth, with an \(R^2\) of 0.752 and an RMSE of 0.685. The engagement rate climbed from around 9% to nearly 11% in the months after GDPR introduction, thereby refuting Hypothesis 2. Generally speaking, these are remarkably high numbers in the context of online marketing applications.
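For illustration, Holt’s linear-trend (double) exponential smoothing can be written down directly. The monthly rates and the smoothing parameters alpha and beta below are invented, not fitted to the study’s data.

```python
def holt_smoothing(series, alpha, beta):
    """Holt's linear-trend exponential smoothing: returns the
    one-step-ahead fitted values and the next-period forecast."""
    level, trend = series[0], series[1] - series[0]
    fitted = [series[0]]
    for x in series[1:]:
        fitted.append(level + trend)                    # forecast for this step
        new_level = alpha * x + (1 - alpha) * (level + trend)
        trend = beta * (new_level - level) + (1 - beta) * trend
        level = new_level
    return fitted, level + trend

# Invented monthly engagement rates (%) with an upward trend
rates = [9.0, 9.2, 9.5, 9.8, 10.2, 10.6, 11.0]
fitted, forecast = holt_smoothing(rates, alpha=0.5, beta=0.5)
```

With an upward-trending series like this one, the next-period forecast exceeds the last observed value, mirroring the continued growth reported above.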

5 Conclusion and Future Research Directions

The research presented in this paper forms part of a research project that is currently being carried out together with an industry partner focused on providing online polls to website owners and publishers. It concentrates mainly on investigating two questions: (i) How likely are users to disclose personal information after voting in polls from various content categories, and does the sensitivity of the content correlate with their willingness to be profiled? (ii) How does the public discussion around data protection and privacy that was triggered by introduction of the GDPR and the related media coverage affect the users’ individual privacy awareness and thus their engagement rate?

To answer these questions, live data from online polls published on various websites were analyzed. The polls were categorized using an industry-standard taxonomy, and the user engagement rate was calculated by dividing the number of collected data sets by the number of unique participants in these polls. An online survey was then conducted to determine each category’s sensitivity, and the engagement rates for the established sensitivity groups were calculated. We found that the engagement rate differed considerably between categories, but a significant effect of sensitivity on engagement rate could not be identified. Additionally, the engagement rate given by the number of unique visitors and their votes in polls published over time before and after GDPR introduction in May 2018 was investigated. The result showed that – contrary to our expectations – the engagement rate did not decrease post-GDPR.

User behavior in the context of online polling offers several further interesting avenues for research. For example, we expect users’ privacy awareness and engagement rate to differ across categories of websites, for instance, professional journals, tabloid press or companies from various lines of business. Furthermore, since the period after GDPR introduction considered in this work was relatively short, it would be interesting to see how the engagement rate continues to develop.