1 Introduction

The appetite for consumer data sharing is a critical part of modern business, and is driven by technological advancements and increasing digitisation. In the insurance industry, the impact of digital transformation in practice means enhancing the customer experience, improving business processes, offering new products, and preparing for competition with other industries (Eling & Lehmann, 2018). The industry is always innovating (Lanfranchi & Grassi, 2021): businesses are developing new products and processes, exploiting existing technology and tapping into their customers’ needs (Lanfranchi & Grassi, 2022). As such, data and data sharing are today the basis of most economic activity (Loi et al., 2022).

Consumer data are now a defining force pervading every aspect of financial businesses (He et al., 2023). Data have been central to the insurance industry for years, although they were mainly used to assess risks and calculate premiums (Campo & Antonio, 2022; Gordon et al., 2003; Hosein, 2023; Lledo & Pavia, 2022). More recently, advancements in data analytics and the emergence of new data sources, including telematics devices (e.g. black boxes), connectivity and wearable technology, have enabled insurers to access more granular, behavioural and real-time data about customers (Baecke & Bocca, 2017; Feng et al., 2022). Data collected from sensors, IoT devices and similar sources were added to the traditional information already in the insurers’ hands. The data now available are highly accurate, integral, and ensure continuity of service (Handel et al., 2014); they improve safety and security, help to prevent accidents or mishaps and lower the risk of accidents and injuries (and so claims) (Saliba et al., 2021; Van der Boom, 2023; Wiegard et al., 2019). Fitness trackers, smart watches, smart clothing and other such wearable devices are another source of customer data that can be used by insurers to predict and assess the risks associated with each customer, adjust their premiums and offer personalised insurance products (McCrea & Farrell, 2018; Paluch & Tuzovic, 2019). The availability of these data has opened the way to new usage-based insurance policies, new pricing models and more customised policies (Ayuso et al., 2019; Husnjak et al., 2015; Verbelen et al., 2018).

In turn, this situation has raised questions concerning discrimination, as, on the one hand, usage data could matter more than demographic information such as gender (Ayuso et al., 2016), but, on the other, using these data could potentially reduce a person’s life chances or erect barriers to universal rights such as healthcare (Banerjee et al., 2018). An additional negative side effect of data sharing is that it introduces security vulnerabilities, including the risk of data breaches and cyberattacks (Klumpes, 2023) that target sensitive private information, leading to potential theft, manipulation or unauthorised financial transactions, particularly in illicit markets such as the dark web. Moreover, data sharing exposes individuals to heightened identity fraud risk (Piquero et al., 2011), enabling malicious actors to misuse shared data for criminal activities or to disclose private information that may adversely impact a person’s reputation or public image. As a consequence, regulators started limiting the usage of consumer data to protect consumers’ privacy and to prevent personal data from being misused for discrimination or other improper purposes (Delcaillau et al., 2022).

A number of regulations were put in place, including Europe’s General Data Protection Regulation (GDPR) and Data Act. The GDPR is designed to protect natural persons and set restrictions on personal data and the free movement of such data (European Commission, 2016), where personal data is any piece of information that can identify an individual, including digital identifiers such as IP addresses, cookies, digital fingerprinting and location data (Goddard, 2017). Under the GDPR, only the minimum amount of data needed for a specific purpose can be collected, and these data must be checked for accuracy, kept updated and stored in a secure way. The European Data Act enables connected device users to access and share the data they have generated and share such data with third parties to provide aftermarket or other data-driven innovative services (European Commission, 2022). Another of the regulation’s aims is to prevent companies in a dominant position from resorting to unfair practices to obtain data from weaker organisations, and it also puts the spotlight on the compensation due to businesses for making data available.

This corpus of regulations is intertwined with the Open Finance framework, whose arrival in the financial industry has placed customers and their willingness to share data at the centre of operations in a range of services, from banking and insurance to asset management (De Pascalis, 2022; Grassi et al., 2022a; Standaert & Muylle, 2022). By contrast, under the earlier Open Banking framework, the only sphere involved was bank account information (Chan et al., 2022). Similarly to Open Banking, Open Finance is based on the principle that data supplied by and created on behalf of financial services customers are owned and controlled by the customers themselves (FCA, 2022), increasing the democratisation of finance, in that customers can provide raw financial data to their current financial services providers and also, critically, to the latter’s competitors (Zetzsche et al., 2020). Other challenges for insurers are the risk of disintermediation, loss of reputation and transformational failure (Gozman et al., 2018). In Europe, the legal position is that customer data can only be accessed and reused with the customer’s consent (European Commission, 2020), including when the purpose is to provide a range of financial services, and data are always subject to data protection rules and security safeguards (European Commission, 2021; European Commission, 2023).

In previous studies on Open Banking, this topic was either perceived in technical terms (Farrow, 2020), or else it dealt with industrial competition (Buckley et al., 2020; Ramdani et al., 2020) or potential new sources of risk (Chan et al., 2022). In this paper, we argue that the debate is ready to move on. Customer centrality, or, better still, the customers’ willingness to share data, has become strategically significant in the financial system, because, if customers are reluctant to share, banks and insurance companies will be powerless to build on their data. Consequently, understanding whether such willingness exists is now fundamental, as is understanding what form it takes according to the data to be shared or the expected benefits of such sharing. On this point, this paper is, to the best of our knowledge, one of very few studies that consider the customers’ point of view on digitalisation and Open Insurance, capturing this significant feature alongside upcoming works such as that by Kandil et al. (2024).

Gaining this insight is thus the focus of this study, where we investigated the sharing of data and, more specifically, how willing customers are to share data with their insurance company. To ensure that our results were sound, we varied the type of data to be shared (in our case, data on the person’s physical health, safety and security measures installed at home, driving style, journeys and travels, concerning their family and linked to their social network profiles). We also varied the rewards underpinning why customers are willing to share their data (proposals for personalised products and services that meet a specific need, lowering their perceived insurance claims risk and insurance premiums adjusted to their habits and behaviour). After formulating our research questions about people’s willingness to share their data, we explored the topic using information retrieved from a large-scale survey on 1501 consumers.

After these initial considerations, the study detailed in this paper sets out an outline of the theoretical background (Sect. 2), followed by the description of the underlying methodology (Sect. 3), and the presentation and discussion of the results (Sect. 4). Lastly, the main points emerging from the study are drawn together in our contribution (Sect. 5) and conclusions (Sect. 6).

2 Theoretical background and research questions development

While the authorities and the financial players (incumbents and new fintech entrants) are turning to Open Finance (Adke et al., 2022), as of today, there is very little published work on the central role held by customer data in this industry.

So far, data-centricity in the financial industry has been studied mainly along three streams. The first is “open data in finance”, and it relates to publishing financial information data (e.g. accounting data, budget data, personal income data) in public databases or ledgers (e.g. Bolgov & Filatova, 2022; Parkhimovich & Minina, 2017). The second stream concerns “openness” in the sense of transparency in data tracing systems—for example blockchain ledgers or distributed ledger technology—that underpin decentralised financial systems (Ali et al., 2019) and DeFi (Grassi et al., 2022b; Smith, 2021). The third stream is open data as the basis for developing new areas of financial crime and fraud (De Koker & Goldbarsht, 2022). What is lacking in all three streams is a thread that concentrates on analysing the customers’ willingness to share their data, which is the keystone principle on which Open Finance is built, as is all future regulation, because Open Finance seeks to give customers back the control of their financial data. Where there are no particular technological matters about data sharing—in the sense that Open Finance is built on Application Programming Interfaces (API)—others crop up in the field of human will and privacy considerations, or in relation to the final goal, e.g. what specific innovative product, what specific new service.

From a theoretical perspective, privacy calculus theory suggests that consumers evaluate the trade-off between the perceived benefits of sharing their personal data and all the associated privacy risks. It follows that consumers are only willing to share their data when the perceived benefits outweigh the potential loss of privacy (Gao et al., 2015; Liu et al., 2016; Ryu, 2023) and that privacy is not an absolute value (Jabbar et al., 2023). In order to make an impact on the final weight, organisations wishing to increase the likelihood that their customers will share personal data may take actions relating to perceived benefits and costs, such as improving rewards and limiting the effect of costs. For instance, personalised benefits and social and monetary rewards were all found to play a role in the customers’ decision process (Blakesley & Yallop, 2020; Li et al., 2010; Tang & Ning, 2023). Additionally, customers do not take their decision once and for all for each organisation, but are free to express their choice every time that organisation asks for (personal) information (Dinev & Hart, 2006; Laufer & Wolfe, 1977; Vimalkumar et al., 2021), potentially leading to a situation where people change their minds continuously. What is more, individuals can even base their decisions on personal features (Trepte et al., 2017). The context also matters, both in terms of trust, requesting stakeholder and purpose for which the data is to be used, and also in relation to perceived control over the situation and kind of data (Anderson & Agarwal, 2011; Gutierrez et al., 2019; Rahman, 2019).

Although privacy calculus theory has been widely employed to examine consumers’ self-disclosure in technological contexts (e.g. e-commerce websites, location-based services, social networking sites, and so on), the financial industry has been largely overlooked (von Entreß-Fürsteneck et al., 2019; Yang et al., 2020), and only a few papers explore the subject. Yang et al. (2020) found that people using mobile payment apps felt that revealing personal information gave them access to beneficial financial services, but, conversely, it was often linked to lesser perceived value and lower mental well-being. Yuan et al. (2022) introduced the concept of trust in institutions, as a structural assurance for consumer rights, the protection of which is an obvious insurance industry feature. Chatterjee et al. (2023) highlighted the customers’ feelings of misuse when sharing data with financial institutions. In the insurance industry, the extant literature has mostly focused on car- and health-related customer data. For example, Festic et al. (2021) looked at the Swiss willingness to share tracking-device data with insurance companies in exchange for a financial benefit, concluding that Swiss people rate the benefits higher than the potential risks. The authors established that 43% would generally be willing to share their data, finding no significant difference in education, but a weak tendency among older people and women to be less willing to do so. Von Entreß-Fürsteneck et al. (2019) analysed a sample of 103 respondents in a scenario-based experiment, showing that the positive effects of privacy benefits are partly dependent on data sensitivity. As the level of sensitivity for different kinds of data can be instrumental in people’s willingness to share data, because data can be perceived with varying privacy trade-offs, we formulate our first research question as follows:

RQ1

Is the type of data related to the customers’ willingness to share their data?

Thus, under the privacy calculus theory, customers will accept loss of privacy if it rewards them in ways they would not otherwise secure (Fernandes & Costa, 2023). What also emerges is that a specific reward can play a part in driving the customer’s decision about their willingness to share data (Hsieh & Li, 2022). Similarly, Tenopir et al. (2011) found that willingness to share data can be linked to given conditions, as in a transaction where consent is given in exchange for personalised services (Huang & Huang, 2023; Kang & Namkung, 2019; Lv & Wan, 2019), or monetary rewards (Bijlsma et al., 2023; Huang & Huang, 2023; Tsarenko & Rooslani Tojib, 2009). According to Beke et al. (2022), rewards can be grouped by type into, for instance, performance, psychological and financial dimensions. Sharing performance-type information helps companies gain a clearer understanding of their customers’ needs and preferences, and thus create personalised products and services (Simonson, 2005; Wedel & Kannan, 2016). At the same time, when customers share information, it affects how they feel about the firm, producing psychological consequences (Acquisti et al., 2015) that range from feeling they are special, to having the sensation that they are under observation and being controlled. Lastly, sharing data can bring customers monetary gains or lower costs (Ravula, 2022), meaning that some rewards are more strictly associated with the financial dimension. With regard to financial rewards, von Entreß-Fürsteneck et al. (2019) found that they are a strong positive indicator of a customer’s willingness to disclose self-tracking data. It follows that we can formulate our second research question thus:

RQ2

Are different reward dimensions related to the customers’ willingness to share their data?

Aligned with privacy calculus theory, a consumer’s decision and willingness to share personal data hinge upon multifaceted factors, prominently innovativeness, individual characteristics and pre-existing knowledge (Fox et al., 2022; Kehr et al., 2015; Yuan et al., 2022). Today’s digitally empowered landscape, with heightened expectations for personalised and empathetic services (Adke et al., 2022), includes a spectrum of openness towards novel concepts (Agarwal & Prasad, 1998), exemplified by variations in people’s positive reception to innovations in services such as insurance (Gironda & Korgaonkar, 2018). A person’s inclination towards innovativeness and early adoption is intricately linked to his or her knowledge and experience in the relevant domain (Briones de Araluze, 2022), and these are considered pivotal factors in shaping an individual’s behaviour (Karjaluoto et al., 2002). More broadly, personal features were found to be both relevant and significant when investigating customer decisions relating to innovation in finance (Karjaluoto et al., 2002). Van Dijk (2020) found that younger, more affluent male members of a society tend to reap more benefit from their internet usage and know how to manage the associated risks more effectively than other users. Chen et al. (2023) observed a gender gap in people’s use of innovative financial products, in their willingness to use new financial technology and readiness to switch to a new entrant. Women are also found to be more conservative about their privacy, more concerned about sharing information and generally more anxious about the implications of data-sharing on their personal safety (Armantier et al., 2021; Cho & Hung, 2011; Rowan & Dehlinger, 2014). Age is also relevant in matters of innovation and data sharing, as it influences the risk–benefit trade-off (Fernandes & Costa, 2023; Wottrich et al., 2018). Tenopir et al. (2011) found that there is a younger vs older difference in the likelihood of agreeing to share one’s data, Goldfarb and Tucker (2012) highlighted “large” effects, and Wottrich et al. (2018) noted that privacy is more important for older than younger people. Self-disclosure behaviour was found to differ between younger and older individuals (Parker & Parrott, 1995). It emerges overall that older people are much less likely to reveal personal information than younger people, they have higher privacy concerns, and are more cautious about sharing their data. Education is another factor frequently considered under privacy calculus theory (e.g. Duan & Deng, 2021; Fox, 2020; Tran & Nguyen, 2021). Al-Ashban and Burney (2001), Suoranta and Mattila (2004), and Howcroft et al. (2002) found that education plays a part in financial innovation and in the adoption and use of innovative services, and Or and Karsh (2009) that it impacts on people’s privacy concerns and adoption intentions. The possible explanation according to Belanger and Crossler (2011) is that more highly-educated people may be more aware of the potential benefits and risks associated with data sharing. Additionally, there is apparently a kind of cultural predisposition towards being willing to share data which is linked to where a person belongs (Robinson, 2017). Putting all this information together, personal attitude is likely to affect someone’s willingness to share data rather than the broader categories of features into which they can be placed (such as gender, age, occupation and geographical location). Our third research question thus is:

RQ3

Is personal innovativeness related to the customers’ willingness to share their data?

3 Methodology

3.1 Data gathering

To achieve the purpose of our study and explore the willingness of customers to share data, in September 2020 we conducted a web-based survey on a sample of Italian consumers who had searched for something on the internet at least once in the month prior to the analysis. The questionnaire was written in Italian and pre-tested by two scholars, four PhD students and research fellows, and five C-level managers at financial firms (both incumbents and startups). All testers were encouraged to comment on the questionnaire’s clarity, and on weaknesses in the survey’s design and instrumentation (Ciunova-Shuleska & Palamidovska-Sterjadovska, 2019). We used a quota sampling approach to collect the answers (Bernard & Bernard, 2013; Demmers et al., 2018), so that our sample would be representative of the overall population in terms of gender, age, geographical area of residence and occupation. In total, we received 1501 completed questionnaires. To our knowledge, this is one of the largest consumer surveys on innovative financial services conducted in Europe and the first investigating how willing customers are to share their data.

We asked the respondents individually about their willingness to share data for a reward, specifically scrutinising six types of data in our study, each type reflecting a different level of privacy (i.e. physical health, home safety and security, driving style, travel, family information and social network profiles). Looking at the rewards offered, our survey covered three situations where customers can share their data with their insurance company, namely proposals for personalised products and services that meet a specific need, a lower perceived insurance claims risk and insurance premiums adjusted to the customers’ habits and behaviour. We thought that these three situations would clearly outline the benefits for customers of sharing their data, and ensure that the trade-off with the perceived benefits was perfectly plain. In each situation, the respondents were either willing to share their data (code 1) or had no intention to do so (code 0). The answers were all coded and recorded in the database.

3.2 Data analysis

Considering our research questions and the richness of our data, we opted for a test of independence for both RQ1 and RQ2. Because we are dealing with categorical variables relating to different kinds of data to be shared plus a variety of rewards, in order to test for independence of choice in statistical terms, we performed a bivariate analysis using Pearson’s chi-square test, a method frequently used to test relationships between categorical variables (Adam et al., 2018; Mishra et al., 2019).
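As an illustration of the test of independence described above, the following is a minimal sketch of how Pearson's chi-square statistic and its degrees of freedom can be computed from a contingency table of observed counts. The function name and the toy counts are hypothetical and are not taken from our survey data; the sketch only shows the mechanics of the test.

```python
def chi_square_independence(table):
    """Return (statistic, degrees_of_freedom) for a table of observed counts.

    Rows could be, for example, the kinds of data to be shared and
    columns the two answers (willing / not willing).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Hypothetical counts (NOT survey data): two data types x (willing, not willing)
toy_table = [[10, 20], [20, 10]]
stat, dof = chi_square_independence(toy_table)
print(round(stat, 4), dof)  # 6.6667 1
```

In the toy table every expected count is 15, so each of the four cells contributes 25/15 to the statistic.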

For RQ3, we opted for a multivariate multiple regression, as this would give us multivariate results. We ran three distinct models simultaneously, modelling the consumers’ willingness to share their data in return for a given reward. Model 1 relates to proposals for customised insurance products and services (Custom). Model 2 relates to the customer being seen as a lower perceived insurance claims risk (Risk). Model 3 relates to the customer securing insurance premiums adjusted to personal habits and behaviour (Premium). This enabled us to examine the differences from case to case, instead of using separate probit regression analyses for each outcome variable. Operationally, we used MANOVA and mvprobit in Stata to explain the variation in the likelihood that people share data as a function of the consumers’ personal innovativeness and demographic characteristics.

Consistent with previous literature relating to privacy calculus theory, our independent variable is the consumer’s personal innovativeness (innovativeness), while we controlled for gender (gender), age (age), education (educ), occupation (job) and geographical area. The geographic area is a combination of two items, the area where the consumer is located (geo_area) and the size of his or her town or city (geo_scale). We explicitly asked respondents about all these items in the survey. The consumers’ personal innovativeness (innovativeness) was measured on the basis of several digitally-enabled insurance services. The services we tested were buying insurance where the premium is calculated on behaviour (z1), buying on-demand insurance (z2), managing claims on a smartphone (z3), altering insurance cover digitally (z4), accessing telemedicine services/remote medical consultations as part of the insurance policy (z5), and buying/renewing insurance policies digitally (z6). For each service, participants could either answer “I’ve never heard of it” (code 1), “I’m not familiar with it but I’ve heard about it” (code 2) or “I’m very familiar with it” (code 3). In our sample, 86 respondents (5.7%) achieved the maximum score, meaning that they were rated as highly innovative, while 157 (10.5%) only reached the minimum score, showing a low openness to innovation. However, these items are highly correlated, so we combined them through factor analysis to obtain a smaller set of uncorrelated variables, and also to create an index that we could use to measure people’s innovativeness. We obtained a single factor (Eigenvalue 3.69, Table 1), which we labelled innovativeness. Each of the six items contributes about equally to this factor (factor loadings from 0.74 to 0.84); thus, the higher its value, the more receptive a person is to innovation.

Table 1 Factor loadings in the factor analysis, i.e. the weights between each variable and the single factor (innovativeness)
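To make the link between a single-factor eigenvalue and roughly equal loadings concrete, the sketch below extracts the leading eigenpair of a correlation matrix by power iteration. It is only an illustration under strong assumptions: the item correlation matrix is not reported here, so we use a hypothetical matrix in which all six items share one common correlation, chosen solely so that the leading eigenvalue equals 3.69, and we assume a principal-component style extraction. It does not reproduce the estimation behind Table 1.

```python
import math


def leading_eigenpair(matrix, iterations=200):
    """Power iteration: leading eigenvalue and unit eigenvector of a symmetric matrix."""
    n = len(matrix)
    vec = [float(k + 1) for k in range(n)]  # arbitrary non-zero starting vector
    for _ in range(iterations):
        nxt = [sum(matrix[r][c] * vec[c] for c in range(n)) for r in range(n)]
        norm = math.sqrt(sum(v * v for v in nxt))
        vec = [v / norm for v in nxt]
    # Rayleigh quotient gives the eigenvalue for the converged unit vector
    mv = [sum(matrix[r][c] * vec[c] for c in range(n)) for r in range(n)]
    eigenvalue = sum(vec[r] * mv[r] for r in range(n))
    return eigenvalue, vec


N_ITEMS = 6  # z1 ... z6
# Hypothetical common correlation (an assumption, not an estimate):
# for an equicorrelated matrix the leading eigenvalue is 1 + (n - 1) * r.
COMMON_R = (3.69 - 1) / (N_ITEMS - 1)

corr = [[1.0 if r == c else COMMON_R for c in range(N_ITEMS)] for r in range(N_ITEMS)]
eigenvalue, vec = leading_eigenpair(corr)

# Loadings of a principal-component style factor: sqrt(eigenvalue) * eigenvector
loadings = [math.sqrt(eigenvalue) * abs(v) for v in vec]
print(round(eigenvalue, 2), [round(l, 2) for l in loadings])
```

Under this hypothetical structure all six loadings are equal to the square root of 3.69/6, which falls inside the 0.74 to 0.84 range reported above; the sum of the squared loadings returns the eigenvalue.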

3.3 Data modelling

As shown in Table 2, we obtained three simultaneous regressions (our three models), each with six specifications based on the six types of data we were examining (i.e. physical health, home safety and security, driving style, travel, family information, social network profiles):

$$\begin{aligned} Custom_{j} & = \beta_{1,j,0} + \beta_{1,j,1} \times innovativeness_{j} + \beta_{1,j,2} \times gender_{j} \\ & \quad + \beta_{1,j,3} \times age_{j} + \beta_{1,j,4} \times educ_{j} + \beta_{1,j,5} \times job_{j} + \beta_{1,j,6} \times geo\_area_{j} \\ & \quad + \beta_{1,j,7} \times geo\_scale_{j} + e_{j} \quad per \, each \, j \, data \\ \end{aligned}$$
(1)
$$\begin{aligned} Risk_{j} & = \beta_{2,j,0} + \beta_{2,j,1} \times innovativeness_{j} + \beta_{2,j,2} \times gender_{j} + \beta_{2,j,3} \times age_{j} \\ & \quad + \beta_{2,j,4} \times educ_{j} + \beta_{2,j,5} \times job_{j} + \beta_{2,j,6} \times geo\_area_{j} \\ & \quad + \beta_{2,j,7} \times geo\_scale_{j} + d_{j} \quad per \, each \, j \, data \\ \end{aligned}$$
(2)
$$\begin{aligned} Premium_{j} & = \beta_{3,j,0} + \beta_{3,j,1} \times innovativeness_{j} + \beta_{3,j,2} \times gender_{j} \\ & \quad + \beta_{3,j,3} \times age_{j} + \beta_{3,j,4} \times educ_{j} + \beta_{3,j,5} \times job_{j} + \beta_{3,j,6} \times geo\_area_{j} \\ & \quad + \beta_{3,j,7} \times geo\_scale_{j} + t_{j} \quad per \, each \, j \, data \\ \end{aligned}$$
(3)

where $Custom_{j}$, $Risk_{j}$ and $Premium_{j}$ stand for the three different cases of $y_{i,j}$, i.e. willingness to share data j with one’s insurance company in return for proposals for customised products and services ($y_{1,j}$, $Custom_{j}$), for one’s insurance claims risk being lowered ($y_{2,j}$, $Risk_{j}$) or for insurance premiums being adjusted according to one’s personal habits and behaviour ($y_{3,j}$, $Premium_{j}$). These three situations and the related rewards also vary by the type of data j shared (i.e. physical health, home safety and security, driving style, travel, family information, social network profiles). Here, innovativeness is the independent variable; gender, age, educ, job, geo_area and geo_scale are the control variables; i indexes the three rewards and j the six types of data; β are the coefficients to be estimated; and $e_{j}$, $d_{j}$ and $t_{j}$ are the error terms.
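The structure of Eqs. (1) to (3) can be sketched as a grid of 3 rewards by 6 data types, each cell being one binary outcome driven by the same set of regressors. The sketch below assumes a probit link, in line with the mvprobit estimation mentioned in Sect. 3.2, and deliberately ignores the correlation between the error terms that a multivariate probit estimates. All coefficient and covariate values are hypothetical placeholders, not estimates from our models.

```python
import math
from itertools import product

REWARDS = ["Custom", "Risk", "Premium"]  # i = 1, 2, 3
DATA_TYPES = [                           # j = 1, ..., 6
    "physical health",
    "home safety and security",
    "driving style",
    "travel",
    "family information",
    "social network profiles",
]


def standard_normal_cdf(z):
    """Phi(z), computed from the error function in the standard library."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def probit_probability(coefficients, covariates):
    """P(y = 1) = Phi(beta_0 + sum_k beta_k * x_k) for one (reward, data type) cell."""
    index = coefficients[0] + sum(b * x for b, x in zip(coefficients[1:], covariates))
    return standard_normal_cdf(index)


# One binary outcome y_{i,j} per (reward, data type) pair
outcomes = list(product(REWARDS, DATA_TYPES))

# Hypothetical coefficients: intercept, innovativeness, gender, age, educ,
# job, geo_area, geo_scale (placeholders only)
example_beta = [0.0, 0.3, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
# Hypothetical respondent, innovativeness score of 1.0
example_x = [1.0, 0, 44, 2, 1, 1, 1]

print(len(outcomes), round(probit_probability(example_beta, example_x), 4))
```

With these placeholder values only innovativeness moves the index, so the printed probability is simply Phi(0.3).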

Table 2 Description of the different models estimated simultaneously

4 Results and discussion

Looking at the study sample (Table 3), the participants are between 18 and 74 years of age, with an average age of 44, and gender is well-balanced, with 761 women and 740 men. On average, they are well-educated and hold at least a secondary school diploma (80.4%). A good percentage of people in the sample describe themselves as white collar employees with a permanent employment contract (402 participants, 26.8%), while there are fewer doctors (5, 0.3%) and farmers/labourers (8, 0.5%).

Table 3 Variables description and descriptive statistics

From an initial descriptive analysis on our dependent variables ($y_{i,j}$), the data show some meaningful patterns. Out of the total sample, 314 respondents (20.9%) are willing to share their data, irrespective of the data or reward (i.e. any data and reward are acceptable). Conversely, 290 respondents (19.3%) are not willing to share their data, no matter what kind of data is shared or what reward is offered. These people apparently perceive only the risk associated with data sharing, without taking into account the level of perceived privacy for each piece of information and the associated benefit. Age is apparently not a factor, with respondents ranging from 19 to 74, and neither is gender, as 126 are men and 164 women, although home makers and unemployed people make up a higher quota than in the sample overall (12% and 17% vs 8.8% and 10.7%, respectively). In this group, the share of respondents with a secondary school diploma is slightly above the sample average (57.6% vs 52.6%), while the share of those with a degree is slightly lower (21.0% vs 27.8%).

4.1 Relevance of kinds of data and rewards in the customers’ willingness to share their data

If we consider the role of perceived privacy associated with the kind of data to be shared (of the six types of data we tested), it seems that while people can be more or less willing to share data on physical health, home safety and security, and driving style, the average percentage follows a similar pattern, and it flips to the inverse situation (i.e. from positive to negative, and vice-versa) for data on social network profiles and family, while travel-related information straddles the cross-over point (Fig. 1). It also emerges clearly that customers are particularly reluctant to share their social network data, as the peak for willingness to share these data was the lowest of the six types of data (about 40%, and the result is consistent for any reward or benefit). This finding does not sit immediately well with the fact that social network data are intrinsically data to be shared: they are public by definition but above all by choice, and the related level of privacy should be as low as possible. There could thus be two explanations for this situation. (i) While social network data are innately shared data (Giovannetti & Hamoudia, 2022), customers may not rationally understand that their social network data are public anyway (and can be accessed by insurance companies, and any company in general). (ii) Alternatively, it could be that customers are thinking about, and so referring to, the more private part of their social network, where information is exchanged bilaterally or in small groups. People’s behaviour runs along similar lines for data about their family (their willingness to share this information is about 50%). Conversely, people are much more ready to share information on their driving style (60–67%), which we explain through the mechanism of black boxes installed in cars.

Fig. 1 Willingness to share or not to share j data (physical health, home safety and security, driving style, travel, family information, social network profiles) for i rewards (proposals for customised products and services, lower risk of insurance claims, insurance premiums adjusted to personal habits and behaviour)

We believe that current decisions are also influenced by a customer’s normal interaction with his or her insurance company. While customers already have to give their insurance company data on their physical health, house safety and security, driving style and travel in order to access certain policies (life, health, fire, car, travel insurance), as of today, they share their data on social network profiles and family with their insurers only very infrequently. Customers may perceive these data as more sensitive in part for this reason, seeing more risks than benefits for themselves, as set out in privacy calculus theory.

Our preliminary descriptive analyses also show that the reward plays a significant part in the customers’ decision. Regardless of the kind of information shared, people are always more willing to share specific information if the associated benefit is linked to the insurance company offering premiums adjusted to their personal habits and behaviour (Fig. 1). At the same time, they are always less willing to share these data if the reward is a proposal for customised products and services. Between the two lies a lower risk of insurance claims, which is perceived as an intermediate reward (i.e. seen to have a more favourable risk–benefit trade-off than proposals for customised products and services, but a less favourable one than a customised insurance premium). The implication is that customers (and consumers more generally) seem most attracted by financial rewards, followed by psychological rewards and lastly by those related to performance, in line with the classification given by Beke et al. (2022). As shown in Fig. 1, the maximum peak in data-sharing willingness relates to customers sharing data in exchange for their insurer adjusting the premium on the basis of their personal habits and behaviour, with the most common method so far being for them to share data on their driving style via black boxes installed in their cars.

For insurers (and indeed others), knowing that they can offer alternative rewards and still collect the same kind, and potentially much the same amount, of data could help to bring down the collection cost while maximising the amount of data collected. While the discussion so far has been about customer acquisition costs (Gupta et al., 2004), in the new Insurtech landscape (Stoeckli et al., 2018), and in an economy where data are considered the new oil (Hirsch, 2013), the “data acquisition cost” will be central to understanding how modelling evolves, especially artificial intelligence modelling, riding the technology wave of Open Data (O’Leary et al., 2021; Perkmann & Schildt, 2015).

4.1.1 Chi-square test of independence

We started by analysing people’s willingness to share different kinds of data for a specific reward (those listed previously). Specifically, we wanted to understand whether, holding the reward constant, the end users’ decisions about sharing their data depend on the data they are asked to share (RQ1). The three analyses each gave a p-value below 0.001 (chi-square with five degrees of freedom = 212.7345, 294.2543 and 300.7776, respectively). We can therefore conclude that there is a statistically significant relationship between the kind of data to be shared and the willingness to share them for each given reward, and that this relationship holds consistently across all three rewards.
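As a minimal illustration of the test used in this subsection (a sketch, not the code behind the reported results), Pearson’s chi-square statistic for independence can be computed from a contingency table of counts. The function name `chi_square_independence` and the counts in `hypothetical_counts` are invented for illustration; they are not the survey data. A table with two rows (willing, not willing) and six columns (the six kinds of data) has (2 − 1) × (6 − 1) = 5 degrees of freedom, matching the tests reported above.

```python
def chi_square_independence(table):
    """Return (statistic, degrees_of_freedom) for a table of observed counts.

    `table` is a list of rows; each row is a list of non-negative counts.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# HYPOTHETICAL counts (100 invented respondents per data type), for one reward.
# Columns: health, home, driving, travel, family, social network.
hypothetical_counts = [
    [60, 55, 70, 50, 25, 20],  # willing to share
    [40, 45, 30, 50, 75, 80],  # not willing to share
]

statistic, dof = chi_square_independence(hypothetical_counts)
print(f"chi-square = {statistic:.4f} with {dof} degrees of freedom")
```

The larger the statistic relative to its degrees of freedom, the stronger the evidence that willingness to share depends on the kind of data requested.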

Similarly, we tested people’s willingness to share a given kind of data considering the various rewards one by one, i.e. we tested whether there is a relationship between the reward and consumers’ willingness to share data, to help us understand whether the risk–benefit trade-off in the decision depends on the reward (RQ2). Here, there are differences across the six kinds of data (physical health, home safety and security, driving style, travel, family information, and social network profiles). The results indicate that there is no statistically significant relationship between the type of reward promised and willingness to share data if the data relate to social network profiles (chi-square with two degrees of freedom = 3.2590, p-value = 0.196) or family information (chi-square with two degrees of freedom = 0.8454, p-value = 0.655). This result means that, when people are asked to share social network and family data, varying the reward has no detectable effect on inducing them to share, and their willingness to do so will be, on average, quite negative (see Fig. 1).

By contrast, there is a statistically significant relationship between the type of reward promised and willingness to share data if the data concern physical health (chi-square with two degrees of freedom = 16.1776, p-value < 0.001), driving style (chi-square with two degrees of freedom = 15.2741, p-value < 0.001) or travel (chi-square with two degrees of freedom = 7.7220, p-value = 0.021), and a weaker relationship, significant only at the 10% level, for home safety and security (chi-square with two degrees of freedom = 5.0956, p-value = 0.078). Returning to privacy calculus theory, we can conclude that consumers associate different levels of privacy with the different kinds of data they are asked to share, and this level of privacy for each of the six kinds of data affects the risk–benefit trade-off. Thus, we could divide data into more privacy-sensitive and less privacy-sensitive information.
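The p-values for the two-degree-of-freedom tests above can be checked by hand, because the upper-tail probability of a chi-square variable with two degrees of freedom has the closed form exp(−x/2). The sketch below applies that identity to the six reported statistics; the helper name `chi2_sf_two_dof` is ours and is introduced only for illustration.

```python
import math


def chi2_sf_two_dof(statistic):
    """Upper-tail p-value of a chi-square variable with two degrees of freedom.

    For two degrees of freedom the survival function is exactly exp(-x / 2).
    """
    return math.exp(-statistic / 2.0)


# Chi-square statistics as reported in the text (reward vs. willingness, 2 df).
reported_statistics = {
    "social network profiles": 3.2590,
    "family information": 0.8454,
    "physical health": 16.1776,
    "home safety and security": 5.0956,
    "driving style": 15.2741,
    "travel": 7.7220,
}

p_values = {
    name: chi2_sf_two_dof(stat) for name, stat in reported_statistics.items()
}

for name, p in p_values.items():
    print(f"{name}: p = {p:.4f}")
```

The closed form applies only to two degrees of freedom; the five-degree-of-freedom tests of the previous analysis need the general chi-square survival function.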

4.2 Relevance of personal innovativeness for the customers’ willingness to share their data

Overall, the multivariate analysis of variance (MANOVA) in all three situations (a customer’s willingness to share their data in return for proposals for customised products and services, for lowering their designated risk of insurance claims, and for their insurance premium to be adjusted on the basis of their personal habits and behaviour) yielded significant results in the multivariate test statistics. In the three situations, Wilks’ Lambda, Pillai’s trace, Lawley–Hotelling’s trace and Roy’s largest root all have p-values below 0.0001, confirming that the equations for the six types of data are jointly statistically significant.

Several interesting results emerged from the mvprobit estimation (Table 4), which can be grouped into the following observations.

Table 4 Results of the multivariate probit model estimations

Firstly, the results robustly support our finding that personal innovativeness is related to the consumers’ willingness to share their data (RQ3). The coefficient is always significant at the 1% level, independently of the data shared and the reward expected. Furthermore, in all cases, the coefficient β is positive. It is worth remembering that a maximum score for innovativeness means that a person’s overall innovativeness level is very high. Thus, the positive β means that the more familiar consumers are with innovative insurance services, the more willing they are to share their data. In other words, if someone is open to innovation, that person will be more ready to share any data (of the six types tested) and for any reward (of the three tested). Our interpretation of these findings is that, set against the background of privacy calculus theory, there are in reality some aspects that will weigh heavily on the risk–benefit balancing act, both personally and recursively for society. In substance, it is as if the more innovative people, those most familiar with innovative insurance services, weigh up the specific rewards and benefits that sharing their data could bring them, and consequently move immediately in one definite direction, that of sharing their data, regardless of the data in question or the specific reward. The result is even more significant for policymakers, who are investing extensively to spread and increase financial innovativeness and education among consumers across the world. For insurance companies, this result could mean that, to convince customers to share more data than they do today, they could first act on their customers’ knowledge of financial innovation, which would act as a bridge to greater openness in sharing data.

Looking at the control variables, and similarly to the suggestion by Chang et al. (2022) that different value dimensions are relevant at different points in the purchase-related decision-making process, we found that different customer features are relevant to sharing different kinds of data associated with a variety of rewards.

Our results show that gender is statistically significant only when people share data on their social network profiles and on travel and, even then, only when the reward is an adjustment to their insurance premium (β = −0.15 and β = −0.11, p-value = 0.024 and 0.075, respectively). Men are more willing than women to share their social network data, but this is nothing new. Taddicken (2014), especially, noted that gender differences in disclosing sensitive data only exist when such data can be publicly accessed, which is the situation for social network data. Our research expands the argument, showing that the same also applies to other sensitive data, namely data on journeys and travel. Looking at the rewards, the fact that there is no specific gender factor can be linked to men and women being “virtually indistinguishable in their typical insurance coverage” (Brown & Finkelstein, 2007), meaning that they act in almost the same way, and also to the possible changes taking place year by year. Specifically, in 1995, women were found to be generally more willing than men to reveal personal information, and to do so more often (Parker & Parrott, 1995), a difference that today seems more tenuous albeit still there, obviously after controlling for other factors.

Secondly, the regression coefficients and their respective statistical significance show that age matters when sharing data. Age plays a substantial part in a person’s willingness to share social network data, travel data or family data, whatever the reward (p-value ranging from 0.000 to 0.008). Additionally, the coefficient is always negative, meaning that people’s willingness to share these data decreases as they get older. In other words, wherever age plays a significant part, younger people are more open to sharing these data in return for a reward. Age also comes into play when people share their health data to receive customised products (β = −0.005 and p-value = 0.057) and to reduce their classified risk of insurance claims (β = −0.006 and p-value = 0.009).
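To make the sign of a probit coefficient concrete, the sketch below evaluates the marginal probability Φ(other terms + β × age) for a single equation of a multivariate probit, where Φ is the standard normal distribution function. It uses the reported age coefficient for health data shared in return for a lower claims risk (β = −0.006). Everything else is an assumption made for illustration: that age enters in years, that the contribution of all other terms (`other_terms`) is zero, and the function names themselves. The resulting probabilities are therefore not estimates from the paper; only the direction of the effect is meaningful.

```python
import math


def standard_normal_cdf(z):
    """Standard normal distribution function, written via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def share_probability(age, beta_age=-0.006, other_terms=0.0):
    """Marginal probit probability of being willing to share.

    beta_age:    reported coefficient (health data, lower claims risk).
    other_terms: HYPOTHETICAL sum of the intercept and all other covariate
                 effects; set to 0.0 purely for illustration.
    age:         ASSUMED to be measured in years.
    """
    return standard_normal_cdf(other_terms + beta_age * age)


p_younger = share_probability(25)
p_older = share_probability(65)

# A negative coefficient lowers the latent index, and hence the probability,
# as age increases.
print(f"age 25: {p_younger:.3f}, age 65: {p_older:.3f}")
```

Because Φ is increasing, a negative β always implies a lower sharing probability for older respondents, whatever the value of the other terms; the size of the gap, however, depends on them.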

Lastly, our results support the finding that more highly educated consumers are less willing to share their data, especially social network data. To start with, when significant, the coefficient is always positive, with the variable’s highest values corresponding to the lower levels of education (Table 3). The effect is significant for social network data whatever reward is on offer (β = 0.11, 0.14 and 0.10 and p-value = 0.021, 0.007 and 0.040, respectively), for health and travel data when the reward is to receive customised products (β = 0.08 and β = 0.13, p-value = 0.099 and 0.008, respectively), and for family data when the reward is a lower insurance claims risk (β = 0.10 and p-value = 0.044).

Overall, the results indicate that insurance companies should leverage well-educated and innovative customers to increase the amount of data they collect, where possible targeting younger, well-educated and preferably male customers. However, not all customers act the same when offered the same rewards. This finding highlights a good opportunity for segmentation by the kind of data the insurance company would like to collect and the kind of customer it wants to attract. Policy makers should be wary concerning the specific findings of this survey. If insurance companies follow them in toto when collecting data from their customers, they will gather and store data that are not representative of the entire population. As a consequence, biases could arise when the analytics built on these data are interpreted, leading to further potential segmentation or new kinds of financial exclusion (Urueña-Mejía et al., 2023), and potential new target customers could be selected on the basis of features that drive their unwillingness to share data. At the same time, we should issue a warning to new generations and technophile consumers who may underestimate the value of their data.

5 Contribution

In this study, we refer to three streams of literature, and our intention is to contribute to all three.

Firstly, we add to the stream on Open Finance and Fintech more broadly. Briones de Araluze (2022) noted that, currently, the literature is lingering over service provider infrastructure and the ecosystems around new entrants. Against this backdrop, we decided to focus our research on the consumers’ willingness to share their data, and we believe that the future of the industry will pivot around this consumer willingness. We are not making the claim that Open Finance is definitely the channel whereby the financial system will become a true electronic market, sharing Dratva (2020)’s concerns. It is also probable that there will be no single interpretation of Open Finance on the international stage, because of different regulations and contexts (e.g. advanced vs emerging economies), as was previously the case with Open Banking (Rastogi et al., 2020). What we believe, however, is that the financial industry can only go in the direction of placing the customer at its core (Grassi et al., 2022a), and the sharing and willingness to share Open Finance-enabled data will set the tone. The competitive evolution in the financial industry will also be played out in this setting, with an increase in Fintech and Insurtech startups. On their side, the Big Techs will carve out an even stronger foothold in the sector, helped by laying their hands on more customer data, without being equally committed to sharing them with third parties (He et al., 2023; Standaert et al., 2020).

Secondly, we contribute to the stream on privacy. To our knowledge, we are the first research group to apply privacy calculus theory in the field of insurance. The only antecedent we were able to find was Wiegard et al. (2019), where, however, the main purpose of the research was to analyse the success of wearable technology for insurance companies. We have made the assumption that the insurance field is a case apart and its context has an impact on our contribution to the theory. Insurance companies differ from every other kind of business because of the trust that develops between customer and insurer owing to the insurer’s part in limiting potential financial losses that result from damages (Robinson & Botzen, 2022). Thus, we assume that trust in the insurance company plays a significant role in one’s decision to share data (Alashoor & Baskerville, 2015; Kang & Namkung, 2019). Additionally, our study focuses on willingness to share data, as previous research has shown that coercing customers to share private data with insurers is wrong, not only because it violates the autonomous choice of a privacy-valuing client, but also because it could prevent customers from acting spontaneously and authentically (Loi et al., 2022).

Thirdly, we contribute to the stream on the economics of data and incentives in data sharing, noting that theoretical frameworks on how data affect output, privacy and consumer welfare have been developed in previous research (e.g. Bergemann & Bonatti, 2019; Jones & Tonetti, 2020). However, these frameworks are not directly applicable to Open Finance for several reasons. In Open Finance, under the regulations currently in force, at least in Europe, (i) customers cannot be paid to sell their data, and they should not pay anyone for doing so at their request, (ii) data cannot be shared with or sold to third parties except at the explicit request of the customer, and (iii) financial institutions cannot “bribe” their customers by arranging to charge them less in exchange for agreeing not to share their data with others.

6 Conclusion

In the financial landscape, the centrality of customers and data has become even more evident with the arrival of the Open Finance framework. Consumer data sharing has significant implications for the insurance industry, and affects a number of aspects including risk assessment, pricing, underwriting and claims management. Insurers that can secure granular, real-time data can model risks more accurately and offer insurance products tailored to their individual customers’ needs. Their increased ability in this area has potentially led to improved efficiency, better risk management, cost savings and other such benefits for both insurers and customers. However, the end users’ perspective and their willingness to share their data have never been examined with any great interest, either in the academic literature or by the industry as a whole.

The research covered in this paper studies what influences people’s willingness to share personal data with financial incumbents, adopting the consumer’s perspective. We specifically studied people’s willingness to share different kinds of data (data about their physical health, safety and security at home, their driving style, journeys and travels, concerning their family, and social network profiles) in return for different rewards (from proposals for personalised products and services that meet a specific need, or reducing their insurance claims risk, to insurance premiums calibrated to each person’s habits and behaviour).

Our research, conducted on a panel of 1501 consumers, yields some interesting insights. The findings shine a light on the pivotal role played by rewards, especially financial rewards such as advantageous insurance premiums, in driving the consumers’ decision to share data. When people are asked to share social network and family data, varying the rewards has no detectable effect on inducing the sharing of data (and the willingness to do so will be, on average, quite negative). By contrast, there is a relationship between the type of reward promised and the willingness to share data if the data concern physical health, home safety and security, driving style or travel. Further, there is a statistically significant relationship between the kinds of data to be shared and willingness to share them, for each given reward. Thus, consumers associate the data they are asked to share with different levels of privacy, and the level of privacy for each data type is what influences the balance between risks and benefits, and in turn their willingness to share data. Consequently, we can classify data into more privacy-sensitive and less privacy-sensitive information. Lastly, the people most familiar with innovative insurance services weigh up the specific rewards and benefits that sharing their data could bring them, and are more ready to share information regardless.

At the same time, our findings may have implications for public policy decisions. Overall, our results suggest when and how far customers are willing to share information with insurance companies. Our findings also open a debate on fairness, potentially leading to a new kind of financial exclusion. If insurers segment their customer base on features that indicate who is more likely to share a given kind of data of interest, they will collect and store data that are not representative of the entire population, leading to possible biases in the interpretation of the analytics and the models they underpin, especially in the era of artificial intelligence. Furthermore, for the insurers (and other interested parties) knowing that there are different weights for the different rewards offered for the same kind of data could contribute to minimising the collection cost, the “data acquisition cost”, while maximising the data collected. As observed, these results open the way for a more data-driven insurance business, yet we should think about warning new generations and more technophile consumers who may be unaware of the true value of their data.

As in all studies, ours has its limitations and opportunities for future research. Our participants were recruited online so, at the very least, they had access to the internet, although we do not believe that this is a condition for inclusion or exclusion in today’s world. We conducted our survey in September 2020, an interesting year for digitalisation and digital behaviour, the upshot of the first wave of the Covid pandemic. Since then, the consumers’ behaviour towards digital services may have changed, as so many digital practices and applications are now mainstream, particularly in the financial sector. Another limitation of this study could be the reliability of self-reported information, although the respondents knew they were anonymised, meaning that there should be no concerns about self-disclosure, transparency or honesty. Colleagues may expand on our work by analysing the more qualitative aspects, including, for example, how the type of data in question relates to the consumers’ willingness to share their data. Lastly, the setting of our research is the European insurance industry, which is considered to be a developed market, with acceptable concerns on privacy raised by regulators, players and consumers. Factors such as differences in regulatory environments, presence of intermediaries, level of trust, cultural attitudes towards privacy and technology adoption can influence the spread and acceptability of data sharing practices in other contexts, such as those in emerging markets.