1 Introduction

Protecting the privacy and ethics of citizens is among the core concerns raised by an increasingly digital society. Profiling users is common practice for software applications, triggering the need, also enforced by law, for users to manage privacy settings properly. This concerns how users consent to the storing, sharing with third parties, and dissemination of sensitive personal information, and how they express moral preferences, such as ticking a box to pay a decarbonization tax. Mobile apps have become increasingly popular as they provide users with a wide range of functionalities. For different reasons, apps often require access to intimate information about the users and the hosting device, raising privacy concerns. This calls for proper privacy management, with the ultimate aim of protecting users’ preferences as well as personally identifiable information.

This task has proven to be complex as a result of multiple concurrent factors. The interfaces for privacy choice and consent, which play a critical role in informing users about data collection practices and securing their consent, frequently suffer from inadequate design, making them hard to locate and navigate and occasionally deceptive. This results in consent that lacks adequate information and contributes to a suboptimal user experience (Cranor and Habib 2023). Additionally, these interfaces frequently employ dark patterns, i.e., manipulative design strategies that steer users towards options offering less privacy protection, which further complicates the challenge of obtaining informed consent (Cranor 2022; Habib et al. 2022). Users also tend to rush through privacy interfaces to quickly access the functionalities of an application, thereby making privacy decisions that may be impulsive and subsequently regrettable (Cranor 2022). Hence, it is very important for application developers and regulators to take privacy preferences management systems into account and to undertake thorough testing to ascertain their efficacy in meeting users’ privacy requirements.

In this paper, we focus on the privacy dimension of an ordinary user with little technical knowledge of the privacy mechanisms of the digital systems she seamlessly uses, but with an evident moral character. While choosing strict settings may help protect her data, it may prevent the full availability of the functionalities provided by the software. In contrast, loosening privacy settings mitigates the restriction on functionalities, but it may come at the price of compromising her data privacy. In this respect, profiling technologies can empower users in their interaction with the digital world by reflecting personal ethical preferences and by automating, or assisting users in, privacy settings. In this way, if it properly reflects users’ preferences, privacy profiling can become a key enabler for a trustworthy digital society. Understanding the commonalities and differences among users based on profiles has been recurrently discussed in data privacy research (Westin 2003; Kumaraguru and Cranor 2005; Inverardi et al. 2023a). Categorizing profiles may contribute to better identification of users’ behaviors and may support administrators in comprehending privacy choices. At the same time, personal profiles may enable the design of functionalities that help users set privacy preferences for the digital technologies they use. Various proposals have been made to categorize or group end-users into clusters based on their security or privacy attitudes and behaviors in specific domains (Qin et al. 2008; Lee and Kobsa 2016). In the app domain, users’ preferences were analyzed in an extensive study (Lin et al. 2014) on permission settings from real Android mobile users to recommend personalized default settings. Analogously, Sanchez et al. (2020) analyzed user privacy preferences in the fitness domain to recommend personalized privacy settings for fitness apps.

Though the above represents practical achievements in supporting users, as discussed by Liu (2020), to go further in this direction there is a need to understand how to characterize user privacy behavior in a technology-independent way. Indeed, privacy is a dimension of ethics: it should be part of the ethical profile of a user and be driven by ethical considerations, beyond the contextual attitudes or practices of given domains. Relying on the analysis of current or past users’ preference settings, as done by Lin et al. (2014), may not guarantee a correspondence between what users believe to be their general privacy profile and what they actually (can) do when setting privacy preferences. In fact, there exists the so-called privacy paradox, i.e., an inconsistency between users’ attitudes and their actual behavior (Barth and de Jong 2017), where users claim to be very attentive to their privacy but actually ignore important steps to defend their personal data. Moreover, data privacy awareness in the digital society has only recently moved beyond the specialists’ fields (legal, ethical, economic, social) to impact the wider society. The pandemic has also dramatically advanced the penetration of digital technologies in society, from the market to education (McKinsey 2020; OECD 2020; Lockee 2021; Service 2021). This means that a large body of data collected in the past concerning privacy settings may not reflect the attitude and attention to privacy that present users have and will have in the future.

In this work, by relying on the data of the study in the fitness domain (Sanchez et al. 2020), which were collected by means of a questionnaire and a simulator, we explore the possibility of profiling users’ privacy preferences in a technology-agnostic fashion. We analyze both general and domain-specific questions with the aim of:

  • (i) identifying the set of questions that reflect the moral attitudes of the users rather than their use of the technology: the privacy dataset (Sanchez et al. 2020) analyzed in this work encompasses a comprehensive set of 444 questions, spanning both general and technology-specific topics. These questions were originally employed to delve into various facets of user privacy. By applying clustering techniques to this dataset, we made an intriguing discovery: only a minimal subset of these questions, focused solely on general aspects, is required to effectively grasp the users’ privacy orientations.

  • (ii) recommending privacy preferences accordingly: based on the selected data, we design and implement a recommender system to provide users with suitable recommendations with respect to privacy choices. The experimental results are encouraging, revealing that a compact set of general questions helps distinguish users better than the more complex domain-dependent one.

Thus, the main contributions of this work are summarized as follows.

  • We investigate which sets of (general) privacy questions are more relevant for classifying users with respect to their privacy moral preferences.

  • We show that self-assessment about privacy attitudes given by users does not reflect how they act in practice.

  • We develop PisaRec, a recommender system to provide suitable privacy settings that reflect user preferences. This aims to relieve users of the burden of setting privacy configurations when they go online.

We use well-established clustering methodologies to examine the existing data in order to identify general questions from a domain-specific dataset. In this respect, we aim to introduce a new approach to dataset evaluation and extraction, bringing a fresh perspective to the challenge of crafting user profiles, particularly concerning privacy. We utilize the dataset from Sanchez et al. (2020) to discern the selection of representative questions that effectively capture users’ privacy orientations. We empirically show that the questions used by the original system are redundant and that it is possible to use a smaller set of features.

We organize the paper into the following sections. Section 2 presents a motivating example, and Sect. 3 discusses existing categorizations of privacy profiles and the current research gap. Section 4 describes the proposed approach, which makes use of both unsupervised and supervised learning to handle user profiles. The methods used to evaluate our approach are detailed in Sect. 5. We report and analyze the experimental results in Sect. 6. Discussion related to the limitations and threats to validity is provided in Sect. 7. We review related work in Sect. 8. Finally, Sect. 9 sketches future work and concludes the paper.

2 Motivating example

The following example illustrates the need for the personalized automated privacy assistance that a user may require when interacting with multiple systems at a time (Fig. 1).

After a long day at work, Alice is at the subway station, on her way to meet friends at the cinema. She is on time but learns that, due to the rigorous hygiene regulations introduced after the pandemic outbreak, she cannot buy a ticket from the station attendant. In addition, the vending machines are out of commission for contactless technology upgrades. Instead, a QR code and simple instructions to buy an electronic ticket online are posted in front of the vending machines. Her train is about to arrive; she opens her camera app and frames the QR code. The site structure appears in a split second, but a popup asks for her privacy settings as Alice scrolls down to find the ticket she needs. Above a very long list of radio button options about disclosing her GPS position, information about her mobile phone, consent to save various types of cookies on her device, sharing her list of contacts, etc., she is presented with three buttons: accept all, strictly necessary, decline all.

Fig. 1. Alice must use a website to buy her ticket (left) and must manage privacy settings to use the site (right)

Alice is very concerned about her privacy, and when it is not strictly necessary for the purpose she wants to accomplish, she does not wish to disclose private information. Since the service she needs is as simple as a one-ride ticket, she clicks decline all. The next page seems to load slowly, and images and structure are shown in a non-adaptive way, so she has to pinch to zoom and scroll to read the text informing her that a cryptographic key used for her session management cannot be stored due to her preferences; thus, the session is not secure. The page also asks her to choose the language, timezone, type of device, web browser she is using, payment options, etc. While reading, Alice realizes that her train is about to arrive at the station. So, she decides to click the back button on her browser, reload the page, and click strictly necessary when prompted. The site then stays fast and steady, adapted to the display of her device, asking whether she wants a one-ride ticket or a full-day one. Her mobile wallet handles the payment instantly, and she receives her ticket just before the train comes. On time at the cinema, Alice enjoys the film with her friends, soon forgetting the online ticket purchase experience. Her preferences are saved on her phone so that she will buy train tickets quickly and easily in the future (Fig. 2).

Fig. 2. Clicking “decline all”, Alice is presented with an unoptimized site (left); accepting all or only the strictly necessary options, the site is optimized and the user experience is fast and easy (right)

Alice does not know that the strictly necessary option, although excluding third-party tracking and marketing, includes all the alternatives that are strictly essential to all services offered by the booking site: the lower-price inter-city ticket that requires GPS tracking, the discounted price for kids that requires age disclosure, the discount for army and state officials that requires checking the other installed mobile applications, the train pass app check to see if the ticket is part of a booklet, etc.

Analogously to various studies, notably Liu et al. (2016), we believe that software technology should assist Alice in automatically selecting the options that, on the one hand, are needed for what she wants to do and, on the other hand, comply with her moral preferences (Fig. 3).

Fig. 3. Flowchart for the motivating example

In this work, we show that it is possible to protect users by first understanding their privacy profiles, which can be automatically identified by considering a small set of general and domain-independent questions that are shown to be enough to reflect the user’s moral attitude. Thus, our approach is to categorize personal privacy profiles from an ethical perspective (Autili et al. 2019). Profiles can then be used to automate app and web settings, leveraging recommender systems like in this paper or other technologies.

3 Existing studies and the current research gap

Table 1 Privacy categories according to different taxonomies (listed in chronological order)

Table 1 gives a summary of the most notable taxonomies of privacy categories. Starting from the question “How important is privacy to you?”, an increasing level of privacy concern is shown from left to right of the table. Westin (2003) proposed the first categorization of user profiles with three levels, i.e., Unconcerned, Pragmatists, and Fundamentalists. Since then, other studies have followed up on and developed this initial taxonomy. In particular, Dupree et al. (2016) expanded it, proposing five categories: Marginally concerned, Amateurs, Technicians, Lazy Experts, and Moral right. Schairer et al. (2019) came up with even more, i.e., six categories, where the answer Little is split into Nothing to hide and Something to hide, and Quite is made of Trade-off and Personal Resp. Recently, Sanchez et al. (2020) proposed a more compact categorization, where users are grouped into four categories: Privacy Conservative, Unconcerned, Fence-Sitter, and Advanced User.

As it appears from the table, category refinement happens in the middle category and may depend on the application domain as well as on the amount of input data. Further categories are contextual, and they can be obtained as specializations of personal profiles based on the single user’s experience. In the scope of this paper, as a starting point, we opt for three categories, as they provide three clearly distinguishable moral attitudes.

In this work, based on a thorough analysis of the literature, we developed a taxonomy that categorizes users into three main groups, spanning a total of four categories (the middle group comprises both Involved and Attentive): Inattentive, Involved/Attentive, and Solicitous. This classification encompasses the range of user attitudes towards privacy and is the result of synthesizing extensive scholarly discussions on the topic. The foundation of this approach is based on the insights discussed in our recent work (Inverardi et al. 2023b).

The term ‘Inattentive’ characterizes individuals exhibiting minimal proactive engagement with privacy matters, often overlooking standard privacy-preserving practices. This characterization differs from ‘Unconcerned’, recognizing a possible lack of awareness rather than deliberate negligence. In contrast, the ‘Solicitous’ category identifies those highly conscious of privacy, actively seeking to understand and mitigate potential threats. Here, ‘Solicitous’ denotes individuals who not only care about their privacy but are also knowledgeable and proactive in safeguarding it. Positioned between these two extremes, the ‘Involved’ and ‘Attentive’ categories acknowledge a middle ground, capturing those who exhibit a basic concern for privacy, balancing it against other priorities. These individuals are neither passive nor zealously active in their privacy practices, reflecting a more moderate, potentially context-driven stance. The approach tries to accommodate the fluidity and spectrum of user attitudes.

4 Proposed approach

Typically, users specify privacy preferences by directly interacting with the privacy settings provided by the software they use. Similar to other techniques (Autili et al. 2019; Liu et al. 2016; Liu 2020), we propose an approach that relies on a software layer that automatically identifies privacy profiles and interacts with the user or with the software system to recommend suitable privacy preferences accordingly.

Fig. 4. The proposed approach

Concerning what we present in this paper, the assisted selection of privacy preferences phase operates on training data consisting of general, domain-specific, and app-specific answers given to the questions defined in Sanchez et al. (2020) (see “Full set of questions” in Fig. 4). We have empirically analyzed the full set of questions to identify the subset (consisting of only general questions) that is sufficient to automatically identify our three user privacy profile categories, i.e., Inattentive, Attentive, and Solicitous (see activity \(\textcircled {1}\) in Fig. 4).

The automated creation of user privacy profiles phase (see activity \(\textcircled {2}\) in Fig. 4) relies on an unsupervised clustering module, which can automatically group the users in the training data. The automated assignment of privacy profiles to users phase (see \(\textcircled {3}\)) relies on a supervised classifier, using a feed-forward neural network, to automatically assign to a given user the corresponding privacy profile among those identified in \(\textcircled {2}\). Finally, a recommender system is used to further validate activities \(\textcircled {1}\) and \(\textcircled {2}\) and to provide users with privacy settings recommendations (see activity \(\textcircled {4}\)) according to the privacy settings of other users belonging to the privacy profile detected in \(\textcircled {3}\).

Details on the results obtained from the performed empirical study are given in the experimental results section (Sect. 6), whereas the activities \(\textcircled {2}\), \(\textcircled {3}\), and \(\textcircled {4}\) are described in the following subsections.

The recommendation process is illustrated by the pseudocode in Algorithm 1. Starting from a list of M users, we cluster them into independent clusters using their privacy settings. Afterwards, the user-setting graph is constructed, in which there is an edge from User \(u_k\) to Setting \(s_l\) if \(u_k\) has already opted for \(s_l\). Given a User \(u_t\) that needs recommendations, we consider the containing cluster \(U_t\) and compute the similarity between \(u_t\) and all the users \(u_u \in U_t\). The collaborative-filtering recommendation technique is used to mine additional settings from the set S of top-K similar users. The recommendation process is explained in detail in the following subsections.

Algorithm 1. The recommendation process

4.1 Encoding user privacy profiles with graphs

To automatically encode privacy profiles for a considered set of users, we start from the users’ profiles and mine their settings to construct the collaborative relationship between the users and their settings. For illustration purposes, in Table 2, we depict a simplified scenario related to the mobile app domain (Scoccia et al. 2021; Sanchez et al. 2020) with a set of four mobile phone users, i.e., \(u=(u_{1},u_{2},u_{3},u_{4})\), and five permissions, i.e., \(s=(s_{1},s_{2},s_{3},s_{4},s_{5})\). For instance, by analyzing \(u_{1}\), we know that the user has adopted three options, i.e., Camera, Phone, and Contacts: they allow the app to access their camera, phone, and contacts by turning on \(s_{1},s_{2},s_{5}\), while prohibiting the app from getting information about location and storage, thereby disabling the remaining two permissions \(s_{3},s_{4}\). Accordingly, in the column representing \(u_{1}\), the three corresponding rows are checked. Following the same process, we can extend the table by reading the other users’ profiles.

Table 2 Privacy settings

Starting from the table, we are able to construct the user-setting graph as shown in Fig. 5. In this graph, a node represents either a user or a setting, and there is a directed edge between a user and a setting if the user has opted for the corresponding permission. Eventually, we employ a clustering process by relying on the resulting graph. This representation is also used by the developed neural network for classifying users presented in Sect. 4.2.

Fig. 5. Graph representation of users and privacy settings

Each user u is represented by a vector \(\phi =(\phi _{1},\phi _{2},..,\phi _{F})\), where \(\phi _{i}\) is the weight of term \(s_{i}\), computed as the term-frequency inverse document frequency (TF-IDF) value as follows:

$$\begin{aligned} \phi _{i} = f_{s_{i}} \times log \left( \frac{ \left| P \right| }{a_{s_{i}}}\right) \end{aligned}$$
(1)

where \(f_{s_{i}}\) is the frequency of setting \(s_{i}\) in the user’s profile, \(\left| P \right|\) is the total number of profiles, and \(a_{s_{i}}\) is the number of profiles in which \(s_{i}\) appears.

The similarity between two users u and v is computed using their corresponding feature vectors \(\phi =\{\phi _{i}\}_{i=1,..,F}\) and \(\omega =\{\omega _{j}\}_{j=1,..,F}\) by means of the cosine similarity function:

$$\begin{aligned} sim(u,v)=\frac{\sum _{t=1}^{n}\phi _{t}\times \omega _{t}}{\sqrt{\sum _{t=1}^{n}(\phi _{t})^{2} } \times \sqrt{\sum _{t=1}^{n}(\omega _{t})^{2}}} \end{aligned}$$
(2)

where n is the cardinality of all settings that were set to 1 by both u and v. Intuitively, u and v are characterized by using vectors in an n-dimensional space, and Eq. 2 measures the cosine of the angle between them. As an example, in Fig. 5, we see that the two users \(u_2\) and \(u_4\) are similar since they both set two settings \(s_1\) and \(s_3\).
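For illustration only, the following Python sketch computes the weights of Eq. 1 and the similarity of Eq. 2 on a small, purely fictional set of binary profiles; the data and the variable and function names are ours and are not part of the original implementation.

```python
import math

# Purely illustrative binary profiles (1 = setting allowed, 0 = denied).
profiles = {
    "u1": [1, 1, 0, 0, 1],
    "u2": [1, 0, 1, 0, 0],
    "u3": [0, 0, 1, 1, 1],
    "u4": [1, 0, 1, 0, 1],
}

def tfidf_vector(user, profiles):
    """Eq. 1: phi_i = f_{s_i} * log(|P| / a_{s_i})."""
    total = len(profiles)                              # |P|: number of profiles
    vector = []
    for i, f in enumerate(profiles[user]):
        a = sum(p[i] for p in profiles.values()) or 1  # a_{s_i}: profiles enabling s_i
        vector.append(f * math.log(total / a))
    return vector

def cosine_similarity(phi, omega):
    """Eq. 2: cosine of the angle between two user vectors."""
    dot = sum(p * o for p, o in zip(phi, omega))
    norm = math.sqrt(sum(p * p for p in phi)) * math.sqrt(sum(o * o for o in omega))
    return dot / norm if norm else 0.0

print(cosine_similarity(tfidf_vector("u2", profiles), tfidf_vector("u4", profiles)))
```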

A set of n users is grouped into a pre-defined number \(\kappa\) of clusters, with the aim of maximizing both the similarity among instances within a single cluster and the dissimilarity among independent clusters. To this end, we calculate the distance between every pair of users and feed it as input to the clustering engine. The K-medoids algorithm (Kaufman and Rousseeuw 1990a; Park and Jun 2009) has been chosen to group users into clusters due to its simplicity and efficiency.

In the clustering process, the distance scores, computed as \(d_{C}(u,v) = 1- sim_{C}(u,v)\), are used to assign users to clusters. Initially, a set of medoids (users) is generated randomly, then a medoid is selected as the user in the cluster that has minimum average distance to all the other users in the cluster. Afterwards, users are assigned to the cluster with the closest medoid, using a greedy strategy (Kaufman and Rousseeuw 1990a; Park and Jun 2009).

The algorithm proposed by Park and Jun (2009) runs like the K-means algorithm and tests several methods for selecting the initial medoids. It calculates the distance matrix once and uses it to find new medoids at every iteration step. In our work, we apply the original K-medoids algorithm (Kaufman and Rousseeuw 1990a) to cluster the privacy profiles. In this way, we are able to substantially reduce the effort required to look for similar users. Our work is thus substantially different from that of Park and Jun (2009), since we use clustering only as the initial phase of the whole process and, in the later phase, go one step further to recommend relevant settings to users (which cannot be obtained by solely using clustering). In particular, once we have obtained independent clusters, given a testing user, we employ a collaborative-filtering algorithm to mine additional settings from users of the same cluster (explained in detail in Sect. 4.3). The algorithm proposed by Park and Jun (2009) could replace the original K-medoids, aiming for better clustering performance, possibly contributing to an increase in the overall prediction accuracy of the proposed recommender system. However, to confirm this hypothesis, further investigation is needed, which we leave for future work.
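As a sketch of the clustering step, the following minimal K-medoids implementation (a simplified alternating scheme, not the full PAM procedure of Kaufman and Rousseeuw) works on a precomputed distance matrix d(u, v) = 1 - sim(u, v), where sim can be the cosine similarity of Eq. 2; the code and its naming are ours.

```python
import random

def k_medoids(distance, k, max_iter=100, seed=42):
    """Cluster n users given an n x n distance matrix (distance[u][v] = 1 - sim(u, v))."""
    n = len(distance)
    medoids = random.Random(seed).sample(range(n), k)   # random initial medoids
    clusters = {}
    for _ in range(max_iter):
        # Assignment step: each user joins the cluster of its closest medoid.
        clusters = {m: [] for m in medoids}
        for u in range(n):
            clusters[min(medoids, key=lambda m: distance[u][m])].append(u)
        # Update step: the new medoid of a cluster is the member with the
        # minimum total distance to all the other members of that cluster.
        new_medoids = [
            min(members, key=lambda c: sum(distance[c][o] for o in members))
            for members in clusters.values() if members
        ]
        if sorted(new_medoids) == sorted(medoids):       # convergence reached
            break
        medoids = new_medoids
    return clusters
```

Given the distance matrix for all users, the resulting clusters (one per medoid) can then feed the classification and recommendation phases described next.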

4.2 Automated assignment of privacy profiles to users

Supervised learning algorithms can simulate humans’ learning activities, mining knowledge from labeled data and making predictions on unseen data (Gurney 1997). Among others, neural networks have been widely adopted in various applications, including pattern recognition (Bishop 1995) and forecasting (Zhang et al. 1998). A feed-forward neural network consists of connected layers of neurons, where the output of a layer is transferred to the neurons of the next layer, except for the output layer.

Fig. 6. A three-layer neural network

To classify users into different privacy groups, we built a feed-forward neural network, using preferences as features. The network consists of the three layers shown in Fig. 6. The input layer has L neurons, equal to the number of input settings, i.e., \(X=(x_{1},x_{2},...,x_{L})\). The middle layer consists of M neurons, i.e., \(H=(h_{1},h_{2},...,h_{M})\); the number M of neurons for this layer can be set empirically based on the input data. There are \(\kappa\) neurons in the output layer, corresponding to the \(\kappa\) output categories, i.e., \(\hat{y}=(\hat{y}_{1},\hat{y}_{2},..,\hat{y}_{\kappa })\). The predicted value \(\hat{y}_{k}\) for neuron k of the output layer is computed to minimize the error between the real and predicted values. As discussed later in Sect. 6, the conceived neural network plays an important role in the performed analysis, especially in understanding to what extent self-declared privacy profiles reflect the actual user category.
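The paper does not prescribe a specific implementation; as one possible sketch, the three-layer network of Fig. 6 can be approximated with scikit-learn's MLPClassifier, using the privacy settings as the L input features and the cluster labels as the \(\kappa\) classes. The hidden-layer size of 16 is an arbitrary assumption, not a value taken from the study.

```python
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_predict

def assign_profiles(X, y, hidden_neurons=16, folds=10):
    """X: (n_users, L) matrix of settings; y: cluster labels 0..kappa-1.

    Returns the label predicted for every user under ten-fold cross-validation.
    """
    net = MLPClassifier(hidden_layer_sizes=(hidden_neurons,),  # M hidden neurons
                        max_iter=2000, random_state=0)
    return cross_val_predict(net, X, y, cv=folds)
```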

4.3 Privacy settings recommendation

We conceptualize PisaRec, a Privacy settings assistant running on top of a Recommender system to provide users with suitable data protection configurations. PisaRec works based on the assumption that “if users of the same privacy profile already share some common privacy settings, then they are supposed to share additional similar settings” (Schafer et al. 2007). In this way, we utilize the proposed graph-based representation to model the relationship among users and use a collaborative-filtering algorithm (Aggarwal 2016) to recommend missing settings. As input for the recommendation engine, we adopt the user-item paradigm (Noia and Ostuni 2015), in which each user corresponds to one row and each setting corresponds to one column. A cell in the matrix indicates the rating given by a user to a setting; the two values 0 and 1 correspond to deny and allow, respectively. An example of a user-setting matrix for a set of four users and five settings is as follows: \(u_1 \ni s_1,s_2\); \(u_2 \ni s_1,s_3\); \(u_3 \ni s_1,s_3,s_4,s_5\); \(u_4 \ni s_1,s_2,s_4,s_5\). Accordingly, the user-item ratings matrix built to model the occurrence of the settings is depicted in Fig. 7.

Fig. 7. A user-setting matrix

The following collaborative-filtering formula is utilized (Schafer et al. 2007) to predict the inclusion of a setting \(s_{i}\) for user u:

$$\begin{aligned} r_{u,s_{i}}=\overline{r_{u}}+\frac{\sum _{v \in topsim(u)} (r_{v,s_{i}}-\overline{r_{v}})\cdot sim(u,v)}{\sum _{v \in topsim(u)} sim(u,v)} \end{aligned}$$
(3)

where \(\overline{r_{u}}\) and \(\overline{r_{v}}\) are the means of the ratings of u and v, respectively; v belongs to the set of the top-k users most similar to u (the neighbour users), i.e., topsim(u); and sim(u,v) is the similarity between u and a similar user v, computed using Eq. 2.
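Read as code, Eq. 3 amounts to the following sketch, under our own data-structure assumptions (nested dictionaries for ratings and similarities; `neighbours` plays the role of topsim(u)).

```python
def predict_setting(u, s, ratings, mean_rating, neighbours, sim):
    """Eq. 3: predicted inclusion score of setting s for user u.

    ratings[v][s]   : 1 if user v opted for setting s, 0 otherwise
    mean_rating[v]  : mean of the ratings of user v
    neighbours      : top-k users most similar to u, i.e., topsim(u)
    sim[u][v]       : similarity between u and v (Eq. 2)
    """
    numerator = sum((ratings[v][s] - mean_rating[v]) * sim[u][v] for v in neighbours)
    denominator = sum(sim[u][v] for v in neighbours)
    if denominator == 0:
        return mean_rating[u]
    return mean_rating[u] + numerator / denominator
```

The settings the user has not yet set, ranked by their predicted scores, form the list from which the top-N recommendations are taken.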

Clusters are created based on similarity among the instances. In this respect, instances within a cluster are more similar to each other compared to those in other clusters. We exploit this to narrow down the search scope, approaching the users that share common preferences in privacy settings, given that they belong to the same cluster. In this respect, we can efficiently mine the most relevant settings for any user.

PisaRec works on top of the clustering results, i.e., the clusters obtained from the previous section allow us to identify users with similar privacy preferences. Based on the obtained categorization, given an input user, the neural network assigns the user to a specific category. Afterwards, a graph to represent the relationships between users and settings is built only for this category following the paradigm in Fig. 5. Such a sub-graph contains fewer nodes and edges than a full graph for all categories, aiming to optimize the computation. On top of this, PisaRec recommends missing settings to users. The outcome of the computation is a ranked list of probable settings, and we select the top-N of them to present as the final recommendations.

5 Evaluation

To study the proposed approach’s performance, we first introduce three research questions. Afterwards, we describe the dataset and metrics used in our evaluation.

5.1 Research questions

The following research questions are considered to evaluate our proposed approach.

  • RQ\(_1\): How well does the users’ self-assessment reflect their privacy category? As users in the considered dataset (Sanchez et al. 2020) have been allowed to self-assess their privacy category, we examine if such a self-evaluation reflects their real category.

  • RQ\(_2\): Which sets of questions are relevant for assessing privacy concerns? We are interested in finding the questions that can better distinguish between user profiles. For this research question, we cluster the users with different sets of features and identify the one that brings the best clustering solution. The aim is to find a set of privacy questions that better represents the user profiles.

  • RQ\(_3\): To which extent is PisaRec able to utilize the obtained categorization in recommending relevant privacy settings to users? We investigate how well our recommender system learns from existing profiles, providing users with additional configurations that reflect their preferences. It is important to note that we addressed this research question using the dataset from Sanchez et al. (2020), which we obtained by contacting the corresponding authors. However, we were unable to compare PisaRec with the recommender system presented in Sanchez et al. (2020) due to the unavailability of a replication package.

5.2 Dataset

Within the scope of this study, we use a dataset obtained from the original research conducted by Sanchez et al. (2020). This dataset consists of anonymised user-generated data within the digital fitness domain, which contains privacy-sensitive information. The dataset contains detailed measurements, such as individual workout statistics, preferences for health goals, equipment usage, and session lengths. It has been carefully processed to remove any personally identifiable information in order to protect user confidentiality.

The process of anonymization, emphasized by Sanchez et al. (2020), includes the systematic aggregation and encryption of data, which guarantees the preservation of individual privacy while enabling a comprehensive analysis of user behaviors and trends at a larger scale. Researchers can see generic patterns, such as common health goals or regularly used workout equipment, but they cannot link this data to individual users, thereby maintaining anonymity. As shown in Table 3, the dataset consists of 444 questions, which have been divided into three main groups as follows:

  • Domain specific: This is the set of questions explicitly related to the fitness activity. It also contains a subset of privacy relevant questions. There are a total of 202 questions in this category.

  • App related: These questions are about the use or setting of the app, consisting of 113 questions.

  • Generic: This set of questions consists of generic questions that are not related to other groups. There are 129 generic questions in total.

Table 3 Summary of the dataset

5.3 Evaluation metrics

  • Compactness. This metric measures how closely related the users within a cluster are (Liu et al. 2010). In this respect, a lower value represents a better clustering solution, and vice versa.

  • Silhouette. It measures how similar a user u is to all the remaining users of the same cluster (Liu et al. 2010), computed using the following formula:

    $$\begin{aligned} s(u)=\frac{(b(u)-a(u))}{max \{a(u),b(u)\}} \end{aligned}$$
    (4)

    where a(u) is the mean distance between u and all the other users in its cluster, and b(u) is the minimum mean distance from u to the users of any other cluster. A silhouette value falls into the range [-1, +1], where a higher score means a better clustering solution.

    Furthermore, we also use Precision, Recall, ROC curve and AUC to study the performance of the proposed approach.

    First, we give the following definitions: true positives (TP) are the recommended settings that match the ground-truth data; false positives (FP) are the recommended settings that do not match the ground-truth data; and false negatives (FN) are the settings that should have been recommended but were excluded. Then, the metrics are as follows:

  • Precision and Recall. Precision measures the fraction of the number of settings properly classified to the total number of recommended items and Recall (or true positive rate – TPR) is the ratio of the number of correctly classified items to the total number of items in the ground-truth data. The metrics are defined as follows:

    $$\begin{aligned} P = \frac{TP}{TP+FP} \quad R = \frac{TP}{TP+FN} = TPR \end{aligned}$$
  • False positive rate (FPR). This metric measures the ratio of the number of items that are falsely classified into a category c, to the total number of items that are either correctly not classified, or falsely classified into the category:

    $$\begin{aligned} FPR = \frac{ FP }{TN+FP} \end{aligned}$$
  • ROC curve and AUC. The relationship between FPR and TPR is sketched in a 2D space using a receiver operating characteristic (ROC) curve (Fawcett 2006), which spans from (0,0) to (1,1). An ROC curve close to the upper left corner represents better prediction performance; the area under the curve (AUC) summarizes it in a single value, with values closer to 1 indicating better performance. A minimal sketch of how these per-user metrics can be computed is given below.
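The following helper (our own, not part of PisaRec) treats the recommended and ground-truth settings of one user as sets and derives the metrics defined above.

```python
def evaluate_user(recommended, ground_truth, all_settings):
    """Precision, recall (TPR), and FPR for one user's recommendations."""
    recommended, ground_truth = set(recommended), set(ground_truth)
    tp = len(recommended & ground_truth)                 # correctly recommended
    fp = len(recommended - ground_truth)                 # recommended but wrong
    fn = len(ground_truth - recommended)                 # missed settings
    tn = len(set(all_settings) - recommended - ground_truth)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return precision, recall, fpr
```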

6 Experimental results

This section reports and analyzes the experimental results by answering the research questions introduced in Sect. 5.1.

6.1 RQ\(_1\): How well does the users’ self-assessment reflect their privacy category?

In the dataset (Sanchez et al. 2020) considered in our evaluation, each user has assigned themselves to one of the following four groups: Privacy Conservative (Class 0), Unconcerned (Class 1), Fence-Sitter (Class 2), and Advanced User (Class 3). We investigate if the self-assessment is consistent, i.e., if all the users properly perceive their real privacy category. This is important since a proper self-clustering can be utilized in additional profiling activities.

Fig. 8. ROC curves with generic questions

We conducted the evaluation using the conceived neural network as the classifier. Such a technique has been successfully applied to classify various types of data, e.g., text (Minaee et al. 2021), chemical patterns (Burns and Whitesides 1993), and metamodels (Nguyen et al. 2019), to name a few. Similarly, we use the privacy settings as features and the labels specified by humans to train the classifier. We opt for the ten-fold cross-validation technique (Kohavi 1995), where the dataset is split into ten equal parts and the evaluation is done in ten rounds. The evaluation metrics are computed on the test set, i.e., for each user the network predicts a label, which is then compared with the self-assessed label to evaluate the performance. Finally, the ROC curves are sketched by combining the scores obtained from all ten folds, as illustrated by the sketch below.
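A possible sketch of this procedure with scikit-learn follows (one-vs-rest ROC per self-assessed class); the network configuration is an assumption of ours, not the authors' exact setup.

```python
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import roc_curve, auc
from sklearn.preprocessing import label_binarize

def per_class_auc(X, y_self_assessed, n_classes=4):
    """AUC of the classifier against self-assessed labels, one value per class (RQ1)."""
    net = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
    # Probability scores gathered over the ten folds.
    scores = cross_val_predict(net, X, y_self_assessed, cv=10, method="predict_proba")
    y_bin = label_binarize(y_self_assessed, classes=list(range(n_classes)))
    aucs = {}
    for c in range(n_classes):
        fpr, tpr, _ = roc_curve(y_bin[:, c], scores[:, c])
        aucs[c] = auc(fpr, tpr)
    return aucs
```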

Fig. 9. ROC curves with domain specific questions

Figures 8 and 9 depict the ROC curves obtained from the classification results for generic and domain specific questions, respectively. It is evident that the classifier achieves very low prediction performance in both configurations. In particular, the curves stay close to the diagonal line, i.e., close to a random guess. Moreover, the AUC values of the four categories are always lower than 0.65. In other words, we encounter negative results, where the neural network fails to predict a proper category for a user. These results suggest that there is noise in the training data (Zhu and Wu 2004), possibly in both the features and the labels.

To confirm this hypothesis, we measured the similarity between each user and all the remaining ones. Interestingly, we found that 96.20% of the users have very similar users in completely different self-assessed categories. This demonstrates that, while users share similar preferences, they classify themselves differently, causing the low prediction performance of the neural network.

Answer to RQ\(_{1}\). The self-assessment given by users does not reflect their real privacy category: Users with highly similar settings perceive themselves as completely different groups. In practice, this means administrators should not rely on such a self-categorization, but have to perform privacy profiling on their own.

6.2 RQ\(_2\): Which sets of questions are relevant for assessing privacy concerns?

As seen in RQ\(_1\), the self-assessment given by users is not consistent; thus, finding another way to group users into clusters is necessary. We performed experiments on different subsets of the questionnaire to study the influence of each set on the clustering results. The ultimate aim is to identify a set of questions that helps classify users better. In particular, we are interested in analyzing the following groups of questions:

  • QS\(_1\): It is a set of question sets as follows: Domain specific (D), App related (A), Generic (G), and their combination (i.e., D+A+G), named COM. Furthermore, we also include the set G+AP2, where AP2 contains generalizable questions like

    “Do you believe the company providing this fitness tracker is trustworthy in handling your information?” Indeed, this question asks whether the company is trustworthy, and therefore we consider it a general question.

    QS\(_1\) permits comparing compactness and silhouette performances between a single set and the combination of all questions.

  • QS\(_2\): It is a set of question sets as follows: DP1, AP1, and GP1, which are the subsets of D, A, and G consisting only of privacy relevant questions, respectively. COM. is the union of the three subsets, i.e., COM. = DP1+AP1+GP1. QS\(_2\) permits understanding the actual influence of the privacy-related questions.

  • QS\(_3\): It consists of subsets of the generic questions G, defined as follows: G1 are the questions related to the disclosure of information about the user’s identity with the app; G2 are the questions related to the time spent by the user in completing the survey; G3 are the questions related to the user’s identity; G4 are the questions related to the disclosure of private information with the app; G5 are the questions related to concerns about privacy. COM. is the combination of all the subsets, i.e., COM. = G1+G2+G3+G4+G5. QS\(_3\) is meant to ascertain the influence of the generic questions with respect to the overall set of questions.

Fig. 10. Compactness and silhouette scores

We compute and report for each set the corresponding compactness and silhouette scores. Figure 10a, c, and e report the compactness scores computed for the three question sets.

As can be seen in Fig. 10a, using A as input yields the most compact clusters. In particular, most of the scores are smaller than 40. When domain specific questions (D) are used as the features, we also obtain low compactness scores, albeit larger than those obtained with A. If only generic questions, i.e., G, are utilized, worse clustering solutions are obtained. When comparing the results obtained by using G with those obtained by using G+AP2, we can see that adding AP2 to G contributes to a better clustering. Concerning QS\(_2\), where only privacy relevant questions are considered, we see that using domain specific privacy relevant questions (DP1) allows us to obtain the most discriminative clusters. Using the app related subset of privacy relevant questions, i.e., AP1, is also beneficial to the clustering of user profiles.

For QS\(_3\), comparable clustering solutions are obtained when using the feature sets G\(_1\), G\(_3\), G\(_4\), and G\(_5\). The best clustering is obtained with G\(_2\).

The silhouette scores in Fig. 10b, d, and f further reinforce the compactness results. A is the feature set that achieves the best silhouette for QS\(_1\). Adding AP2 to G helps achieve a better clustering solution compared to using only G.

Answer to RQ\(_{2}\). According to the performed evaluation, generic questions plus generalizable ones (i.e., G+AP2) provide the best clustering solution.

6.3 RQ\(_3\): To which extent is PisaRec able to utilize the obtained categorization in recommending relevant privacy settings to users?

An issue with clustering is that, whenever there is a new user to be classified, it is necessary to re-run the whole process. This is a time-consuming phase, especially when there is a large number of users. Thus, we propose a more feasible way to assign new users to clusters, avoiding repetitive clustering.

Fig. 11. ROC curves, three categories

Given that there is an existing categorization of user profiles, the feed-forward neural network presented in Sect. 4.2 is used to classify a new user into a suitable group. Once clusters have been obtained, we feed them as input to train the neural network and perform the testing using the ten-fold cross-validation procedure. It is worth mentioning that we use the proposed categorization consisting of three clusters, as explained in Sect. 3.

The final performance measured by means of ROC curves is depicted in Fig. 11. In particular, the AUC values for Class 0, Class 1, and Class 2 are 0.83, 0.85, and 0.76, respectively. The curves of the three classes reside near the upper left corner, implying a good prediction performance. Overall, the curves and the AUC values demonstrate that the obtained performance is much better than that obtained before clustering in Figs. 8 and 9. This suggests that properly clustering user profiles can substantially increase the neural network’s prediction performance.

Next, we validate the performance of PisaRec as follows. We again opted for the ten-fold cross-validation technique (Kohavi 1995), where the dataset is split into ten equal folds and the evaluation is done in ten rounds. In each round, one fold is used for testing, and the other nine folds are merged to create the training data. In a testing fold, for each user, the features are split into two parts: one part is fed as a query, and the remaining part is withheld to be used as ground-truth data. The ratio of the number of settings used as a query to the total number of settings is called \(\alpha\). This simulates a real scenario where the user has already specified some settings, and the system is expected to recommend the rest, corresponding to the ground-truth data. For each user, PisaRec returns a ranked list of N settings (N is configurable), and the evaluation metrics are computed on the test set: the recommended items are compared with the ground-truth data to evaluate the performance. Eventually, we average out the metrics obtained from the testing folds to produce the final results. A sketch of this per-user splitting protocol is given below.
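To make the protocol concrete, the following minimal sketch splits one user's enabled settings according to \(\alpha\) and scores the recommender's output; `recommend_fn` is a placeholder standing in for PisaRec's recommendation step, which is not reproduced here.

```python
import random

def split_settings(enabled_settings, alpha, seed=0):
    """Split a user's enabled settings into a query part (ratio alpha) and held-out ground truth."""
    settings = list(enabled_settings)
    random.Random(seed).shuffle(settings)
    cut = max(1, int(alpha * len(settings)))
    return settings[:cut], settings[cut:]          # query, ground truth

def precision_recall_at_n(recommend_fn, enabled_settings, alpha, top_n):
    """Feed the query to the recommender and score its top-N output against the held-out part."""
    query, ground_truth = split_settings(enabled_settings, alpha)
    recommended = set(recommend_fn(query)[:top_n])
    tp = len(recommended & set(ground_truth))
    precision = tp / len(recommended) if recommended else 0.0
    recall = tp / len(ground_truth) if ground_truth else 0.0
    return precision, recall
```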

We experiment with different configurations by varying \(\alpha\), k: the number of neighbour users used for the computation, and N: the number of recommended items. In particular, \(\alpha = \{0.1,0.3,0.5\}\); \(k = \{3,5,10,15\}\); and N is varied from 1 to 50, simulating a real-world scenario where users have to set several settings. The precision-recall curves are then sketched following these parameters.

Fig. 12. Configuration C\(_1\)

As seen in Fig. 12, when \(\alpha =0.1\), i.e., only a small amount of data is used as the query, PisaRec recommends relevant settings to users, albeit with considerably low precision and recall. For instance, when \(k=3\), a maximum precision of 0.52 is obtained, and the maximum precision is 0.7 when \(k=15\). Similarly, the recall scores are low, i.e., smaller than 0.4 for all the configurations. Altogether, this implies a mediocre performance, which is understandable, as the configuration with \(\alpha =0.1\) corresponds to the case where the user has only specified a few settings and the system has limited context to recommend additional settings.

Fig. 13. Configuration C\(_2\)

When we increase \(\alpha\) to 0.3, there is an improvement in both precision and recall, as shown in Fig. 13, compared to the results obtained with \(\alpha =0.1\) in Fig. 12. Precision scores are always larger than 0.55 across the configurations, with 0.80 being the maximum value. Similarly, we also see that the recall scores gradually improve. For instance, a maximum recall of 0.35 is achieved with \(k=3\), and the corresponding maximum for \(k=15\) is 0.48.

Fig. 14. Configuration C\(_3\)

Such an improvement is more evident when \(\alpha =0.50\), i.e., half of the settings are used as the query. In Fig. 14, apart from some outliers, most of the precision scores are larger than 0.70, with 0.85 as the maximum value. Compared to the previous configurations with \(\alpha =0.1\) and \(\alpha =0.3\), the recall scores are also better, i.e., with a longer list of items, recall increases substantially. In particular, a recall of 0.73 is obtained when \(k=10\) and \(k=15\).

Concerning the number of neighbours used for computing recommendations, i.e., k (see Sect. 4.3 and Formula 3), by considering Figs. 12, 13, and 14 together, it is evident that adding more users to the computation contributes to a better prediction performance. For instance, by increasing k from 3 to 5, 10, and 15, we boost both precision and recall for all the cut-off values N.

Altogether, the experimental results show that, even if users perceive their categories differently as shown in RQ\(_1\), once we have identified their right privacy group, PisaRec can exploit the categories to provide relevant settings to users, even though the considered dataset is rather small. We anticipate that its performance can be further enhanced if more data is available for training.

Answer to RQ\(_3\). PisaRec recommends highly relevant settings to a user, even though a limited amount of data is available for training. The prediction performance improves with the amount of data fed as input.

7 Discussion

This section provides discussion related to the possible extensions of our work, as well as the threats to validity of our findings.

7.1 Extendability

Dataset. In our work, we utilized a small dataset for the evaluation. The amount of training data may impact the performance of both the clustering and classification phases. Moreover, as PisaRec is a collaborative-filtering recommender system, its performance is heavily driven by the quality and amount of data. We anticipate that we may need to calibrate the systems’ parameters to maintain both timing efficiency and effectiveness with more data.

The unsupervised algorithm. In the scope of this paper, we used the K-medoids algorithm to cluster the user profiles. Such a technique was chosen due to its simplicity and effectiveness. In fact, several other clustering algorithms could be employed to categorize user profiles, and the outcome of a clustering solution depends heavily on the chosen technique. We plan to extend our work by considering other clustering algorithms, such as CLARA (Kaufman and Rousseeuw 1990b) or DBSCAN (Ester et al. 1996).

The supervised classifier. The neural network used to classify user profiles may be suitable only for the considered dataset. For a different dataset, it is necessary to find adequate network configurations employing an empirical evaluation. For instance, the number of hidden layers, or the number of neurons for each layer, should be considerably increased to deal with a larger number of user profiles.

Through RQ\(_2\), we showed that, using a subset of features, we obtain a good clustering performance with respect to Compactness and Silhouette. This demonstrates that not all the features used by the original system are necessary and that we can obtain more compact clusters by means of a smaller set of features that represent users’ general moral preferences. The recommendation engine of PisaRec is built on top of a collaborative-filtering technique, mining settings from users similar to the user under consideration. Such a technique was originally proposed for online shopping systems to leverage the relationships among users and products to predict the missing ratings of prospective items. The technique is based on the assumption that “If users agree about the quality or relevance of some items, then they will likely agree about other items” (Schafer et al. 2007). Our tool is built on top of this foundation to solve the problem of privacy settings recommendation. Instead of recommending goods or services to customers, we recommend additional settings to users by means of an analogous mechanism: “If users opt for some common settings, then they will probably opt for other additional common settings.”

7.2 Threats to validity

We are aware of the existence of threats that might harm the validity of the performed experiments; they are discussed as follows.

  • Threats to construct validity are related to any factor that can compromise the validity of the given observations. The main threat to construct validity is related to the size of the analyzed data. The used dataset is indeed relatively small but has the advantage of coming from a recent work (Sanchez et al. 2020), thus reflecting users’ contemporary privacy behaviors. More extensive experiments are being conducted encompassing ethical dimensions: for example, Alfieri et al. (2022) correlated ethical positions, personality traits, and worldview to predict privacy-related digital behaviors, and Alfieri et al. (2023) analyzed ethical dimensions such as self-regarding and other-regarding concerns, consequences of action, and compliance with rules, principles, or laws relating to the implications of human interactions with artificial agents such as algorithms, robots, platforms, and ICT systems.

  • Concerning the threats to internal validity, i.e., any confounding factor that could influence our findings, we attempted to avoid any bias in the automatic creation of user profiles and in the way we split the full data into groups. We tried to mitigate this threat by semantically analyzing and double-checking the clusters obtained by the proposed approach.

  • Concerning the threats to external validity, they are related to the generalizability of our results. This is about checking the adequacy of our privacy profiles in other contexts, notably in the traveling or IoT domains. Generalizability is actually our initial driver for extracting privacy profiles from general moral questions. Thus, further experimental evidence is planned to support the results reported in this paper.

8 Related work

This section reviews the related work and their main characteristics to position our approach in the current scenario for eliciting, profiling, and predicting user privacy preferences.

8.1 Overview

The work presented in this paper has been done in the context of the EXOSOUL research project that aims at providing users with a personalized software layer, the exoskeleton, that mediates users’ interactions with the digital world according to the user’s ethics, including privacy preferences (Autili et al. 2019).

According to various studies (Milne and Culnan 2004; Obar and Oeldorf-Hirsch 2020), the vast majority of users do not bother to read privacy agreements because of the excessive language and confusing explanations (Bhatia et al. 2016; Jensen and Potts 2004; McDonald et al. 2009; Reidenberg et al. 2015); it is also unreasonable to expect they will read them on a regular basis (McDonald and Cranor 2008). Resignation from privacy choices may also result from dissatisfaction with the lack of options and excessive complexity (Colnago et al. 2020).

Privacy profiling is at the core of our work; therefore, the most related studies are on user clustering, privacy profiling, and privacy preference settings. Most of the existing studies about privacy profiling build on the work of Westin (2003). Based on a series of privacy-related surveys, the author established “Privacy Indexes” for most of these polls to summarize results, indicate trends in privacy concerns, and suggest a widely recognized segmentation methodology of “Privacy Profiles.” The methodology he applied classifies people into three categories: privacy fundamentalists, pragmatists, and unconcerned. Because of the commercial nature of Westin’s surveys, the methodology and the details of how the privacy indexes were calculated are not fully disclosed, so we rely on subsequent works (Kumaraguru and Cranor 2005) that deeply analyzed and reported them.

Considerable discussion and academic effort have been dedicated to the topic of privacy categorization. Hoofnagle and Urban (2014), Urban and Hoofnagle (2014), as well as Dupree et al. (2016), Schairer et al. (2019), and Sanchez et al. (2020), all contributed critiques and unique classifications to the privacy discourse. The paper “Systematic Review on Privacy Categorization” (Inverardi et al. 2023a) provides a comprehensive synthesis of the aforementioned studies, as well as of other relevant research in the field. This review explores the development and complexity of privacy categories, providing an in-depth understanding of the current status and possible prospects for the future within this domain.

Barth and de Jong (2017) conducted a literature review to investigate discrepancies between expressed privacy concerns and actual online behavior. The authors determined that the following two issues drive the willingness to disclose private information: (i) risk-benefit evaluation and (ii) a risk assessment deemed to be none or negligible. Based on these issues, the authors compiled a comprehensive model using all the variables mentioned in the discussed papers. The literature review studied the nature of decision-making (rational vs. irrational) and the context in which the privacy paradox takes place, with a special focus on mobile computing.

To help developers choose suitable permissions for Android apps, Liu et al. (2019) proposed an approach named PerRec, built on top of mining-based techniques and data fusion methods, which provides apps with permissions according to their used APIs and API descriptions. Compared to PerRec, PisaRec is different as it mines privacy settings by clustering users based on their similarity in general privacy preferences and then uses a collaborative-filtering technique to recommend additional settings.

8.2 Profiling and clusterization

Different approaches have been used in recent works to create user profiles, starting from data collection and analysis.

Lee and Kobsa (2016) performed a cluster analysis on online survey data composed of IoT scenarios and user responses like reaction parameters. Because all parameters have either categorical or ordinal values, the authors utilized K-modes, a variant of the K-means clustering algorithm.

Qin et al. (2008) proposed predicting users’ preferences through the clusterization of partial preference relations on the MovieLens dataset, which is commonly used to test collaborative filtering technology. According to Fernquist et al. (2017), users may also be identified by the data profiles created by their devices based on time and events: the researchers gathered information on how and when people use their networked devices, recording the time period in which a user interacts or transmits data and the specific place. For the sake of interpreting their findings, the researchers took into account three distinct sorts of events: voice calls, texts, and data transfers, as well as combinations of these. The findings showed that the profiles studied may be used to identify the user.

The introduction of data protection law has triggered the need to address privacy concerns in developing software applications. Starting from the fact that constructing an alignment between issues and privacy requirements plays an important role in developing privacy-aware software systems, Sangaroonsilp et al. (2023) proposed an approach to the classification of privacy requirements in issue reports. They explored a wide range of machine learning and natural language processing techniques to classify privacy-related issue reports in Google Chrome and Moodle projects.

8.3 Automating privacy settings

Concerning automating privacy preferences settings, the closest approach is by Lin et al. (2014), Liu (2020) and by the Personalized Privacy Assistant Project team (Sadeh et al. 2021). Their approach employs user categorizations that are obtained by mining existing privacy settings in the app domain, complemented with an initial dialogue with the user to select the appropriate profile. Our approach is also based on privacy profiles. However, they are obtained by analyzing data resulting from questions that relate to the user’s ethics and are not concerned with any specific domain.

Wilson et al. (2013) identified the impact of privacy profiles on the preferences, sharing inclinations, and overall satisfaction levels of users of location-sharing apps. Their findings demonstrate that privacy profiles for location-sharing settings can have a long-lasting impact on how users perceive their privacy, even in the face of ongoing opportunities to reflect on the sharing outcomes that result from their chosen settings. This implies that attempts to simplify privacy settings should be undertaken with caution, since such simplification may easily affect the very people the settings are intended to serve and educate.

8.4 Surveys and regulations

Based on previous research in survey technique and related domains, Redmiles et al. (2017) provide a set of important recommendations for conducting self-report usability studies. There are established criteria and suggestions for collecting good quality self-report data in other sectors that depend on self-report data, such as health and social sciences. We used this information as a guideline for selecting and refining question groups.

As discussed by Emami-Naeini et al. (2020), surveys and interviews can be administered with consolidated methodologies like the Delphi Method. This method is “a method for the systematic solicitation and collection of judgments on a particular topic through a set of carefully designed sequential questionnaires interspersed with summarized information and feedback of opinions derived from earlier responses” (Atherton 1976). Using a three-round Delphi process, the authors conducted an expert elicitation study with 22 privacy and security experts to identify the factors that experts believe are important for consumers to consider when comparing the privacy and security of IoT devices to inform their purchase decisions. The same methodology could be used to elicit preferences from the users.

Considering the research theme, we took into account the General Data Protection Regulation (GDPR) (Parliament and the Council of the European Union 2016), the document that governs the storing, processing, and use of personal data by the European Union (EU) as of May 25, 2018. Even if not based in the EU, the GDPR applies to all third parties that operate in the EU market or access the data of EU citizens.

9 Conclusion and future work

This paper proposes an approach to recommend relevant privacy preferences to users. The approach combines supervised and unsupervised learning to identify privacy profiles. Based on an empirical study conducted on a fitness dataset, we demonstrated that asking general questions is an effective way of categorizing user profiles and suggesting appropriate privacy settings. For future work, besides gathering further experimental evidence supporting the results reported in this paper, we will work in the direction of building user profiles that cover other ethical dimensions beyond privacy. In the scope of the EXOSOUL project, we will deploy the conceived techniques to analyze data collected from users, studying the characteristics of users’ behaviors and attitudes in the digital world.