1 Introduction

Facebook is the largest social network. Maintaining 1.5 billion daily active users, their connections, and their updates in real-time is a tremendous engineering feat. However, the guiding principles in the evolution of Facebook’s data platform appear to have been real-time response [2] and features for users, app developers, and advertisers. The recent revelations [3] have forced Facebook to acknowledge that data privacy is an important feature. The platform’s design choices, made for speed and features, will hinder it from coherently enforcing privacy policies in the near future.

Facebook’s platform allows users to establish and organize their relationships with other users using social relationship categories like “Friends”, “Close Friends”, “Family”, etc. An update in a user’s personal life is more relevant to members of “Family” than to “Friends”, and the platform performs such prioritization intelligently. Similarly, within each category of relationships, updates are further prioritized based on the interests of the users at the other end of the connection. For instance, a friend from school falls in the sub-category school, and a friend from university in the sub-category university. Furthermore, friends from school who are interested in history are distinguished from those who are interested in finance. Such segmentation of categories helps the platform build relevant audiences for a user’s updates. Users are given control to decide which segment should see which updates. Facebook organizes all this information about its users and their interactions as a graph, called the social graph. Users (nodes) are free to form new relationships (edges) and update old ones. The social graph evolves continuously, and this organization of users and their data helps Facebook segment users with similar interests so that they can be introduced to a new post or an advertisement.

The Facebook platform allows developers to write Apps, which users can install. An App serves a specific function to its users. When a user installs an App (represented by an edge between the App and the user on the social graph), it signifies the user’s interest in the functionality provided by that App. Thus, users get functional convenience and Facebook automatically gets contextual insights about users. Both the App and the platform have access to users’ interactions within the administrative sphere of the App. Facebook can build a more accurate context about a user than an App can, because it has other insights about the user. Thus an App, through its functional category, helps the platform place users in a specific segment that can be used in profiling them. For example, a flower delivery App can help identify users who are single, male, within a specific geographical area, and who purchased flowers last year on Valentine’s Day. In order to build audiences of this type, Facebook needs to build and maintain a detailed profile for each of its users. The more a user interacts, the richer the profile. Connectivity and interactions are important objectives of the platform, and Facebook achieves them very well in its ecosystem of users, Apps, content, and the interactions among them. This ecosystem of interacting nodes is depicted as a pyramid in Fig. 2 to highlight their access privileges (either explicit or implicit) on the platform. Each layer (user layer, app layer, advertisement layer) serves a different purpose and has a different access control mechanism to control access to users’ information. In [21], we analyzed the privacy claims of the platform at the user level alone. In this paper, we analyze conformance to user privacy settings in the presence of Apps. We will show that there is no coherence in policy enforcement across the layers, which undermines the privacy of its users.
We have validated our observations through experiments on Facebook’s developer platform v2.12 and Facebook Audience Network. While Facebook profiles users for a variety of reasons, one of the reasons users trust Facebook is the expectation that it will not divulge, intentionally or for a price, data in violation of the privacy settings it has committed to with its users. However, the same cannot be said about the app developers or the advertisers on the app. Thus, our findings show the challenges Facebook must take on to plug the leaks caused by apps and advertisers.

In the following section, we present the hybrid, ad-hoc nature of the access control mechanisms employed by Facebook. In Sect. 3, we analyze the platform and trace the flow of user information beyond the layers of its policy sphere. In Sect. 4, we present a few scenarios in which the privacy settings defined by a user are violated by Apps. Section 5 discusses related work, followed by the conclusion in Sect. 6.

2 Access Control in Facebook

Facebook employs different types of access control mechanisms at the different layers of the platform. At the user layer, user content and user attributes are protected by discretionary access control. At the App layer, user content and user attributes are protected by capability lists. The other entities of the platform are not governed by any policy that the user can influence. Also, the metadata the platform collects about a user is not controlled by the user in any way. The platform organizes all of its entities and content in a graph, which has a sub-graph that can be traversed by users/Apps according to their respective permissions. The platform owner can traverse the whole graph without any restriction and acts as a proxy to its collaborators (the advertisers).

Social Graph - Reachability as the Condition for Access: The social graph is a representation of user information on Facebook. Two user nodes have an edge between them if the users are friends with each other. An edge between two nodes establishes connectivity between them and in turn extends their reachability: that is, a user can access posts of her friend because there is a path on the graph between the user and her friend’s post via the friend node. Now, if the user likes her friend’s post, this is reflected in the social graph by an edge of type like between the user and her friend’s post. Thus, each and every action or event created by Facebook’s users is consumed by the social graph. The graph continuously changes its state to reflect its users’ actions and interactions. Updates to the social graph happen by adding/deleting nodes (or updating fields of nodes), and by adding/deleting/updating the labelled edges. All such updates are due to a user’s or an app’s interactions with their reachable nodes. Passive nodes like posts and photos do not interact on their own. The social graph also allows its nodes to be queried [21]. A user is allowed to compose a query by specifying a particular node (of type root [8]) about which the requester needs information. It is very likely that different sets of information about a node are presented depending on who the requester is.
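The reachability condition described above can be sketched as a breadth-first search over a small labelled graph. This is only an illustration: the node names, the edge labels, and the `EDGES` list are assumptions made for the example, not Facebook’s internal representation, and the sketch checks path existence only, not the list-based policy discussed next.

```python
from collections import deque

# Hypothetical labelled edges of a tiny social graph: (source, label, target).
EDGES = [
    ("user1", "friend", "user2"),
    ("user2", "friend", "user1"),
    ("user2", "owns", "post1"),
]


def neighbours(node):
    """Return the nodes directly reachable from `node` via any outgoing edge."""
    return [target for (source, _label, target) in EDGES if source == node]


def reachable(start, goal):
    """Breadth-first search: is there a directed path from `start` to `goal`?"""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for nxt in neighbours(node):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def add_like(user, post):
    """A 'like' action is recorded as a new labelled edge in the graph."""
    EDGES.append((user, "like", post))
```

In this toy graph, `reachable("user1", "post1")` holds because of the path through `user2`, while the passive node `post1` has no outgoing edges and therefore reaches nothing.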

Lists as Access Policies for Users: Each user is provided with pre-defined relationship categories, called lists, along which users organize their relationships with others. Then there is a category of lists that Facebook creates for a user based on her social affiliations. And a user is also allowed to create and manage her own private lists. Given below is a typical set of labels provided to express access control policies:

  • Only Me: is a label/list in which user herself is the only member

  • Public: is a label which, when used, makes the associated object publicly accessible

  • Friends: is the primary list under which all friendship relations are enlisted

  • Restricted: is a list of friends to whom only Public labelled information is allowed

  • Family: is a list of friends who are assigned as family members

  • Close Friends: is a list of friends who are assigned as close friends

  • Acquaintances: is a list of friends who are assigned as acquaintances

  • Friends of friends: is a list of users who have a friendship relation with members of “Friends”

  • University: is a social list of friends who are also members of Smart List University

  • School: is a social list of friends who are also members of Smart List School

  • Cycling: is a Private List to which user has assigned a set of friends

  • Custom: is a custom policy constructed using the label types described above.

Access control of objects in Facebook is a simple check on the membership of the associated list. If the requester of an object is a member of the list with which the object is protected, the requester gets access. Tagging is a positive exception to the membership check. There are two negative exceptions to the membership check: the “Restricted” list and the “Blocked” list. If the requester of an object is a member of one of these lists, access is denied even when the requester is a member of the list with which the object is protected.
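A minimal sketch of this membership check, with its positive and negative exceptions, is given below. The data layout (`obj` as a dictionary, `lists` as a mapping from label to member set) is hypothetical. The order in which the exceptions are evaluated is also our assumption: we let “Blocked” override everything, let “Restricted” members see only Public objects, and apply tagging only after both.

```python
def can_access(requester, obj, lists):
    """Decide whether `requester` may access `obj`.

    obj   -- dict with 'owner', 'policy' (a list label) and optional 'tagged' set
    lists -- dict mapping a list label of the owner to a set of user ids
    All names and the precedence of exceptions are illustrative assumptions.
    """
    if requester == obj["owner"]:
        return True
    # Negative exception 1: blocked users are denied (assumed to apply to all objects).
    if requester in lists.get("Blocked", set()):
        return False
    # Public objects are accessible to everyone else, including Restricted users.
    if obj["policy"] == "Public":
        return True
    # Negative exception 2: Restricted users see only Public objects.
    if requester in lists.get("Restricted", set()):
        return False
    # Positive exception: a tagged user gets access without list membership.
    if requester in obj.get("tagged", set()):
        return True
    # Default: plain membership check on the protecting list.
    return requester in lists.get(obj["policy"], set())
```

For example, with `lists = {"Friends": {"bob", "carol"}, "Restricted": {"carol"}}` and a post protected by “Friends”, `bob` is granted access while `carol` is denied despite being a friend. An “Only Me” object has no members other than its owner, so every other requester falls through to a denial.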

Fig. 1. Reachability and access in social graph of Facebook

In Fig. 1, User2 can reach and access Post1 because there is a path and the access policy for Post1 is set to friends by its owner User1. Therefore, User2 could interact with Post1 through a like action. User1 and User2 can access Post3 because User1 is a friend of a friend of User3 and User2 is a friend of User3. Post2 cannot be accessed by User1 because the custom policy allows access to all friends of User2 except User1. The Event created by User1 cannot be accessed by anyone except User1 because the access policy is only me. Thus, labels or lists are used to control access to the content owned/posted by Facebook users.

Capabilities as Access Policies for Apps: Facebook Apps too are represented by nodes on the social graph. However, an App’s ability to traverse the social graph is limited to the immediate neighborhood of the user node, consisting only of the object nodes. In other words, the App can reach neither the friends of the user nor the other Apps installed by that user. Which interactions the App can perform in the user’s neighborhood is determined by the set of permissions the user granted when establishing the installed relationship with the App. There are 48 such permissions an App can obtain from its user. This is similar to capability lists in the access control paradigm [16]. In later sections we discuss which of these permissions undermine the user’s privacy.
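The capability-list view of App permissions can be sketched as follows. The permission names `user_posts` and `publish_actions` are the ones used in our experiments; the operation names, the `REQUIRED` mapping, and the `AppInstall` record are hypothetical and serve only to illustrate the idea.

```python
from dataclasses import dataclass

# Hypothetical mapping from an operation to the permission it requires.
REQUIRED = {
    "read_posts": "user_posts",
    "publish_post": "publish_actions",
}


@dataclass(frozen=True)
class AppInstall:
    """The 'installed' edge between an App and a user, with its capabilities."""
    app_id: str
    user_id: str
    granted: frozenset  # permissions the user granted at install time


def app_may(install, operation, target_owner):
    """Capability check: the App may act only inside the installing user's
    neighborhood, and only if it holds the permission for the operation."""
    if target_owner != install.user_id:
        return False  # friends' objects and other users are out of reach
    required = REQUIRED.get(operation)
    return required is not None and required in install.granted
```

A check of this shape consults only the capabilities held by the App and never the list label that protects the object, which is consistent with the behavior we report in Sect. 4.2.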

The utility of the social graph is not limited to representing subjects, objects, and their relationships; it also provides real-time updates about changes in the neighborhood of the subject. Prioritization of updates according to their relevance to a user, based on the user’s past interactions on the social graph, is handled by the NewsFeed algorithm, a core function of the Facebook platform. How the App ecosystem helps it achieve precision is explained below, along with the other important components of the platform.

3 Architecture of Facebook Platform

Figure 3 gives a schematic architecture of the Facebook platform, depicting the relationships between its major entities. In the following we describe the entities and their functionalities. The platform is logically divided into two parts: public space and private space. The entities in public space are the users and applications. They are said to be in public space because, having an account on Facebook, these types of nodes can query and interact with each other based on the access policies. Though the entities from private space can influence the graph topology and have a richer view of it, they cannot perform any of the operations available to nodes in public space without being a node in the public space. Figure 2 depicts the access-hierarchy in the social graph of Facebook. The primary objective of the platform is to build accurate user profiles (behavioral, psychometric, etc.) so that advertisers can be accurately matched to their audience. The platform has been so successful in micro-targeting users in real-time that it artificially limits advertisers while they build their target audiences. An advertiser cannot compose a target audience whose size is less than 100. Similarly, an advertiser cannot request audience-tracking for an audience of size less than 100. To understand the design of this platform, let us describe the role and functionality of its individual entities.

Fig. 2. Access-hierarchy in the social graph

NewsFeed: Facebook has an intelligent algorithm, called NewsFeed, to prioritize the updates shown to a user. If we assume that each object/content on the social graph has a category type associated with it, like education, finance, food, sarcasm, celebrity, etc., then a subject’s interaction with these objects determines the probability of interest the subject may have in such categories. Each interaction of a subject with a node in its neighborhood improves the confidence level of the subject-category mapping. The objective of the NewsFeed algorithm is to increase subjects’ interaction with varying categories [11] of content so that a rich user profile can be built. Such a user profile is necessary to determine the relevance of updates to the user and also to match the user with an advertiser interested in a particular category [22]. If we assume the nodes in the graph are labelled with categories and edges are weighted in proportion to the confidence level of the category, then we can think of an influence function over two nodes. A node with a higher confidence value influences the confidence value of its peer. Thus the utility of the NewsFeed function is to incite the user to interact with content from her neighborhood and also from other influential nodes with whom the user does not yet have a relationship (either friend or follow). The higher the engagement of the user, the more interactions there are, and thus the higher the confidence value with which the user can be categorized.

Fig. 3. Facebook’s schematic architecture

Users: Users are the largest part of the platform. Their interactions within their reachable neighborhood and with the nodes introduced by the NewsFeed build their individual user profiles. Users’ interactions with content outside the platform also help in building the profile.

Apps: The platform gives users a general-purpose connectivity and interaction mechanism, whereas the Apps give context to a user profile. An App serves a specific functionality (e.g., finance, education, dating) to its users, and that functionality is a stronger measure by which to categorize users. Apps can opt to monetize their functionality by serving advertisements to the users via the App. Apps obtain analytics over their users’ interactions. The analytics information contains attributes (like mobile advertisement ID, Facebook UID, email, phone, device info, location, etc.) that can uniquely measure the interactions of App users. To advertise itself, or to persuade its existing users, the App may share its analytics with advertisers to target existing and new users.

Advertisers: Advertisers are the paid interfaces to the platform’s ability to find precise audiences for a specific category/issue. Advertisers build advertisement campaigns by requesting a specific audience type from the platform for a fee. To build the audience request, advertisers upload data fields that are compared against the user profiles built by the platform. After evaluating the scope of campaign targeting based on the data uploaded by the advertiser, the platform either accepts or rejects the request. Advertisers are allowed to micro-target a specific audience that is already engaged with them. They do so by defining events inside the Apps and triggering actions via Pixel when those events occur. An example is the list of users who have browsed a product but did not check out.

Pixel: It is a micro-targeting framework (https://fb.com/business/learn/facebook-ads-pixel) that uniquely identifies users of the platform and also users off the platform. It is a script that generates a unique tracking number each time a defined event occurs. The events could be as simple as loading a website or a user placing a product in her cart. The unique number, concatenated with a cookie at the user side, tracks the user event by event. These user behavior analytics are shared by the platform with the advertisers so that advertisers can measure the impact of their advertising campaigns.

FBAN: Facebook Audience Network (https://fb.com/audiencenetwork) is the core component of the platform and has access to the user profiles generated by the platform. It has its own data-set that is built from user tracking (analytics) and from the meta-data of other associated platforms (like WhatsApp, Messenger, Instagram). It accepts audience requests from advertisers and, based on corroboration with its data-sets and user profiles, identifies the target audience for a campaign. There exist public data-exchanges for user information, which can help enrich the profile attributes of users that come in contact with the platform.

Profiles: All individual user profiles are further enriched and attributed by the insights obtained from platform analytics and plausibly from external public/private data-sets [5]. (For Indian users, Facebook tried to link their Aadhaar numbers with their profiles. Aadhaar numbers are not secret but are used in the delivery of various financial and public services.)

Filters: These determine the general access policy of the platform. For example, Facebook recently decided not to allow querying of its users (nodes) by their email/phone. Filters are also responsible for guiding the behavior of the platform in general, for example, by suppressing a specific category of nodes from appearing in the NewsFeed. Facebook had reached an understanding with a large government (Project Colorful Balloons) to ensure that a specific category of nodes is identified, tracked, and controlled.

Having understood the roles various entities play in the Facebook ecosystem and keeping in mind those entities’ access hierarchy, the question we ask is the following:

Assuming users explicitly trust Facebook to handle their private data against the free services, and assuming that Facebook desensitizes user data before making use of it for advertisement: what privacy & leakage assurances can we expect from the platform?

As Apps are only loosely coupled with the ecosystem compared to its other entities, it is difficult to assume that (smaller) Apps will strive to achieve the same level of trust with users as Facebook may have. In the following we present a few scenarios in which Apps violate users’ privacy settings. In [21], we examined whether Facebook users really preserve their privacy as they understand it, or whether certain of their innocuous actions leak information contrary to their privacy settings. We list those findings (at the user-object layer of the platform) here:

  1. Nonrestrictive change in policy of an object risks privacy of others,

  2. Restrictive change in policy of an object suspends other’s privileges,

  3. “Share” operation is privacy-preserving,

  4. Policy composition using intensional labels is not privacy-preserving,

  5. “Like”, “Comment” operations are not privacy-preserving.

In this paper, we extend the scope of our investigation to the higher layers of the platform, that is, the App layer and the advertiser layer.

4 Experimental Scenarios of Access by Apps

In this section we list our experiments using the apps and advertisement facilities of Facebook and highlight their potential to undermine users’ privacy and security. The experiments were carried out using Facebook APIs (v2.12), and our findings are reproducible as of April 13, 2018. This sort of gap analysis of privacy policy conformance across the platform has been ignored [7], and precisely because of the lack of platform-wide, coherent privacy policy enforcement, rogue apps are tracking and siphoning off user data.

Fig. 4. Scenario: Alice has installed App1. Bob is Alice’s friend

4.1 App Finds Out User’s Friends

Facebook has deprecated Apps’ access to a user’s friend list. Consider the scenario shown in Fig. 4, in which Alice has set her list of friends to private in her privacy settings. This setting creates an expectation that Alice’s friend list will not be available to others. Alice installs App1 with the permission user_posts. This permission allows App1 to reach all of Alice’s posts and their fields (comments, reactions, post privacy settings). Figure 5 is the list of posts retrieved by App1 from Alice’s timeline. Figure 6 shows the retrieval of a comment and a reaction on the first post in the list shown in Fig. 5. Facebook’s NewsFeed function presents updates from Alice’s timeline to her friends (Bob). When a friend interacts with the post, App1 can observe it and deduce with high probability that Bob is Alice’s friend. The probability of such an inference is 1 when Alice has given App1 permission to post with the post’s access policy set to “Friends”. Similarly, depending on the post’s access policy setting, App1 can reason about Family and other lists.
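The inference in this scenario can be sketched over post records that an App has already retrieved. The record layout below (`privacy`, `comments`, `reactions` as lists of user identifiers) is a simplified, hypothetical stand-in for the fields visible under user_posts and is not the actual response format. The sketch also ignores the tagging exception, which could let a non-friend interact with a “Friends” post.

```python
def infer_friends(posts, owner_id):
    """Split users who interacted with the owner's posts into two sets:
    those who interacted with a post protected by 'Friends' (treated as
    certain friends) and those seen only on other posts (likely friends)."""
    certain, likely = set(), set()
    for post in posts:
        actors = set(post.get("comments", [])) | set(post.get("reactions", []))
        actors.discard(owner_id)  # the owner's own actions reveal nothing
        if post.get("privacy") == "Friends":
            certain |= actors
        else:
            likely |= actors
    return certain, likely - certain


# Hypothetical records for Alice's timeline.
sample_posts = [
    {"privacy": "Friends", "comments": ["bob"], "reactions": ["bob", "alice"]},
    {"privacy": "Public", "comments": ["zed"], "reactions": ["bob"]},
]
certain_friends, likely_friends = infer_friends(sample_posts, "alice")
```

Here `bob` lands in the certain set because he reacted to a post restricted to “Friends”, while `zed`, seen only on a Public post, remains a likely friend.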

Fig. 5. List of posts retrieved by App1 from Alice’s timeline

Fig. 6. Retrieval of comment & reaction on the first post in the list shown in Fig. 5

4.2 App Can Access User Objects Despite “Only Me” Policy

Consider the scenario in Fig. 4, in which Alice changes the access policy of her post P1 to “Only Me”. This implies that only she can access this post. However, App1 can still access post P1 even after Alice sets the policy to “Only Me”; see Fig. 7.

Fig. 7. Results of App1’s query to Post1

Fig. 8. Scenario: Alice has installed App1 and App2. Bob is Alice’s friend

4.3 App Can Find Out Other Apps Installed by the User

Consider the scenario shown in Fig. 8. Both apps have the permission user_posts. App2 (anshx.ananx is its real name in our experiments) has one additional permission, publish_actions, as shown in the figure. Let us assume that App2 publishes a post on Alice’s timeline. App1 can observe this event and can obtain the post ID. Figure 9 shows the query composed by App1 and its result, through which App1 deduces that Alice has also installed App2. Such knowledge is useful in various ways.
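The deduction made by App1 can be sketched as a scan over the posts it has retrieved, collecting the publishing application recorded on each post. The record layout, including the `application` field and the identifiers used, is an assumption made for illustration and may differ from the actual query result shown in Fig. 9.

```python
def other_installed_apps(posts, own_app_id):
    """Return {app_id: app_name} for every app other than `own_app_id`
    that appears as the publisher of a post on the user's timeline."""
    found = {}
    for post in posts:
        app = post.get("application")  # assumed field naming the publishing app
        if app and app.get("id") != own_app_id:
            found[app["id"]] = app.get("name")
    return found


# Hypothetical timeline records visible to App1 (ids are made up).
timeline = [
    {"id": "p1", "message": "hello"},  # posted by the user herself
    {"id": "p2", "application": {"id": "app2", "name": "anshx.ananx"}},
    {"id": "p3", "application": {"id": "app1", "name": "App1"}},
]
detected = other_installed_apps(timeline, own_app_id="app1")
```

With these records, `detected` contains only the entry for `app2`, which is the information App1 was never meant to obtain about Alice.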

Fig. 9. Query composed by App1 and its result

Fig. 10. Retrieving App user’s device information

4.4 App and Advertiser Can Identify Users: Linkability

Figure 11 is the analytics report for a campaign we designed for a Page under our control. The analytics are available in real-time. The campaign invited users to follow our page on “Online Privacy”. We could correlate the Likes (by Facebook users) on our page with the feed sequence report and find out which user accessed the advertisement from what type of device and device OS version. This information greatly narrows down the types of attack payloads one can design to compromise a device. We could also access the App user’s device information (Fig. 10).

A summary of privacy violations & data leaks from the above scenarios is given below:

  1. App finds out user’s friends despite user setting it private.

  2. App can access user objects with “Only Me” policy.

  3. App can find out what other apps are installed by its users.

  4. Linkability: App and advertiser can identify their audience from the analytics data.

4.5 Analysis

Given that the trust levels of Facebook and an app are not comparable, the question is how Facebook can control such data leaks. Some of the broad ways to contain these data leaks are:

  1. By increasing the scope of the user’s privacy policy specification from the current user-object layers (refer to Fig. 2) to all the layers of the platform, except the owner’s layer. The current approach is fragmented and incoherent: the impact of changes at the app layer on in-force settings at the user layer is not communicated to users. Naturally understandable labels like “Friends” and “Family” should be devised to categorize apps and advertisers, so that the user can define her access policies with them.

  2. By encrypting the analytics available to apps and advertisers such that, per campaign, a distinct but ciphered string is generated for each measurable event, which cannot be used to track users across campaigns. Only the platform owner should be able to link the events across campaigns. Thus, only one entity is accountable.

  3. It appears that Facebook is trying to address this issue of linkability through the concept of scope_id. A user is assigned a unique local ID, whose scope is limited to the context (App, Page) for which it is generated. For example, App1 will generate a scope_id that is different from the scope_id generated by App2. Thus App1 and App2, or their parent, cannot link users. However, we observed that, as of now, these scope IDs resolve to the real user ID for whom they were generated. For example, https://fb.com/100007460080360, https://fb.com/2051781625080487, and https://fb.com/1708004396124880 reveal the actual user.
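The second measure in the list above can be illustrated with a keyed hash. Note that this sketch substitutes an HMAC for encryption, and that it derives one token per (campaign, user) pair so that events remain measurable within a campaign; both choices, as well as the function name and the key handling, are our assumptions and not a description of any existing platform mechanism.

```python
import hashlib
import hmac


def campaign_token(platform_key: bytes, campaign_id: str, user_id: str) -> str:
    """Derive an opaque per-campaign identifier for a user.

    The token is stable within one campaign, differs across campaigns, and
    can be recomputed (and hence linked) only by the holder of `platform_key`.
    """
    message = f"{campaign_id}|{user_id}".encode("utf-8")
    return hmac.new(platform_key, message, hashlib.sha256).hexdigest()


# Hypothetical key and identifiers, for illustration only.
key = b"platform-secret-key"
token_a1 = campaign_token(key, "campaign-A", "user-42")
token_a2 = campaign_token(key, "campaign-A", "user-42")
token_b = campaign_token(key, "campaign-B", "user-42")
```

An advertiser who sees `token_a1` and `token_b` in two reports cannot tell that they belong to the same user, whereas the platform owner can recompute both from the key. Unlike the scope IDs we observed, such a token is not a resolvable identifier.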

Fig. 11. Campaign measurement report

5 Related Work and Discussion

Social networks like Facebook, Twitter, and Snapchat have come to prominence in the last decade because of their ability to engage users online such that users can carry out their social discourse 24×7, around the world. As the users get convenience and real-time engagement with their connections for free, the platform gets user insights. The platform recovers its operational costs by sharing the insights with advertisers in a plausibly privacy-preserving fashion (https://fb.com/ads/about/). The rich data-sets generated by such social networks have turned advertising into a real-time persuasion industry [17, 24, 25], communication into a precision tracking system [1, 6], and the social network platform into a rich user/content/relation labelling platform. All of these transformations have brought tremendous challenges [18] in terms of the privacy of users.

Privacy in social networks has been studied for quite some time, and the research community had been highlighting the privacy implications of connectivity [13, 23] even before the Cambridge Analytica fiasco. In [12], a survey on security and privacy in social networks is presented that touches upon properties like anonymization, de-anonymization, link predictability [10, 14], information leakage, trust [20], and link privacy [19]. In [9], a privacy-preservation model for Facebook-style social networks is proposed. Concepts for privacy-preservation in an app ecosystem, presented in [15] for mobile platforms, can be borrowed for Facebook’s platform. Facebook’s infrastructure [2] is unique, and not much about it is publicly available. It remains to be seen how Facebook adapts to the forthcoming European GDPR [4] regulation. The data generated across layers of the Facebook platform is interlinked, and once a data-tuple is associated with personal data, it becomes tainted and the tainted attributes propagate the user’s identity further. Under GDPR, when a Facebook user invokes her right to be forgotten/erased, it will be interesting to see how far the data deletion chain goes, since the data is linked across the ecosystem. We believe that Facebook will have to define the context and scope of user information, and that the deletion of user data will happen within that pre-defined scope.

6 Conclusion

We presented the role Apps play in tracking and profiling users on the Facebook platform. We have shown a few instances of App configurations that violated the underlying primary privacy settings of the user. Apps may exploit such shortcomings in policy enforcement for various reasons, which can seriously undermine not only the privacy of users but also their security. From the study of the ecosystem on Facebook’s platform, we showed that Apps potentially have as much visibility of their users’ objects, connections, and interactions as Facebook itself. If a coherent access control model across the layers of the Facebook ecosystem is not deployed, then Facebook with its ad-hoc approach will remain a sophisticated surveillance system available to any user. People around the world, including lawmakers, are asking whether Facebook should really be expanding into influencing people based on what it has captured as their profile. This conundrum is multiplied in the presence of millions of Apps on its platform. App permission management needs to be made understandable and available as extensional/intensional labels, similar to permission management at the user layer. It is not hard to see why the recommendations based on our analysis demand an expansion of the scope of user privacy policies across the user layer, the app layer, and beyond.