Abstract
The question of how to make a city or government better by exploiting information and communication infrastructures, referred to as the smart city, is an emerging field of research. Large quantities of data are generated by these infrastructures, and infusing these data into the physical infrastructure of a city or government may lead to better services for citizens. Collecting and processing such data, however, may raise privacy and security issues that should be addressed appropriately to create a sustainable approach for smart cities and governments. In this chapter, we focus on data collection through crowdsourcing with smart devices and identify the corresponding security and privacy issues in the context of enabling smart cities and governments. We categorize these issues into four classes. For each class, we identify a number of threats as well as solution directions for these threats.
1 Introduction
The developments in communication and information technologies have led to an explosive growth of data in recent years. In the context of “smart cities” and “smart governments,” organizations look for opportunities to take advantage of the large quantities of available data to create a more comprehensive view of a city or government (Choenni et al. 2010; Choenni and Leertouwer 2010). Such a comprehensive view may improve policy decision-making and may lead to better services to citizens. Technological developments make it easier to involve citizens in the data collection process. This approach is regarded as data collection through crowdsourcing (Ganesan and Corner 2011; Taylor 2010). Due to the growth of smart devices, such as smartphones equipped with sensors, citizens carry measurement devices that can easily collect data about several phenomena in a city. Examples of these phenomena are street litter, deterioration of rural areas, and air pollution. These citizens may be regarded, in some sense, as data collection agents of the policy-makers/local governments. Emphasizing greater citizen involvement and participatory government, local governments stimulate active partnerships and collaborations between citizens, the private sector, and the municipality (Stembert et al. 2013). There is a wide variety of applications for mobile devices that allow citizens to collect data. For example, in the Copenhagen Wheel project, sensors are attached to city bicycles in order to report data about pollution, road conditions, congestion, etc. via such an application.
In contrast to such active crowdsourcing, citizens may also be involved in data collection passively and unknowingly. Today, users download and use apps that are equipped with several tracing and logging functionalities. Users often do not know which data these applications collect and to which entities they pass the data. It also happens that users do not change the default settings of the tracing and logging functionalities, partly because they do not know how to change these settings or are simply unaware of these functionalities.
Involving citizens in data collection may raise several issues concerning privacy, security, misinterpretation, or even abuse. To what extent does (extra) data collection take place without the knowledge of citizens? To what extent can the data collected by citizens be shared with other citizens and institutions? To what extent is data leakage from mobile devices acceptable? Suppose that collected data about trees is leaked and it can be concluded that many trees in a district are ill. A possible reaction of the inhabitants of the district might then be to cut down these trees. Another consequence of data leakage might be that people abuse the data for their own interests. Suppose that one may conclude from the citizens’ ratings that an area is indeed deteriorating. By combining this data with, for example, crime statistics that pertain to the area, someone could try to influence the prices of the houses in that area. Gutmann et al. (2008) and Kalidien et al. (2010) mention the possibility of using survey or administrative data to disclose the identity of individuals or groups in order to harm individuals, population subgroups, or business enterprises. An intruder might use the attributes of a small area to identify certain characteristics of individuals (e.g., ethnicity) in that area, possibly exposing them to repression or other harms. One example they mention is the use of the US Census of Population data to identify small areas with large proportions of Arab Americans after the events of September 11, 2001.
In this chapter, we provide a categorization of the security and privacy issues of crowdsourcing and accordingly present a number of guidelines to deal with such issues. To gain sustainable value from crowdsourcing data, citizens must remain willing to collect unbiased data. The chances of such continued participation increase when the security and privacy issues are handled in an adequate and transparent way and the misinterpretation and misuse of data are prevented. Therefore, we base our categorization on two criteria: whether the data access is authorized or not, and whether the data use is authorized or not. Hereby, we directly relate crowdsourcing to two foundations of any trusted data collection process, namely data privacy and data misuse. We discuss a number of mechanisms to deal with these privacy and security issues. These mechanisms include providing feedback to data subjects (i.e., those whom the data is about) about the status of their data, reporting aggregated data as much as possible, and developing safe and secure applications for smartphones.
The remainder of this chapter is organized as follows: We start by describing our research methodology. Subsequently, we discuss the role of smart devices and their users in collecting large volumes of data. We especially pay attention to the potential that these devices, with their embedded sensory capabilities, offer for enabling a smart city/government. Furthermore, we discuss the differences between traditional data collection and data collection with smart devices. Then we discuss the privacy and security issues that may be raised if citizens act as suppliers of crowdsourcing data. Next, we discuss a number of mechanisms and concepts that enforce privacy and security safeguards in data collection. Finally, we conclude the chapter.
2 Research Methodology
This contribution is the result of participatory research, aimed mainly at identifying the security and privacy issues of deploying crowdsourcing with smart devices to collect data for enabling smart cities and smart governments. We formed a workgroup consisting of three researchers at the Ministry of Security and Justice of the Netherlands (a national government organization). Over a period of two months, the group held weekly brainstorming sessions to share findings and experiences about the issues and to identify solution directions. These brainstorming sessions were based on the knowledge acquired by the workgroup members in several projects carried out in the context of, among others, open data and mobile devices (Zuiderwijk et al. 2012; Meijer et al. 2013). Privacy and security aspects were important in these projects. Furthermore, the workgroup exchanged views with five other experts in the field of mobile applications and security on three occasions. While writing this chapter, the workgroup also conferred with the fourth coauthor, who works for the Rotterdam municipality (a local government organization). Through these brainstorming instances, we followed a bottom-up approach to elicit the hidden knowledge of the participants.
In between these meetings, the workgroup members carried out a literature study to learn from best practices and the state of the art. This study led to identifying a number of security and privacy issues (in Sect. 4) and solution directions (in Sect. 5). Furthermore, to gain more insight, we observed how two mobile applications, Burgerschouw (Centric 2014) and Scoor Ze (Stembert et al. 2013), collect neighborhood information in some Dutch cities through crowdsourcing.
3 Smart Devices and (Local) Governments
Today, an increasing number of apps are developed to enrich the use of smart devices. These apps range from those that ease our daily life to those that transform our contemporary society. For example, an app in the context of hospital appointments may tell you that your appointment with a specialist is postponed by an hour because the specialist's earlier appointments ran longer than expected. Such an app makes life more convenient than the alternative of sitting for an hour in a hospital waiting room. Apps that are involved in transforming a city into a smart city are typical examples of apps that adapt our society to contemporary developments, such as facilitating time and place independence.
Today, (local) governments exploit smartphones as an additional channel to broadcast important messages besides conventional channels such as radio and television. Many apps exploit the data of the sensors embedded in mobile devices. A sensor may be regarded as a device that measures physical quantities or signals in an environment and converts them into meaningful figures for an observer, such as the location of an object or the temperature. There are apps that, by exploiting sensory data, remind their owners of interesting places and events in their current neighborhood or provide information about the object of which they are taking a photograph. These types of apps may replace the provision of information by city halls and tourist information centers.
Besides providing information to their owners, smart devices also collect and pass data to servers. In some cases, device owners are (actively or passively) involved in data collection (e.g., in crowdsourcing scenarios), while in other cases device owners are not aware of it. A typical example of the latter is a mobile app that collects tracking and tracing data. Such data gives rise to a number of opportunities that may be exploited by a government. For example, crowd control may rely on the movement records of people in a city obtained from their mobile devices. Based on these records, the hotspots and crowded places in a city can be located and crowd control strategies can be defined. Tracking and tracing of mobile devices may also be useful for (police) investigation purposes. A list of persons who were at a certain place within a given timeframe might be interesting information if a crime was committed at that place.
Collected tracking and tracing data may also be useful for defining effective and sound policies in different sectors of our society. In the energy sector, for example, energy suppliers can anticipate and influence future energy consumption by introducing apps that make citizens aware of their energy consumption. Such apps may recommend that users turn off the heating system if the app detects that nobody will be at home for a while. Furthermore, these apps can be tailored to simulate occupancy while people are actually on holiday. Such functionality might be useful to minimize the risk of burglary.
Citizens may also be actively involved in gathering data for the government. They may feed the government with data whether or not orchestrated by it. At the request of a municipality in the Netherlands, for example, a selected group of citizens runs an app called Burgerschouw (Centric 2014) on their smartphones or tablets to rate various aspects of their district, such as the condition of trees, verges, and streets. Citizens may rate an aspect of their district as good, fair, or low. To clarify the rating criteria, the app provides users with example pictures, for example, pictures of what should be understood as a healthy tree (good), an averagely healthy tree (fair), and an ill tree (low). In this case, citizens are aware of their role as data collectors and actively perform this role. Another way to feed a government with data, one not orchestrated by the government, is to upload data that might be relevant to the government via social media sites.
In the next section, we discuss the differences between traditional and contemporary data collection methods.
3.1 Traditional Versus Smart Data Collection
Traditional data collection is grounded in privacy laws and regulations. Privacy laws govern the processing of personal data, which includes all actions carried out on the data, from data collection to data destruction (DPPA 2014). There are a number of guidelines and legal frameworks for handling data processing, such as the Data Protection Directive of the European Union (EU 1995) and the Dutch Privacy Protection Act (DPPA 2014). From these frameworks, six principles can be extracted that pertain to the processing of personal data (Chadwick 2009; OECD 2013; Cameron 2005; EU 1995; DPPA 2014):
- Finality principle, which refers to the purpose for which personal data is collected. The purpose should be explicit and the processing of the collected data must be compatible with the purpose for which the data was collected.
- Legitimacy principle, which refers to proper, careful, and legal collection and processing of the data. To this end one should take into account the context within which the data is collected and used.
- Limitation principle, which refers to having relevant, sufficient, not excessive, and correct data. This demands the data to be collected proportionally to the intended purpose (i.e., proportionality) and in ways that minimize the use of privacy-sensitive data (i.e., subsidiarity).
- Transparency principle, which entitles the data subject to know when, why (i.e., for which purpose), and by whom her/his data is processed. An individual even has the right to ask a data controller, within a reasonable time and at reasonable expense, whether the data controller holds his/her data. If the reply is affirmative, the individual is entitled to have the data erased, rectified, completed, or amended.
- Security safeguards principle, which refers to having reasonable technical and organizational safeguards in place to protect data against various risks such as loss, unlawful and unauthorized access/use, destruction, modification, or disclosure.
- Accountability principle, which states that a data controller should be accountable for complying with the measures that materialize the principles stated above.
Data collection with smart devices (referred to as “crowdsourcing” from now on) enables us to use a multitude of easily available data sources with relatively detailed data, collected for various goals and purposes, potentially by a large population. This way of data collection offers many opportunities and has provoked a rise in big-data-driven research. As crowdsourcing is mainly based on available data sources and as these often contain personal data, the use of the collected data for big data research often changes the context of the data usage and as such potentially conflicts with the finality, legitimacy, and limitation principles. For example:
- Traditional data collection emphasizes the creation of original data, whereas crowdsourcing seizes the opportunities of (re)using data from existing sources.
- Researchers in traditional data collection create their own data, which enables them to define and control the principles of finality, legitimacy, and limitation, and other legal principles concerning the data collection.
- In traditional data collection, the citizens who participate in research give their explicit consent for the collection and use of their data.
- Traditional data collection is the result of a research design process whereby the validity and reliability of the collected data are taken care of (e.g., degree of data detail, micro or aggregated records, structured or unstructured data). Such a careful research design is not possible in crowdsourcing due to the reliance on existing tools, devices, and data.
Thus, problems and issues might arise as described in the following section.
4 Security and Privacy Issues
Crowdsourcing introduces various security and privacy vulnerabilities for the citizens who participate in sensing data, the citizens about whom data is collected, and the entities (including governments, organizations, and citizens) that consume the collected data. Wang et al. (2013) identified a number of privacy and security threats that endanger the use of smart devices for crowdsourcing. In this section, we extend and categorize these threats.
4.1 Categorization Criteria
Crowdsourcing is based on available data sources and the crowdsourcing data often contains personal data. Even when the data is aggregated, it can be combined with other data, resulting in the disclosure of personal data (Braak et al. 2012; Kulk and van Loenen 2012). In practice, it is infeasible to reliably predict which part of the data is privacy sensitive; therefore, if any part of the data may be privacy sensitive, the entire dataset should be treated as privacy sensitive. The use of crowdsourcing data imposes several privacy and security challenges due to the character of the data and the context in which it is originally created. We argue below that, as crowdsourcing has to deal with existing data and as the data often contains personal data, the focus of privacy protection and security should be on the access and use of the data. Therefore, we categorize the threats according to the way that the crowdsourcing data might be accessed and used/exploited within the context of e-government applications.
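To make this risk concrete, the following minimal sketch shows how an "anonymized" crowdsourcing dataset can be linked to a publicly available register on quasi-identifiers such as postcode, age, and gender. All records and field names here are entirely hypothetical; the point is only to illustrate the linkage mechanism.

```python
# Hypothetical illustration of a linkage attack: an "anonymized" crowdsourcing
# dataset is joined with a public register on quasi-identifiers, re-identifying
# the individuals behind the sensitive attribute.

# Anonymized crowdsourcing reports: no names, but quasi-identifiers remain.
reports = [
    {"postcode": "3011", "age": 34, "gender": "F", "rating": "area deteriorating"},
    {"postcode": "3012", "age": 51, "gender": "M", "rating": "litter complaint"},
]

# Publicly available register (e.g., an open-data release or profile dump).
register = [
    {"name": "Alice", "postcode": "3011", "age": 34, "gender": "F"},
    {"name": "Bob",   "postcode": "3012", "age": 51, "gender": "M"},
    {"name": "Carol", "postcode": "3011", "age": 29, "gender": "F"},
]

def link(reports, register):
    """Re-identify reports whose quasi-identifiers match exactly one person."""
    linked = []
    for rep in reports:
        matches = [p for p in register
                   if all(p[k] == rep[k] for k in ("postcode", "age", "gender"))]
        if len(matches) == 1:  # unique match -> re-identification succeeds
            linked.append((matches[0]["name"], rep["rating"]))
    return linked

print(link(reports, register))  # each report traced back to a named person
```

Even though neither dataset alone reveals who reported what, their combination does, which is why the entire dataset must be handled as privacy sensitive.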
Traditionally, controlling who gets access to sensitive information has been an important means of protection. Access control deals with granting authorized entities access to resources such as data, while preventing unauthorized entities from accessing them. Therefore, the first criterion for our categorization of the security and privacy threats is whether the collected data is accessed by authorized or unauthorized entities. Even after access to sensitive data is granted, there is often no guarantee that the data is used (i.e., processed, stored, etc.) appropriately, for example, in the way desired by the data subject or for the purposes for which the data was collected. Therefore, we adopt the way that the data is used as the other categorization criterion. For this criterion we consider whether the collected data is used for an authorized or an unauthorized purpose.
Considering these two criteria, we identify four classes of threats to crowdsourcing data, namely: authorized-access and authorized-use, authorized-access and unauthorized-use, unauthorized-access and authorized-use, and unauthorized-access and unauthorized-use. In the following subsections, we provide some typical threats for each of these four categories, an overview of which is given in Fig. 1.
4.2 Authorized-Access and Authorized-Use
Even when collected data is accessed and used according to some defined rules and policies, there is a chance that the resulting information leads to some (privacy and security) issues.
Crowdsourcing data, for example, can be inaccurate and biased. In some applications, the crowd may rely on information supplied by others to make critical decisions (e.g., to derive hazardous traffic conditions, natural disasters, human rights violations, or political unrest). In such cases, there is a possibility of incorrect or inaccurate data being reported unintentionally or, in some cases, maliciously (Wang et al. 2013). One source of data inaccuracy is the lack of a shared mindset among the individuals who contribute to the data sensing process. For example, if the crowd is asked to rate how clean their neighborhoods are, they need to have a common understanding of cleanliness and be able to rank the scenes similarly and fairly. Establishing common criteria to rate situations appropriately is a challenge. Another source of data inaccuracy can be attributed to the malicious intentions of sensing individuals. Through data poisoning, such individuals can inflict damage and harm on individuals and organizations. For example, a well-orchestrated malicious campaign among a number of individuals can damage the reputation of a nice neighborhood and lead to a reduction of house prices there.
Collecting data through crowdsourcing is also prone to so-called signal error, which occurs when there is a large gap between the data gathered and the phenomena under study (Zoldan 2013). One example is the use of Twitter messages to understand people’s decision-making related to Hurricane Sandy, as reported by Zoldan (2013). An analysis of over 20 million tweets showed that tweets about storm preparations peaked the night before the storm. Interestingly, the majority of the tweets originated from Manhattan rather than the hit areas (e.g., Seaside Heights and Midland Beach) because of the high concentration of smartphone and Twitter usage in Manhattan, and the power outages and low battery levels of mobile devices in the hit areas. As such, “there was a huge data-gap from communities unrepresented in the Twitter sphere” (Zoldan 2013). One can imagine what would have happened if rescue missions had been planned based solely on those tweets, without considering the context within which the data was collected (in this case, the power shortage and the spatial concentration of sensors). Crowdsourcing data, moreover, is subject to so-called confirmation bias, where people tend to search data in such a way that their previous viewpoint is confirmed, regardless of what the data truly conveys (Zoldan 2013). Considering the issues mentioned above, one may perceive crowdsourcing data as less accurate and less trustworthy. Consequently, the systems and services that fully rely on such inaccurate data may inflict security and privacy threats on their users.
Users who are part of the crowd that contributes data can also be subject to retaliation threats (Wang et al. 2013). For example, if someone reports domestic violence to the authorities, he or she may become subject to retaliation should the suspected violator find out who reported the incident. “As a result, citizens will only use the system if they trust that the system is secure and they will face no public retaliation for making reports” (Wang et al. 2013).
One goal behind data collection mechanisms should be to collect just enough data to provide a view that pertains to a certain purpose. One drawback of crowdsourcing-based data collection is the possibility of collecting overly comprehensive or too detailed data about phenomena. Such amounts of data not only introduce extra (security and) privacy risks but also lead to information overload. Suppose that we systematically track and store the timestamps of a citizen who is inspecting his or her district. If we are interested only in the citizen’s district ratings, the storage of timestamps and the exact route that he or she follows is irrelevant, and can therefore be regarded as a privacy breach or may introduce information overload. This is against the limitation principle.
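The limitation principle can be enforced at collection time by discarding purpose-irrelevant fields before a report is stored. The sketch below is a minimal illustration; the report format and field names are hypothetical.

```python
# Hypothetical sketch of data minimization: keep only the fields needed for the
# stated purpose (district ratings) and drop the rest (timestamps, GPS route)
# before the report is stored.

NEEDED_FIELDS = {"district", "aspect", "rating"}  # purpose: district ratings

def minimize(report):
    """Return a copy of the report containing only purpose-relevant fields."""
    return {k: v for k, v in report.items() if k in NEEDED_FIELDS}

raw_report = {
    "district": "Delfshaven",
    "aspect": "trees",
    "rating": "fair",
    "timestamp": "2014-05-02T14:31:07",           # irrelevant to the purpose
    "route": [(51.905, 4.455), (51.906, 4.457)],  # privacy-sensitive movement data
}

print(minimize(raw_report))
# {'district': 'Delfshaven', 'aspect': 'trees', 'rating': 'fair'}
```

Filtering at the point of collection, rather than after storage, keeps the sensitive fields out of the server-side dataset altogether.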
4.3 Authorized-Access and Unauthorized-Use
Even when the collected data is accessed legitimately, it should still be used appropriately, according to the laws, guidelines, and rules, as well as the preferences of data subjects.
Crowdsourcing data may include basic personal data from participants, such as their profile data (including username, password, name, email address, and phone number), activity data (e.g., sporting, sleeping, eating, etc.), and situational data (like visited locations, adjacency to other users and objects, conversations, etc.). Such personal data should be accessible only to a limited number of authorized entities, like system administrators and specific services/systems. Authorized insiders with ill intentions (i.e., inside intruders) may reveal and misuse personal information to which they have access for illegitimate purposes like personal satisfaction, financial gain, and political benefit. Revealing personal information makes data subjects (i.e., the individuals and organizations whom the data is about) vulnerable to cyber attacks, such as identity theft, phishing, spam, and privacy breaches. Being subjected to such vulnerabilities and victimized by such attacks, the crowd may become fearful and unwilling to contribute to the data collection process. Even when voluntarily participating in crowdsourcing, users sometimes desire that their personal information not be shared, for instance in certain situations (like during evenings, at weekends, and during holidays).
Integration of information systems has become a trend in various public, private, and semipublic sectors. For example, Google has merged various services like Gmail, Google+, and Google Drive, and Facebook has acquired Instagram; in such cases, various databases are integrated within Google and Facebook, respectively. There are also open data initiatives to release public sector data to citizens as a measure of government transparency (Dawes 2010a, b). Such initiatives motivate combining crowdsourcing data with other data sources in order to deliver value-added services. In such cases, however, there are potential risks of privacy breaches when self-provided data of users is combined with other user data retrieved from elsewhere (Bargh and Choenni 2013). Even highly sensitive attributes may be disclosed by means of easily accessible data. Kosinski et al. (2013) show that easily accessible digital records of behavior, such as Facebook Likes, can be used to automatically and accurately predict a range of highly sensitive personal attributes, including sexual orientation, ethnicity, religious and political views, personality traits, intelligence, happiness, use of addictive substances, parental separation, age, and gender. De Montjoye et al. (2013) analyzed a dataset of 15 months of human mobility data for 1.5 million individuals and found that human mobility traces are highly unique. If the location of an individual is specified hourly, with a spatial resolution equal to that given by the carrier’s antennas, four spatiotemporal points proved sufficient to uniquely identify 95% of the individuals. They also found that even rather highly aggregated datasets provide little anonymity (De Montjoye et al. 2013).
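The uniqueness phenomenon reported by De Montjoye et al. (2013) can be illustrated on a toy scale. The sketch below uses entirely synthetic traces (the users, hours, and antenna cells are made up) and counts what fraction of individuals are singled out by a few of their spatiotemporal points.

```python
# Illustrative (synthetic) sketch of trace uniqueness: an individual is
# identifiable from k of their spatiotemporal points if no other trace in the
# dataset contains those same points.

# Each trace is a set of (hour, antenna_cell) points; the data is made up.
traces = {
    "user_a": {(8, "c1"), (9, "c2"), (12, "c3"), (18, "c1")},
    "user_b": {(8, "c1"), (9, "c2"), (12, "c9"), (18, "c4")},
    "user_c": {(8, "c7"), (10, "c2"), (12, "c3"), (19, "c1")},
}

def is_unique(points, traces):
    """True if exactly one trace in the dataset contains all given points."""
    return sum(points <= trace for trace in traces.values()) == 1

def fraction_unique(traces, k):
    """Fraction of users uniquely identified by k points from their trace."""
    unique = 0
    for trace in traces.values():
        points = set(sorted(trace)[:k])  # take k points from the trace
        if is_unique(points, traces):
            unique += 1
    return unique / len(traces)

print(fraction_unique(traces, k=2))  # 0.3333333333333333 (two points collide)
print(fraction_unique(traces, k=3))  # 1.0 (three points single everyone out)
```

Even in this tiny example, adding one more spatiotemporal point per user moves the dataset from partial to complete identifiability, mirroring (on a toy scale) the effect measured in the real mobility data.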
In traditional data collection, citizens are typically aware of and consent to the collection of their personal data. In crowdsourcing, however, this consent may not be present. Citizens may be unaware of the extent of their contribution to the collected data and of the extent to which third parties use their data. Citizens may not have consented to the data processor using their personal data or sharing it with other organizations. Generally, in crowdsourcing it may become unclear who is using citizens’ personal data and for which goals and purposes. Such uncertainty and unawareness conflicts with the transparency principle.
Crowdsourcing data may be collected within a specific register (e.g., that of a municipal population, a hospital, or judicial data) or a research context (e.g., a household or crime victimization survey). As such, data may be collected for various purposes and under different legal domains (e.g., health, criminal, or population register law). When data is used in another context than the one in which it was originally collected, a conflict may occur with the finality and legitimacy principles.
4.4 Unauthorized-Access and Authorized-Use
Cases are also possible where the collected data is accessed illegitimately while it is used legitimately. A typical example is the case where an employee accesses data that he is authorized to use, but does so with a colleague’s credentials because of a forgotten password.
Unauthorized access for an authorized use can also occur in crowdsourcing scenarios. For example, consider the case where the mobile devices of citizens are traced and the resulting data is stored. Based on this data, one can discover and predict the movement and travelling patterns of a citizen and determine who his or her co-travelers are. At first, the processing and use of a citizen’s travelling and movement information may not seem interesting or relevant; due to information overload, the data is normally not processed and therefore no privacy breaches occur. On the other hand, the timestamps and the routes that citizens follow might become very interesting data for the police if a serious crime is committed at a specific time and place around which some users were present. A citizen whose movement track coincides with the specific location and timeframe may become a suspect or may be called as a witness. If the police then access the timestamp and route data, questions may arise about the unauthorized access to the data, particularly whether the data use is authorized and legitimate.
4.5 Unauthorized-Access and Unauthorized-Use
The best-known and most acknowledged threats are those where the collected data is accessed and used illegitimately. All examples mentioned above for revealing personal information also hold for unauthorized access and unauthorized use, where (external) intruders illegitimately gain access to the systems that process and store this personal information.
Due to a lack of awareness, users may reveal their personal information unwillingly. Technically, such access can be considered “authorized,” because the data subject agreed to it. But this agreement was made unknowingly, when the user was unaware of its impacts and consequences. As such, we consider such cases as unauthorized access and unauthorized use. Often users allow applications to access their data without knowing how their data will actually be used and without realizing the risks associated with sharing their data. Users make such wrong decisions and agree to such sharing of their data because of an immediate gain or because of the lack of transparency of privacy policies. Users usually do not understand the complex terms and conditions in privacy policies.
Mobile devices are also vulnerable to security and privacy threats and attacks, which puts end users in jeopardy of losing personal data and of device malfunction. Mobile devices generally have limited energy, processing power, and communication resources. This limitation makes it difficult to run protective applications against malicious programs on such devices. Therefore, compared to personal computers, mobile devices are more susceptible to the above-mentioned threats and attacks. Mobile applications are also susceptible to threats such as eavesdropping, spoofing, and denial of service (Chin et al. 2011). This means that crowdsourcing applications may leak information to other applications via inter-process communication if they are implemented carelessly.
5 Frameworks and Guidelines
To handle privacy and security issues, a mixture of procedural and technical measures may be used (Hildebrandt and Koops 2010). In the following we focus on providing some solution directions and guidelines to address the privacy and security challenges of crowdsourcing as described in previous sections. An overview of the discussed solution directions is given in Fig. 2.
5.1 Authorized-Access and Authorized-Use
When both data access and data use are authorized, one should be cautious and prudent in interpreting crowdsourcing data due to the inaccuracy inherent in such data; after all, the data is often produced for other purposes.
To make objective use of the data, a first step would be to collect data that is as reliable as possible by harmonizing the mindset of the (human) sensors. The observers who judge situations and provide their perceptions as sensory data should be instructed in such a way that personal bias in situation scoring is minimized or eliminated as much as possible. This can be achieved by defining clear criteria for evaluating situations (e.g., having a limited number of categories and exemplifying each category through visual, vocal, and textual media). Having such a common set of criteria implies defining a universal ontology among all sensing units. Alternatively, one can allow each sensing domain to have its own ontology and perception and then devise a mapping function to relate these ontologies and perceptions unambiguously.
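Such a mapping function between domain ontologies can, in its simplest form, be a lookup table onto one shared ordinal scale. The sketch below is purely illustrative: the domain names reuse the apps mentioned earlier, but their actual rating labels are assumed here, not taken from the real applications.

```python
# Hypothetical sketch: two sensing domains use different rating labels; a
# mapping function projects both onto one common ordinal scale so that the
# ratings can be compared and fused unambiguously.

COMMON_SCALE = {"low": 1, "fair": 2, "good": 3}

# Per-domain ontologies mapped onto the common scale (labels are assumed).
DOMAIN_MAPS = {
    "burgerschouw": {"low": "low", "fair": "fair", "high": "good"},
    "scoor_ze":     {"1": "low", "2": "fair", "3": "good"},
}

def to_common(domain, label):
    """Translate a domain-specific rating label to the common numeric scale."""
    return COMMON_SCALE[DOMAIN_MAPS[domain][label]]

# Ratings from different domains become directly comparable.
assert to_common("burgerschouw", "high") == to_common("scoor_ze", "3")
print(to_common("burgerschouw", "fair"))  # 2
```

The same table-driven approach extends to richer mappings (e.g., many-to-one label merges), as long as each domain label resolves to exactly one point on the common scale.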
Subsequently, the context in which the data is collected should also be recorded, modeled, and conveyed to the data analysis process together with the data. The data analyzer should, in turn, take this contextual information into consideration when fusing data from various sources. The decision-making process that uses the fused data should then be aware of the inaccuracy and uncertainty of the resulting data. In other words, any decision-making process should have an idea of the reliability of the data at hand and make an informed decision based on that level of reliability. The decision-maker should also consider the possibility of data poisoning by malicious individuals (e.g., suspicious data can trigger a more precise investigation before an operational decision is made). Finally, the legitimacy principle should be respected by obeying laws and rules throughout the whole data lifecycle, including properly discarding the data after use if required by law or by data subjects.
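A reliability-aware decision step of the kind described above can be sketched as follows. The reliability scores, the fusion rule (a reliability-weighted mean), and the threshold are illustrative assumptions; the chapter does not prescribe a specific model:

```python
# Sketch: a decision step that weighs crowdsourced observations by an
# estimated reliability score and refuses to decide automatically when
# overall reliability is too low. Fusion rule and threshold are
# illustrative assumptions.

def fuse(observations):
    """observations: list of (value, reliability), reliability in [0, 1].
    Returns (reliability-weighted mean, mean reliability)."""
    total_w = sum(r for _, r in observations)
    if total_w == 0:
        return None, 0.0
    mean = sum(v * r for v, r in observations) / total_w
    return mean, total_w / len(observations)

def decide(observations, min_reliability=0.6):
    value, reliability = fuse(observations)
    if reliability < min_reliability:
        # Low reliability: trigger a more precise investigation instead
        # of an automatic operational decision.
        return ("investigate", value)
    return ("act", value)

# Two trusted sensors and one dubious report (possible data poisoning);
# the low-reliability report contributes little to the fused value.
obs = [(0.8, 0.9), (0.7, 0.8), (0.1, 0.2)]
print(decide(obs))
```

The point of the sketch is the separation of concerns: fusion produces both a value and a reliability estimate, and the decision step acts on the pair rather than on the value alone.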
Data sharing increases the chance of compromising privacy-sensitive data, and such compromises undermine the trust of data subjects (e.g., users and citizens). As information controllers (those who control the data of data subjects) are morally, ethically, and legally responsible for any misuse of the disseminated information, privacy enhancement techniques are often used to prevent undesirable disclosure of personal information. Moreover, auditing the logs of data processing activities is used to check the adherence of data processors to laws and policy agreements. These audit procedures, however, are carried out in long cycles and tend to be rather static. We envision a need for near real-time feedback from information processors to information controllers. When disseminating information of data subjects (e.g., citizens as carriers of smartphones) to data processors (e.g., smart city service providers), feedback from data processors to data controllers can facilitate the privacy preservation process. Such feedback intrinsically serves as a trust enhancement mechanism by directly reassuring data controllers (and data subjects) that it is safe to share data (Tsai et al. 2009). Moreover, within the context of privacy protection, feedback can enable data controllers/subjects to stay in charge of revealing their data to other parties. When privacy policies cannot be defined in detail beforehand, for example because the information usage context is not yet known, feedback can also be used to refine data privacy policies on the fly. Here, feedback works as an instrument for preventing privacy breaches. Feedback can also play a role in dealing with or preventing misuse and misinterpretation of data. This solution direction is a major step toward realizing the data transparency principle and, as such, toward gaining public trust in general and the trust of those who participate in crowdsourcing in particular.
To prevent retaliation threats against those who contribute to crowdsourcing, the systems that collect and process the data should be trustworthy and have sufficient security safeguards in place, such as data anonymization, data aggregation, and access control techniques. In this way, the identities of crowdsourcing contributors can be kept out of public reach.
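One common way to keep contributor identities out of reach is to pseudonymize user identifiers before the data leaves the collection system, for example with a keyed hash. The following is a minimal sketch under that assumption; the key, field names, and report format are hypothetical, and a real deployment would need proper key management:

```python
# Sketch: pseudonymizing contributor identifiers with a keyed hash (HMAC)
# so that published reports cannot be linked back to a citizen without
# the secret key. Key and record layout are illustrative assumptions.
import hashlib
import hmac

SECRET_KEY = b"replace-with-a-managed-secret"  # hypothetical key

def pseudonymize(user_id: str) -> str:
    """Return a stable pseudonym for user_id; without the key, the
    mapping cannot feasibly be reversed or recomputed by outsiders."""
    return hmac.new(SECRET_KEY, user_id.encode(), hashlib.sha256).hexdigest()[:16]

report = {"user": "citizen-42", "street": "Main St", "litter_level": "moderate"}
report["user"] = pseudonymize(report["user"])
print(report)  # the original identifier no longer appears in the record
```

Using a keyed hash rather than a plain hash matters here: with an unkeyed hash, anyone could recompute the pseudonym of a suspected contributor and confirm their participation.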
Data aggregation can inherently help prevent privacy breaches by reducing the amount of personally identifiable information. Nevertheless, one should verify that the aggregated data does not itself reveal privacy-sensitive information before dissemination, and does not contribute to deriving such information when fused with other information after dissemination (Kulk and van Loenen 2012; Gutmann et al. 2008).
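One concrete precaution against aggregates that still reveal sensitive information is to suppress groups that are too small, in the spirit of a k-anonymity threshold. The scenario, field names, and threshold below are illustrative assumptions:

```python
# Sketch: aggregating per-street litter reports and suppressing groups
# smaller than a threshold k, so that a published count cannot easily be
# traced back to one or two individual contributors. The threshold k is
# an assumed policy parameter.
from collections import Counter

def aggregate(reports, k=5):
    """reports: list of street names, one entry per contributor report.
    Returns counts per street, withholding groups with fewer than k reports."""
    counts = Counter(reports)
    return {street: n for street, n in counts.items() if n >= k}

reports = ["Main St"] * 7 + ["Oak Ave"] * 2
print(aggregate(reports))  # {'Main St': 7} -- 'Oak Ave' suppressed
```

Suppression addresses the first concern of the paragraph above (the aggregate itself revealing sensitive information); the second concern, linkage with external data after dissemination, still requires a separate assessment.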
5.2 Authorized-Access and Unauthorized-Use
When someone bypasses access control safeguards and gains access to sensitive information unjustifiably, one way to detect such intrusions is to use monitoring tools. We witness nowadays a surge of tools in the market that help data custodians monitor how their information resources are used. Examples are the tools used to realize a Security Operations Center in large organizations and those of VDSS (2014) for small and medium-size enterprises. Such tools provide (near) real-time feedback for detecting security and privacy irregularities in information sharing systems. Based on this feedback, data custodians can respond to detected unauthorized uses and take measures to prevent future ones.
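The core of such monitoring can be illustrated with a minimal sketch that scans data-access log records and flags irregular use in near real time. The log fields, the chosen indicator (records read per user per hour), and the threshold are illustrative assumptions, not the interface of any specific tool:

```python
# Sketch: a minimal monitor over data-access log records that flags an
# authorized user reading an unusually large number of records within an
# hour. Field names and the policy limit are illustrative assumptions.
from collections import defaultdict

MAX_RECORDS_PER_HOUR = 100  # assumed policy limit

def detect_irregular(log_entries):
    """log_entries: iterable of dicts with 'user', 'hour', 'records_read'.
    Yields (user, hour, total) the first time a user exceeds the limit."""
    totals = defaultdict(int)
    flagged = set()
    for entry in log_entries:
        key = (entry["user"], entry["hour"])
        totals[key] += entry["records_read"]
        if totals[key] > MAX_RECORDS_PER_HOUR and key not in flagged:
            flagged.add(key)
            yield (entry["user"], entry["hour"], totals[key])

log = [
    {"user": "alice", "hour": 9, "records_read": 40},
    {"user": "alice", "hour": 9, "records_read": 90},  # pushes alice over
    {"user": "bob", "hour": 9, "records_read": 10},
]
alerts = list(detect_irregular(log))
print(alerts)  # [('alice', 9, 130)]
```

Because the detector is a generator over a stream of log entries, the same logic serves both near real-time feedback and periodic audits over stored logs.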
5.3 Unauthorized-Access and Authorized-Use
Using data for a legitimate purpose while access to it is obtained through illegitimate means requires, in our opinion, solutions of a more procedural nature. Primarily, there should be clear legislation and policies in place that define the conditions under which access to crowdsourcing data becomes authorized without circumventing (i.e., denying) the access control process. Traditionally, a court order is used to gain access to crucial evidence when this is recognized to be necessary. Procedural solutions in this area can also include campaigns to educate employees about seeking information through legitimate mechanisms.
5.4 Unauthorized-Access and Unauthorized-Use
Classical security solutions that realize functionalities such as access control (including authentication and authorization) and privacy enhancement techniques (including data anonymization, aggregation, and confidentiality) are mainly aimed at dealing with unauthorized access to and unauthorized use of sensitive data during data transit, storage, and processing. These techniques include not only technical solutions, such as cryptographic protocols and algorithms, but also procedural measures and user awareness campaigns and programs. While the necessity and use of technical solutions are rather well understood and acknowledged, user awareness solutions (both technical and nontechnical) are still in their infancy. Human factors, such as lack of knowledge, poor judgment, ignorance, mistakes, and oversights, are considered the weakest link in the chain of defense against malicious cyber attacks.
A prerequisite for protecting crowdsourcing data against malicious attacks is to have sufficient security safeguards in place. This is also foreseen in the security safeguard principle of the data protection framework sketched in the previous sections. These safeguards include not only preventive measures (like access control and user awareness) but also detective measures that search for and detect those malicious activities and attacks that penetrate the preventive lines of defense. For crowdsourcing data, therefore, one could use monitoring tools to detect suspicious processing of data and inform supervising authorities, data custodians, and data subjects appropriately.
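The preventive side of these safeguards can be illustrated with a minimal access check that couples authorization to both a role and a stated purpose, echoing the purpose-binding idea running through this chapter. The roles, purposes, and policy entries below are hypothetical:

```python
# Sketch: a minimal role-and-purpose access check of the kind the
# preventive safeguards above presuppose. Policy entries are
# illustrative assumptions, not from the chapter.

POLICY = {
    # (role, purpose) pairs allowed to read crowdsourcing data
    ("analyst", "city_planning"),
    ("auditor", "compliance_check"),
}

def is_access_authorized(role: str, purpose: str) -> bool:
    """Allow access only for an authorized role *and* a stated,
    authorized purpose; everything else is denied by default."""
    return (role, purpose) in POLICY

print(is_access_authorized("analyst", "city_planning"))  # True
print(is_access_authorized("analyst", "marketing"))      # False
```

Binding the check to a declared purpose, rather than to the role alone, is what allows the same mechanism to distinguish authorized access with authorized use from authorized access with unauthorized use.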
Developing safe and secure applications reinforces the infrastructure that collects and distributes data from smartphones to backend servers. Important aspects to consider include the limited battery energy of mobile devices and the scalability of security management across multiple devices. Furthermore, eliciting the requirements that a safe app should meet, and implementing these requirements, are aspects that need in-depth elaboration.
6 Concluding Remarks
Technological developments make it easier to involve citizens in the data collection process. This approach is regarded as data collection through crowdsourcing. Smart devices equipped with mobile applications (apps) appear to be ideally suited for this purpose. Furthermore, as discussed in this chapter, these devices contain an increasing number of apps that can ease our daily life and transform the city and government into smart entities. Besides the potential of smart devices, data collection through crowdsourcing raises a number of privacy and security threats. We identified the privacy and security threats of deploying crowdsourcing with smart devices. To create sustainable smart cities, it is necessary to address these threats. Otherwise, citizens will become suspicious and reluctant to use (smart city related) apps that are intended to ease their daily life. This may have a negative impact on the economic growth of a city and its ambition to become a smart city.
We categorized the identified threats along two dimensions. The first dimension is whether the collected data is accessed by authorized or unauthorized entities. After access to sensitive data is obtained, it is often not guaranteed that the data is used (i.e., processed, stored, etc.) appropriately, for example, in the way desired by the data subject or for the purposes for which the data was collected. Therefore, we adopted the way the data is used as the second dimension, considering whether the collected data is used for an authorized or unauthorized purpose. For each class we discussed a number of threats and solution directions.
One of these classes is authorized-access and authorized-use, where crowdsourcing data may result in, for example, an inaccurate and biased view of reality. Collecting reliable data requires, among other things, defining clear criteria for evaluating situations by human sensors, unambiguously mapping the different views of human sensors, and considering the data collection context during data analysis and decision-making processes. Another class is authorized-access and unauthorized-use, where, for example, authorized insiders with ill intentions may reveal the personal information embedded in crowdsourcing data, or data custodians may misuse crowdsourcing data without the consent of data subjects. In such cases, one may use monitoring tools to detect data misuse carried out by authorized users (inside intruders). The monitoring can be done in real time or at regular intervals (via, for example, auditing of data usage logs). Unauthorized-access and authorized-use is the third class, where, for example, authorities and enterprises use crowdsourcing data when an emergency, crime, or conflict occurs. This way of data usage requires transparent solutions of a more legislative and procedural nature, whereby these authorities can obtain permission to process the data legitimately. Finally, we considered unauthorized-access and unauthorized-use, where (external) intruders illegitimately get access to crowdsourcing data and use the data for malicious purposes. This case could stem from users who share their data and use their systems without knowing the risks involved, or from inadequate or ineffective safeguards to protect data or systems. Classical solutions such as access control and privacy enhancement technologies, together with user awareness measures, could be used to deal with such threats.
References
Bargh, M. S., & Choenni, S. (2013). On preserving privacy whilst integrating data in connected information systems. In Proceedings of the International Conference on Cloud Security Management (ICCSM’13), Seattle, US, 17–18 October.
Braak, S. W. van den, Choenni, S., Meijer, R., & Zuiderwijk, A. (2012). Trusted third parties for secure and privacy preserving data integration and sharing in the public sector. In Proceedings of the 13th Annual International Conference on Digital Government Research, New York, NY, USA, 2012, pp. 135–144.
Cameron, K. (2005). The laws of identity. http://www.identityblog.com/?p=352/#lawsofiden_topic3. Accessed 1 Feb 2014.
Centric. (2014). http://www.centric.nl/NL/Default/Branches/Lokale-overheid/Burgerschouw. Accessed 11 June 2015.
Chadwick, D. W. (2009). Federated identity management. In: Foundations of security analysis and design V; lecture notes in computer science (Vol. 5705), 2009, pp. 96–120.
Chin, E., Porter Felt, A., Greenwood, K., & Wagner, D. (2011). Analyzing inter-application communication in Android. In Proceedings of MobiSys. ACM, 2011.
Choenni, S., & Leertouwer, E. (2010). Public safety mashups to support policy makers. In Proceedings of EGOVIS 2010, Electronic Government and the Information Systems Perspective, Bilbao, Spain, August 31–September 2, 2010. LNCS 6267, pp. 234–248.
Choenni, S., van Dijk, J., & Leeuw, F., (2010). Preserving privacy whilst integrating data: Applied to criminal justice. Information Polity, 15(1–2), 125–138.
Dawes, S. S. (2010a). Stewardship and usefulness: Policy principles for information-based transparency. Government Information Quarterly, 27, 377–383.
Dawes, S. S. (2010b). Information policy meta-principles: Stewardship and usefulness. In 2010 43rd Hawaii International Conference on System Sciences (pp. 1–10). IEEE.
DPPA. (2014). Dutch data protection authority—College Bescherming Persoonsgegevens (2014), Dutch Privacy Protection Act (DPPA)—Wbp-naslag; Found on https://cbpweb.nl/nl/over-privacy/wetten/wbp-naslag. Accessed 11 June 2015.
EU. (1995). Directive 95/46/EC of the European Parliament and of the Council of 24 October 1995 on the protection of individuals with regard to the processing of personal data and on the free movement of such data.
Ganesan, D., & Corner, M. (2011). Crowd sourcing for data collection. http://sensorlab.cs.dartmouth.edu/NSFPervasiveComputingAtScale/pdf/1569392897.pdf. Accessed 26 Feb 2014.
Gutmann, M., Witkowski, K., Colyer, C., O’Rourke, J., & McNally, J. (2008). Providing spatial data for secondary analysis: Issues and current practices relating to confidentiality. Population Research and Policy Review, 27 (6), 639–665.
Hildebrandt, M., & Koops, B. J. (2010). The challenges of ambient law and legal protection in the profiling era. The Modern Law Review, 73(3), 428–460.
Kalidien, S., Choenni, S., & Meijer, R. (2010). Crime statistics online: Potentials and challenges. Proceedings of the 11th Annual International Conference on Digital Government Research, New York, NY, USA, 2010, pp. 131–137.
Kosinski, M., Stillwell, D., & Graepel, T. (2013). Private traits and attributes are predictable from digital records of human behavior. Proceedings of the National Academy of Sciences of the United States of America, 110(15), 5802–5805. doi:10.1073/pnas.1218772110.
Kulk, S., & van Loenen, B. (2012). Brave new open data world? International Journal of Spatial Data Infrastructures, 7, 196–206.
Meijer, R., Choenni, S., Sheikh, A. R., & Conradie, P. (2013). Bridging the contradictions of open data. Proceedings of the 13th European Conference on eGovernment. Como, Italy.
de Montjoye, Y.-A., Hidalgo, C. A., Verleysen, M., & Blondel, V. D. (2013). Unique in the crowd: The privacy bounds of human mobility. Scientific Reports, 3(1376). doi:10.1038/srep01376.
OECD. (2013). The OECD privacy framework. http://www.oecd.org/sti/ieconomy/2013-oecd-privacy-guidelines.pdf. Accessed 21 Feb 2014.
Stembert, N., Conradie, P., Mulder, I., & Choenni, S. (2013). Participatory data gathering for public sector reuse: Lessons learned from traditional initiatives. Electronic government (pp. 87–98). Berlin Heidelberg: Springer.
Taylor, J. (2010). Citizens as public sensors. http://radar.oreilly.com/2010/04/crowdsourcingthe-dpw.html. Accessed 26 Feb 2014.
Tsai, J. Y., Kelley, P., Drielsma, P., Cranor, L. F., Hong, J., & Sadeh, N. (2009). Who’s viewed you? The impact of feedback in a mobile location-sharing application. Computer Human Interaction (CHI’09) ACM Press.
VDSS Vita Data Security Systems. (2014). http://www.vdss.nl/vdss-in-your-network.ashx. Accessed 21 Feb 2014.
Wang, Y., Huang, Y., & Louis, C. (September 2013). Towards a framework for privacy-aware mobile crowdsourcing. In 2013 International Conference on Social Computing (SocialCom) (pp. 454–459). IEEE Press.
Zoldan, A. (5 October 2013). More data, more problems: Is big data always right?. http://www.wired.com/insights/2013/05/more-data-more-problems-is-big-data-always-right/. Accessed 11 June 2015.
Zuiderwijk, A., Janssen, M., Meijer, R., Choenni, R., Charalabidis, Y., & Jeffery, K. (2012). Issues and guiding principles for opening governmental judicial research data. Proceedings of the 11th European Conference on Electronic Government, EGOV’12, LNCS Vol. 7443 (pp. 90–101). Springer Verlag, Kristiansand, Norway, September 3–6.
© 2016 Springer International Publishing Switzerland
Choenni, S., Bargh, M., Roepan, C., Meijer, R. (2016). Privacy and Security in Smart Data Collection by Citizens. In: Gil-Garcia, J., Pardo, T., Nam, T. (eds) Smarter as the New Urban Agenda. Public Administration and Information Technology, vol 11. Springer, Cham. https://doi.org/10.1007/978-3-319-17620-8_19
Print ISBN: 978-3-319-17619-2
Online ISBN: 978-3-319-17620-8