3.1 Introduction

Smart cities are increasingly receiving attention and praise as the model of future living and working. They promise an improved quality of life and more informed decision-making, so it is no surprise that major headlines follow whenever a new smart city project is announced. Major cities around the world believe in this potential and are buying in, each hoping to establish itself as a premier, trailblazing “smart” city (Earth.Org 2021).

According to McKinsey and Company (Woetzel et al. 2021), three layers of smartness work together to produce an effective smart city: the bottom, middle, and upper layers. The bottom layer consists of any device that has sensors and can interconnect with other devices via high-speed internet. Hence, any device that can collect data, such as a microwave, traffic light, digital doorbell, oven, or bus, belongs to the bottom layer. The middle layer consists of applications that convert the raw data collected from smartphones and sensors into metrics, which are then analyzed to produce meaningful insights. Any software organization that develops or maintains software that aggregates and/or processes the data for analytical purposes is also part of the middle layer. After all, the focal point of the middle layer is deducing logical conclusions from the data collected in the bottom layer. Finally, the upper layer consists of organizations, municipal bodies, and the public as a whole. The first two layers set the groundwork to inform and help the third layer make rational decisions. For example, data about the capacity of buses would allow commuters to decide on the best route to take and avoid overcrowding or extended waits. Similarly, data about water consumption could help inform dwellers about their water usage and prompt them to repair broken or leaky pipes that unnecessarily waste water.
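
To make the middle layer concrete, the following minimal sketch shows how raw bottom-layer readings might be turned into a metric and then into a simple insight for the upper layer. It is only an illustration; the occupancy readings, route identifiers, and the 85% crowding threshold are hypothetical.

from collections import defaultdict

def occupancy_metrics(readings):
    """Convert raw (route, passengers, capacity) readings into an average
    occupancy rate per route, a metric the upper layer can act on."""
    totals = defaultdict(lambda: [0.0, 0])
    for route, passengers, capacity in readings:
        totals[route][0] += passengers / capacity
        totals[route][1] += 1
    return {route: total / count for route, (total, count) in totals.items()}

readings = [("route-7", 52, 60), ("route-7", 58, 60), ("route-12", 18, 60)]
for route, rate in occupancy_metrics(readings).items():
    # Insight for commuters: routes above 85% average occupancy are flagged as crowded.
    print(route, f"{rate:.0%}", "crowded" if rate > 0.85 else "ok")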

Ultimately, the crucial component of a successful smart city is the robust integration of data and technology (Tuerk 2019), in particular, the linkage of data and technology to facilitate key decision-making. This linkage creates new opportunities and benefits for software organizations, whether an organization builds sensors or devices to collect data or develops software that analyzes the metrics or data collected from individual devices. A new age of innovation is upon us, in which an organization can specialize in aggregating and conducting large-scale, comprehensive analysis of real-life data and produce useful results that other organizations later build on. A significant range of business opportunities now exists for organizations that help process and understand the raw data.

Based on current levels of data and device interconnectivity, cities still have a long way to go before they reach the pinnacle of “smart” cities (Tuerk 2019). Yet the battle for the top “smart” city is fierce, as cities around the globe engage in new infrastructure and technology projects, often with the support of their national governments. Some of the more prominent smart city projects proposed at one point included Hudson Yards in New York, Sidewalk Toronto (led by Sidewalk Labs), and the Xiong’an New Area near Beijing. Each project involves massive investment in people, property, and technology. For instance, the Hudson Yards project was described as one of the most expensive private real estate projects in the history of the United States. Likewise, the Sidewalk Toronto project was supposed to develop one of the largest areas of underdeveloped urban land in Canada and build a testbed neighbourhood for groundbreaking technologies. Once completed, the proposed “neighbourhood” was expected to offer the highest levels of sustainability, economic opportunity, housing affordability, and new mobility.

Smart city projects are not without risks. The Sidewalk Toronto project, in particular, was marred by controversy from its inception before it was ultimately terminated in 2020 amid economic uncertainty stemming from COVID-19. Critics fiercely opposed the planned development because of the lack of transparency regarding data privacy and the potential for personal data infringement once the project was completed. Moreover, the project did not adequately disclose how or where data would be stored. While the project organizers assured Toronto residents that data would be anonymized before being disclosed to the public, previous studies have shown that anonymized data can still be combined with other datasets to de-anonymize the protected data.

Ultimately, Sidewalk Toronto failed, but similar projects are in preparation or under development in other cities. The data privacy challenges highlighted by critics of the project represent a legitimate concern about smart cities. In this chapter, we discuss the difficulty of managing data privacy for software organizations that may develop software for processing, managing, or generating data. In particular, we describe the difficulty for software organizations in achieving a shared understanding of privacy and complying with privacy regulations.

3.2 Managing Privacy

As previously mentioned, the collection and analysis of data are critical to the success of a smart city. Correspondingly, the safeguarding and adequate handling of any private data are also vital to the privacy interests of the local population whose data is collected. Yet, when we take a deeper look at modern, agile organizations that develop software, we notice that they often have to make difficult trade-offs when dealing with software attributes such as privacy. A previous study found that small, agile software organizations using continuous practices (i.e., a software development methodology emphasizing automation and rapid feedback that is ubiquitous in modern software-developing organizations) manage software attributes, like privacy, via four main practices: (1) put a number on the attribute, (2) let someone else manage the attribute, (3) write your own tool to check the attribute, or (4) put the attribute in source control (Werner et al. 2021).

For the studied organizations, the first step in dealing with an attribute of their software is assigning a number (i.e., a metric) to the attribute. Assigning a metric to a privacy attribute may seem rudimentary or trivial; however, it is critical for reliably testing whether the attribute has been achieved. For example, a privacy attribute may prescribe that personal data may not stay on record for longer than 90 days. Identifying the specifics of the legal limits for data storage is at the crux of satisfying the privacy attribute. Imagine if the privacy attribute instead prescribed that “personal data may not stay on record for a lengthy period.” In this scenario, a developer or team in charge of implementing the privacy attribute would have the onerous task of trying to discern what “lengthy period” means. The use of “lengthy” becomes a precarious problem, as either 30 or 180 days could be considered lengthy depending on who is asked to make the determination. Moreover, such ambiguous privacy attributes make communication with customers and users difficult for the software organization. As previously noted, a primary concern of smart city residents involves the handling of the privacy of the data collected within the boundaries of the smart city. Hence, transparency and communication about how privacy is handled are paramount to satisfying the concerns of residents. Providing clear measurements of how and what data is collected is more convincing than ambiguous alternatives.
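
A minimal sketch of what “putting a number on” the retention attribute might look like is shown below. The record store, its fields, and the example records are hypothetical; a real check would query the organization’s actual database, but the idea of comparing record age against a concrete 90-day limit is the same.

from datetime import datetime, timedelta, timezone

RETENTION_LIMIT = timedelta(days=90)  # the concrete number attached to the attribute

def overdue_records(records, now=None):
    """Return the records whose age exceeds the documented retention limit."""
    now = now or datetime.now(timezone.utc)
    return [r for r in records if now - r["created_at"] > RETENTION_LIMIT]

records = [
    {"id": 1, "created_at": datetime(2023, 1, 5, tzinfo=timezone.utc)},
    {"id": 2, "created_at": datetime.now(timezone.utc) - timedelta(days=10)},
]
print(len(overdue_records(records)), "record(s) exceed the 90-day limit")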

The second practice, letting someone else manage the attribute, is also known as “offloading” or “outsourcing” and commonly occurs when software organizations offload services or software to third-party providers. For example, a software organization making use of Amazon Web Services (AWS) to host applications or store data would fall under the classification of offloading to a third party. Such offloading is ubiquitous in the software community, as third-party services offer convenience at competitive prices. The alternative, providing the service in-house, could cost more and create significantly more hassle. Another consideration that encourages software organizations to offload is that third-party services help alleviate some of the responsibility regarding privacy and security. For instance, a software organization that collects residents’ movement data in a smart city would need to ensure that the data is securely stored in databases on its premises. However, if the organization instead uses a third-party database provider to manage data storage, the organization has much less responsibility. While the organization retains the responsibility to ensure that the databases in the cloud are configured so as not to leak personal data, it is relieved of the day-to-day management of the privacy and security of the databases. The caveat with offloading is that a software organization most likely needs to plan the location of data storage in advance. Many national and local governments place stringent restrictions on the physical location of data, requiring organizations to store data in specific jurisdictions. For example, universities in the province of British Columbia must abide by the Freedom of Information and Protection of Privacy Act (FIPPA), which requires that personal and personally identifiable data be physically stored within Canada.
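
A minimal sketch of how an offloading organization might verify data residency is shown below, assuming the data lives in AWS S3 and the boto3 SDK is available. The bucket name and the allowed-region list are hypothetical examples of a Canada-only residency rule.

import boto3

# Regions considered acceptable under a Canada-only residency rule (illustrative).
ALLOWED_REGIONS = {"ca-central-1", "ca-west-1"}

def bucket_region(s3_client, bucket):
    # S3 reports None for buckets in us-east-1, so normalize that case.
    location = s3_client.get_bucket_location(Bucket=bucket)["LocationConstraint"]
    return location or "us-east-1"

def check_residency(bucket_names):
    """Map each bucket to True if it resides in an allowed region."""
    s3 = boto3.client("s3")
    return {name: bucket_region(s3, name) in ALLOWED_REGIONS for name in bucket_names}

print(check_residency(["resident-movement-data"]))  # hypothetical bucket name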

The third practice, writing your own tool to check the attribute, is similar to the second practice. The main difference is that the organization develops its own tool instead of relying on a third-party service. Given the cost of developing one’s own tool, this practice is often a last resort for small, resource-constrained organizations. While signing up for a third-party service is usually more economically feasible, an organization may nonetheless be forced to customize or build its own tooling when existing services do not meet its needs. For example, an organization may develop its own monitoring system to ensure that data is collected as expected.
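
A minimal sketch of such an in-house monitoring check is shown below. The sensor identifiers, the expected field set, and the one-hour reporting window are hypothetical; the check simply flags sensors that have gone silent and records that carry fields beyond what the organization said it collects.

from datetime import datetime, timedelta, timezone

EXPECTED_SENSORS = {"bus-stop-01", "bus-stop-02"}
EXPECTED_FIELDS = {"sensor_id", "timestamp", "passenger_count"}  # nothing more is collected

def monitor(records, window=timedelta(hours=1), now=None):
    """Report sensors that have gone silent and records with unexpected fields."""
    now = now or datetime.now(timezone.utc)
    recent = [r for r in records if now - r["timestamp"] <= window]
    silent = EXPECTED_SENSORS - {r["sensor_id"] for r in recent}
    unexpected = [r for r in records if set(r) != EXPECTED_FIELDS]
    return {"silent_sensors": silent, "records_with_unexpected_fields": len(unexpected)}

sample = [{"sensor_id": "bus-stop-01", "timestamp": datetime.now(timezone.utc), "passenger_count": 4}]
print(monitor(sample))  # bus-stop-02 would be reported as silent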

Finally, the last practice involves recording the attribute in source control alongside the software itself. As we discuss later in this chapter, reaching an adequate level of shared understanding of privacy is difficult, so documenting the necessary details as frequently as possible helps in managing a privacy attribute. In the study by Werner et al. (2021), developers often opted to record knowledge about a software attribute through codification or related artifacts. This approach is viewed positively, as developers perceive that they and other developers can easily find the knowledge in the code base of the software.
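
One minimal way to codify such knowledge is to express the attribute as a test that lives next to the code it constrains, as sketched below. The export function, field names, and the “PA-3” attribute identifier are hypothetical illustrations, not part of the cited study.

DIRECT_IDENTIFIERS = {"name", "email", "phone", "home_address"}

def export_for_publication(record):
    """Hypothetical export step: drop direct identifiers before data leaves the system."""
    return {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}

def test_export_contains_no_direct_identifiers():
    """Privacy attribute PA-3 (illustrative): published records must exclude direct identifiers."""
    record = {"name": "A. Resident", "email": "a@example.org", "water_use_litres": 210}
    assert not DIRECT_IDENTIFIERS.intersection(export_for_publication(record))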

3.3 Challenges with Managing Privacy Compliance

While the aforementioned practices can assist in managing privacy, the study also identified several noteworthy challenges. In particular, automation and a shared understanding of software attributes stand out as significant challenges.

3.3.1 Automation

Automation is a challenge because not all privacy attributes are suitable for automated tests. Some attributes are inherently difficult to test, which hinders developers from implementing an automated framework to verify them. For example, one area of concern regarding data collection is that a person’s personal data should be accurate. However, verifying the accuracy of the collected data may be onerous for a data-gathering organization and may require some form of manual intervention. While it may be in a software organization’s best interests to develop a tool for automated testing of its important privacy attributes, deriving clear, testable metrics for those attributes may be arduous. If the organization’s developers also lack knowledge of privacy regulations, defining metrics for the privacy attributes becomes even harder.

3.3.2 Shared Understanding of Privacy

One problem associated with any requirement, especially privacy, is that the people involved rarely share an equal level of understanding. That group may include developers, project managers, managers, or any other stakeholders. If we consider the term privacy on its own, it would be difficult for a group of people to converge on the same definition, as everyone has their own. Unfortunately, most software requirements focus on functional aspects, for example, the user clicks X and Y is shown. Furthermore, writing privacy requirements is difficult, and the resulting requirements may be hard to test. What ends up happening is that a data-collecting software organization claims to offer a privacy-compliant solution, but no one can point to a place in the code and say, “there we have privacy,” and no one can really test the claim, since the definition may be ambiguous.

One problem a data-collecting software organization may face when dealing with privacy is how to interpret existing privacy legislation, as this usually requires a lawyer to understand the language of the legislation. Unfortunately, not many software developers are lawyers, and not many lawyers are software developers. The language in which legislation is written is therefore likely to further contribute to a lack of shared understanding.

A privacy requirement may be dictated by legislation; a project manager might record their own interpretation of that requirement in a development task management tool; a developer might read that interpretation and build software that conforms to their own reading of it; and finally, a tester might read the requirement and inspect the product to assess whether the privacy requirement was met. Each step in this chain introduces room for interpretations to diverge. However, not every privacy requirement suffers equally, as some are easier to interpret, implement, and test; for example, a database must be encrypted using AES-256 encryption.
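
The encryption requirement is an example that lends itself to a direct automated check. The minimal sketch below assumes the database is an Amazon RDS instance managed via boto3 (RDS encryption at rest uses AES-256 when enabled); the instance identifier is a hypothetical example.

import boto3

def storage_encrypted(instance_id):
    """Return True if the given RDS instance has encryption at rest enabled."""
    rds = boto3.client("rds")
    instance = rds.describe_db_instances(DBInstanceIdentifier=instance_id)["DBInstances"][0]
    return instance["StorageEncrypted"]

assert storage_encrypted("smart-city-db"), "database storage is not encrypted"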

Continuous software engineering (CSE) centers on a succession of rapid cycles in which software is released frequently, sometimes multiple times per hour or even within minutes. CSE induces and encourages an environment in which software organizations must keep up with a fast pace of change. Research on practice has shown that CSE typically favors releasing features that users can actually use, whether new features or fixes to previously released, buggy ones. Aspects of software, such as privacy, that do not directly affect a feature visible or accessible to a user ultimately fall by the wayside. Research has indicated a correlation between CSE and a decreased level of shared understanding for requirements that do not exhibit direct functionality to users, such as privacy (Werner et al. 2020).

A side effect of CSE, and of the fast pace of change that accompanies it, is reduced domain knowledge and inadequate communication. Domain knowledge is the business-specific context required to compete in a particular domain. A lack of domain knowledge can undermine the shared understanding of software attributes when a data-collecting software organization enters a new, unfamiliar market where even a basic understanding would be beneficial. Reaching consensus on the priority of a less visible requirement, such as privacy, may be difficult because different units within the organization perceive its importance differently. Additionally, the importance of a privacy requirement might be deprioritized until privacy is demanded by a particular customer.

CSE may also limit the ability of employees to communicate. Developers are often isolated from one another, even when working on the same solution. The lack of communication may be a systemic problem related to the lack of domain knowledge, or simply an oversight on the part of a developer or the software organization. Alternatively, a developer may make false assumptions about privacy and simply assume that it has already been handled elsewhere and is not within their current scope. Finally, privacy can be such an all-encompassing requirement that everyone assumes somebody else will handle it, despite a lack of any documentation or notice to that effect. Ultimately, a communication breakdown about privacy can be disastrous, especially if privacy is supposed to be a key component of the project.

3.4 Solutions

We have discussed at length the difficulties and challenges of managing privacy in software organizations that collect or analyze data for smart city projects, but we have not yet described possible solutions.

3.4.1 Developing a Shared Understanding

Before embarking on actually achieving privacy compliance, it is wise for a smart city developer to invest in building a shared understanding of privacy. The shared understanding should encompass a variety of stakeholders invested in building the smart city. In particular (and at the very least), project management, development, and legal teams should build an equal, shared understanding with respect to privacy.

The first stage would involve a number of lawyers: lawyers familiar with the local laws where the smart city will be located, lawyers who are experts in privacy compliance, lawyers with experience in software, and additional lawyers lacking domain knowledge to help uncover tacit knowledge (Niknafs and Berry 2013). These lawyers should start building a shared understanding of the local laws and privacy compliance amongst themselves, likely bringing in the project management and development teams once the level of shared understanding is substantial. However, we must also recognize the difficulty of interpreting some new privacy laws, even for seasoned lawyers. Some privacy laws, such as the General Data Protection Regulation (GDPR) enacted in the European Union, were purposefully written in a broad and ambiguous manner to account for potential future technological advances and to provide legal guidance for a multitude of industries. Notwithstanding the reasoning behind the GDPR’s ambiguity, some software organizations adopt a rather pragmatic wait-and-see approach to new privacy regulations: they want to see the degree of penalties that privacy regulators inflict on violators before deciding the level of compliance preparedness they should adopt. Further complicating the spread of knowledge and awareness about privacy regulations is the fact that developers are ultimately the employees who implement privacy attributes in software. Even if lawyers have an abundance of privacy knowledge to share with developers, the transfer of that knowledge is not trivial. Lawyers typically have little technical training, and developers rarely have a background in law, which can inhibit communication between the two parties.

Next, there should be a large effort to purposefully disseminate privacy compliance information, using both formal and informal techniques. Unsurprisingly, these techniques require adequate communication and documentation to succeed. Achieving a high level of communication and documentation might seem trivial to accomplish but is often harder than anticipated, thus requiring special focus to ensure that the communication and documentation are effective in disseminating information.

Once development teams have built and are able to maintain a shared understanding, a set of shared development standards should be designed and implemented to capture how privacy compliance will be documented, implemented, and met. Developing and sharing these standards therefore requires that a shared understanding be built and maintained before development begins.

At this point, and only at this point, should actual development begin, incorporating the standards needed to achieve privacy compliance. The reason that development cannot begin before a shared understanding of how to achieve privacy compliance is in place is that privacy attributes cannot be shoehorned in after the fact, especially in software. Software that meets high levels of privacy compliance must be designed with privacy compliance in mind from the get-go (Cavoukian 2009).

3.4.2 Achieving Privacy Compliance

At this point, a software organization can move towards realizing privacy compliance in its software. There are two critical components to any software organization building and maintaining a shared understanding of privacy compliance: documentation and communication. While documentation and communication are two innocuous and recurrent terms repeated ad nauseam by any software organization aiming for success, achieving these tenets is not necessarily trivial. Documentation is often the crucial first step, but documentation is notoriously costly, especially for small organizations without massive resources. Instead, there is a minimum level of privacy documentation that such organizations may find more suitable for their situation. How much privacy documentation is needed depends on the organization and the situation, which is what makes discerning the required level of documentation such a challenging task. The prudent approach an organization can take is to identify and record the specific criteria of a privacy attribute as well as the test case for verifying that the attribute is met. To maximize the usefulness of documentation, a software organization should aim to document the attributes deemed most valuable or most mission-critical to the organization’s business.
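
A minimal sketch of this lightweight documentation is shown below: each prioritized privacy attribute is recorded with its concrete criterion and a pointer to the test that verifies it. The record structure, identifiers, and example values are hypothetical.

from dataclasses import dataclass

@dataclass
class PrivacyAttribute:
    attr_id: str    # stable identifier referenced from code, tests, and discussions
    criterion: str  # the measurable criterion, not a vague statement
    test_ref: str   # where the verifying test case lives
    owner: str      # who answers questions about this attribute

REGISTRY = [
    PrivacyAttribute("PA-1", "Personal data retained no longer than 90 days",
                     "tests/test_retention.py::test_no_overdue_records", "data team"),
    PrivacyAttribute("PA-2", "Personal data stored only in Canadian regions",
                     "tests/test_residency.py::test_buckets_in_canada", "platform team"),
]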

Another aspect that closely relates to documentation is communication. Once documentation is created, it is essential to share it with all relevant stakeholders, and communication is what disseminates the documentation and related information to them. There are numerous media for communication, including video conferencing, face-to-face meetings, texting, phone calls, and even email. Regardless of the medium, one key component is support from management in guiding employees to disseminate knowledge. Additionally, management and other stakeholders should help prioritize the privacy attributes that most affect the organization. If communication flows in only one direction (i.e., top to bottom), where experienced employees pass important information to junior employees but the reverse flow does not occur, the organization risks accumulating tacit knowledge that only a select few individuals hold, lowering the organization’s “bus factor,” a measure of how many team members would need to suddenly disappear before a project fails or suffers significant setbacks due to a lack of knowledge.

Once sufficient privacy documentation and communication are in place, the next step is developing a tool (or tools) to verify that the privacy attributes the organization documented and prioritized are actually realized. Closely related to the third management practice described earlier, writing your own tool, or using an existing tool if one exists, helps an organization check whether its privacy attributes are achieved on a continual basis. The logic for continual verification is simple: it is of little use to a software organization if a privacy attribute deemed significant (e.g., personal data must be deleted after 30 days) is achieved only for a brief period of time. A prioritized privacy attribute most likely reflects the long-term interests of the organization, so it is in the organization’s best interests to treat the privacy attribute as a long-term concern. Therefore, a privacy compliance check conducted on a semi-annual or annual basis is not effective for achieving privacy compliance, as the organization has little insight into the state of compliance during the interval between checks. Instead, the prudent approach is for the organization to continually check that its software satisfies the privacy attributes it has clearly documented and communicated to all relevant stakeholders.
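
A minimal sketch of continual verification is shown below: a small runner that executes every registered compliance check and reports failures each time it is invoked. The registration decorator and the placeholder check are hypothetical; real checks would query the organization’s own systems.

from datetime import datetime, timezone

CHECKS = {}  # maps a check name to a zero-argument callable returning True on success

def compliance_check(func):
    """Register a check so the runner picks it up automatically."""
    CHECKS[func.__name__] = func
    return func

@compliance_check
def no_overdue_personal_data():
    return True  # placeholder; a real check would query the data store

def run_all():
    """Execute every registered check and report which ones failed."""
    failures = [name for name, check in CHECKS.items() if not check()]
    print(datetime.now(timezone.utc).isoformat(), "failed checks:", failures or "none")
    return not failures

run_all()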

The type of tool used to check privacy compliance varies by organization, as each organization may have a different list of prioritized privacy attributes and associated software. For example, an organization may develop a tool to automatically verify that the cloud infrastructure it deploys on third-party services complies with its privacy attributes. However, developing a tool is only one step towards continuous privacy compliance. The other critical element an organization must not omit is deploying the tool to its production system so that the tool is automatically executed on a continual basis. To quote Martin Fowler, a pioneer in the continuous integration movement, “imperfect tests, run frequently, are much better than perfect tests that are never written at all” (Fowler 2006). If an organization develops a tool to check for privacy but rarely executes it, the tool contributes little to the organization’s compliance. While each organization must determine the execution interval best suited to its situation, weekly test executions seem reasonable in the general case.
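
In practice the weekly execution would usually be handled by a CI scheduler or a cron job; the minimal sketch below merely illustrates the continual aspect with a simple loop, taking as input a runner such as the run_all() sketch above.

import time

WEEK_SECONDS = 7 * 24 * 60 * 60

def run_weekly(run_all_checks):
    """Invoke the supplied compliance runner once per week, indefinitely."""
    while True:
        run_all_checks()
        time.sleep(WEEK_SECONDS)

# run_weekly(run_all)  # e.g., the runner from the previous sketch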

3.5 Conclusion

Treatment of privacy is a vital issue for smart cities. It is necessary to have a clear plan for managing privacy attributes, especially if proponents of a smart city project want to assuage critics who worry about risks to privacy. More importantly, each software organization that plays a role in the smart city project, whether it assists in data collection or analysis, must adequately manage privacy. After all, the amount of data generated and analyzed in smart cities is unprecedented, thus a heightened focus must be placed on protecting personal privacy.

In this chapter, we discussed a few challenges that a software organization working on a smart city project may encounter when trying to manage privacy, and we also described several practices that an organization can adopt to manage privacy effectively. Ultimately, such an organization must make privacy a high-priority initiative, as attention to privacy may not reach a sufficient level without clear motivation and willpower.