1 Introduction

The many experiments conducted in recent years with data, both big and open, have shown the great potential of these sources for the real-time monitoring of urban processes and for operational actions (e.g. resolving crisis situations such as traffic jams and accidents, or adjusting schedules in transport supply). However, their use and impact within policy making processes remain a more controversial and less obvious question.

Critical ex-post evaluations of the potential and limits of data-informed policy making have also led several public bodies, such as the EU and the US Congress, to examine more closely the possibilities and challenges of using data in policy making and analysis in specific areas of application (De Gennaro et al. 2016; Jarmin and O’Hara 2016; Lim et al. 2018).

Several issues become central for cities that already hold large amounts of data and, thanks to fast-evolving technologies, see a growing opportunity to collect more data, faster: the relevance of data in promoting dynamic resource management; in making it possible to discover trends and to analyse what explains their development; in fostering public engagement and civic participation; and, finally, in sustaining the development of “robust approaches for urban planning, service delivery, policy evaluation, and reform and also for the infrastructure and urban design decisions” (Thakuriah et al. 2017, p. 23).

Thanks to open data initiatives and emerging data ecosystems, data is shared across a multitude of actors. Policy makers, however, do not need data; they need insights. Data visualisations and advanced analytics can provide these insights, but only if they give the right answer to the right policy question. How can data experts ensure this match? How and where do data activities fit into a policy making process? And what are the key aspects for an organisation to turn data into gold?

By addressing these questions, the PoliVisu project aims to guide public administrations in adopting a data-supported policy making process, developing a theoretical model that describes the different stages of policy making and the role data can play in each step of the process. This topic is discussed in the first part of the paper, where practical examples drawn from the direct experience of the PoliVisu pilots are shared to further the understanding of the model and the possible uses of data for urban policies. The second part focuses on the different types of data-related activities, and the ways of implementing them, that should be carried out in a policy making context to define and answer a policy question effectively.

2 Policy Making Supported by Data

In PoliVisu, we work with a model based on the policy cycle. This means conceiving policy as a process, conceptualised as a data-assisted policy experimentation cycle consisting of interrelated, stepwise or cyclical stages.

The type and role of data analysis can change at each stage of the policy making process.

The policy cycle model (Fig. 6.1) consists of three stages: policy design, policy implementation, and policy evaluation. Every stage consists of several steps. The stages and steps do not follow each other in a linear manner; rather, they are defined as overlapping and cyclical. Moreover, the stages of policy making tend to become more integrated and overlapping when data is involved (Concilio and Pucci 2021).

Fig. 6.1 The policy cycle model

Before discussing this model in more detail, a common understanding is required of the data-based analysis types that can be employed during the policy making process. Here in Table 6.1 we provide the definitions, starting with relatively simple analysis types and proceeding to more complex ones.

Table 6.1 Types of data analysis

All the previous data analysis types and techniques can be employed in a data-supported policy cycle model which typically consists of three stages: design, implementation, and evaluation.

2.1 Policy Design

The first stage of the policy cycle model is policy design (Fig. 6.2). The policy design stage is focused on highlighting a collective policy problem, identifying a set of goals and objectives in relation to it and defining policy strategies and actions to contribute to solving the problem. The essential steps of policy design are problem setting, policy formulation and scenario analysis.

Fig. 6.2 Policy design cycle and data-related activities

Problem setting highlights the existence of a problem and legitimizes it as a collective problem to be faced. It consists of an analysis of the existing policy and how it deals with the problem. It also includes a reconstruction of the public debate, and the identification of the stakeholders and actors potentially involved. In the problem setting step, data can be useful to explore the effectiveness of past policies and to better understand the current urban phenomena affecting the problem. The depiction of ongoing trends and the consequent definition of the problem can be supported by the collected data. An as-is representation aids in understanding the problem at hand.

In this step exploratory, descriptive, and diagnostic analysis methods can help to understand the data, describe the properties of the problem through the available data, and define the exact dimensions of the problem. Since the problem setting step is merely aimed at defining the policy problem and not at finding solutions, the predictive and prescriptive analysis methods are not suited for this step.

Several examples of how data can assist the assessment of a policy problem are given below.

Through traffic sensors and floating cars, we can collect real-time data on the movement of vehicles, their speed, and the occupancy of a road. With smart cards for public transport users and vehicle sharing systems, we can learn the position of each user and related information. In both cases, data can help in analysing the functioning of urban infrastructures and services.

As a regional capital, Pilsen suffers from traffic congestion because of the city design, increasing traffic and the organization of mobility and transport. In the Pilsen pilot, a traffic dashboard has been developed that shows congestion in real time, making it possible to identify where congestion is present, how severe it is and how long it lasts. This aids in identifying the problem and in gauging the severity and complexity of the policy problem.
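The three quantities mentioned above (where congestion occurs, how severe it is and how long it lasts) can be derived from speed readings along the following lines. This is only a minimal sketch and not the PoliVisu dashboard: the record layout, the 20 km/h threshold, the fixed five-minute interval and the function name `congestion_episodes` are all assumptions made for illustration.

```python
from itertools import groupby

# Hypothetical sensor readings: (location, minute of day, mean speed in km/h).
READINGS = [
    ("bridge", 480, 45.0),
    ("bridge", 485, 18.0),
    ("bridge", 490, 12.0),
    ("bridge", 495, 16.0),
    ("bridge", 500, 41.0),
    ("ring_road", 480, 52.0),
    ("ring_road", 485, 50.0),
]


def congestion_episodes(readings, threshold_kmh=20.0, interval_min=5):
    """Return one record per uninterrupted run of slow readings at a location.

    Assumes readings arrive at a fixed interval without gaps; the threshold
    is an illustrative value, not one taken from the pilot.
    """
    episodes = []
    ordered = sorted(readings, key=lambda r: (r[0], r[1]))
    for location, rows in groupby(ordered, key=lambda r: r[0]):
        # Split the location's time series into congested / free-flowing runs.
        for congested, run in groupby(rows, key=lambda r: r[2] < threshold_kmh):
            if not congested:
                continue
            run = list(run)
            episodes.append({
                "location": location,                            # where
                "start_min": run[0][1],
                "duration_min": len(run) * interval_min,         # how long
                "lowest_speed_kmh": min(r[2] for r in run),      # how severe
            })
    return episodes


if __name__ == "__main__":
    for episode in congestion_episodes(READINGS):
        print(episode)
```

A real dashboard would also have to handle missing readings and choose thresholds per road type; both are left out here to keep the idea visible.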

Mobile phone traffic data can be used to consider the position of each device connected to the cellular network (and, consequently, of the person who owns the device). From this, we can learn mobility patterns of the owners, time-space variability of population distribution in cities and classification of urban spaces according to mobile phone uses.

In the Ghent pilot, the goal is to identify the location of student residences, and mobile phone data was used to identify the distribution of student residents in the city. Two important lessons were learned from this. First, because mobile phone data is highly sensitive when it comes to privacy, the raw data is not available to the public administration. The raw data remains with the telecom provider and only aggregated data is shared, limiting the possible types of analysis. Second, the precision of the location of a mobile device is limited to a polygon that is surrounded by cell towers.

Data from social networking services (such as Facebook, Twitter, Instagram, WeChat and others) is another possible source. From this, we can learn about the location of city users and the activities they are participating in, their daily travel patterns, opinions and feelings, and their (self-)tracking of habits, performances, and behaviours.

The Ghent pilot attempted to employ social media data to determine the behavioural patterns of students but had to conclude that the data could not be used. First, there was only a relatively small, and probably biased, number of users. Second, the data provided on e.g. location referred to a general point within the city and was not a reflection of the position of the user.

The Issy-les-Moulineaux pilot offered the opportunity to test some tools for crowdsourcing data that can be used to evaluate the measures put in place during the main local event. During this event approximately 25,000 people were in town over a few hours. The tools proved potentially useful for detecting issues related to transport and mobility through sentiment analysis. Unfortunately, the data itself was not useful because of limited access to data from the social networks most used by citizens (Facebook and Instagram).

The second step in the policy design stage, policy formulation, is directed towards the identification of shared objectives and the alternative options for intervention in relation to the problem defined in the previous phase.

In this step, predictive and prescriptive analysis methods can be employed to support the choice between alternative measures. For a given policy problem several possible policy measures might exist. In this step the pros and cons of every measure are investigated and eventually one policy measure is chosen. Experimental iterations in the policy making process can be used to diagnose the effect of different scenarios. At the same time, these iterations are a good opportunity to gather training data, validate and refine predictive models.

The pilot of Issy-les-Moulineaux developed a mobility dashboard to visualise and identify the most important congestion points in detail, in order to support policy makers in taking decisions and defining policies. As a first result, it became clear that the bulk of traffic originates from cars passing through, and not from inhabitants of Issy-les-Moulineaux. In collaboration with a local start-up, an application was tested in congested areas to propose alternative routes, defined by the City on the basis of precise data (and not just algorithms), and to communicate them to drivers in real time.

The last step of the policy design stage, scenario analysis, can be carried out once a policy measure has been chosen, using different methods to “create a set of the plausible futures” rather than “forecasting of the most probable future” (Amer et al. 2013, p. 25). This difference in focus explains why scenarios are appropriate not for forecasting but rather for “backcasting”, that is, identifying desirable futures and the actions required to attain them.

By considering existing trends and possible future developments, thanks to predictive and prescriptive analysis of data, it becomes possible to assess the potential benefits and costs of alternative scenarios and, in doing so, to make a decision.

2.2 Policy Implementation

The policy implementation stage (Fig. 6.3) is focused on the realisation of a policy plan. In this stage, the monitoring of early impacts is key. The essential steps are the making of the implementation plan, its realisation (implementation), ongoing monitoring, and communication.

Fig. 6.3 Policy implementation cycle and data-related activities

The implementation plan is necessary for policy implementation to be as effective as possible. While doing this the opportunities for data-based monitoring activities should be considered. When a data collection plan is included in the implementation plan, data collection infrastructures need to be designed together with data analysis for the implementation step.

In the Mechelen pilot, a regional traffic model is being used to study the traffic in the city. A recent policy decision introduced the concept of “school streets”, streets that are closed at the beginning and at the end of the school day. The traffic model, combined with local traffic count data, is used to measure and analyse the impact of school streets on traffic behaviour in and around them.

In this step, data can be useful to guarantee the full impact of the policy implementation and the achievement of the policy goals. Descriptive and predictive analysis methods are the most relevant in this step. On the one hand, descriptive methods can help to describe the current situation and context, namely all the spatial and socio-economic aspects affected by and involved in the process at which the policy decision is directed. On the other hand, predictive methods can be designed to foresee the impact of a policy decision and how the decision will affect the context.

The implementation of a policy might require a lot of time and can produce important temporary effects in the context. In this step, data can be useful to verify that policies are implemented as planned and to check early policy impacts. Insights from reporting and monitoring activities and diagnostic analyses can support policy tuning where needed.

This step is crucial because it can be performed in an experimental way that considers the data generated as a result of the impact of the policy interventions so far. In fact, if an effective monitoring plan is associated with the implementation plan, the implementation step can be accompanied by a step of ongoing monitoring. In this step, data can be collected on all aspects that are hypothesized to be influenced by the policy decision, just as it is being implemented. Observing the context on a daily basis can in fact create a rich learning opportunity both for the institutions and for the citizens in the context.

During the entire policy making process communication is essential. This includes communication with citizens, communities, and other stakeholders, allowing them to participate in the policy making process and even take part in the data collection process. To be most effective communication should take place in parallel with the implementation of the policy decision.

2.3 Policy Evaluation

The policy evaluation stage (Fig. 6.4) examines the desired and undesired impacts achieved through the implementation of a policy. It monitors how the policy contributed to addressing the initial problem, whether possible disadvantages were avoided, which advantages arose and how a policy is likely to perform in the future.

Fig. 6.4 Policy evaluation cycle and data-related activities

How a policy should be evaluated needs to be decided when a policy problem is defined. Policy evaluation takes place throughout the whole policy cycle with a final evaluation stage at the end. The definition of the issues to be faced and the objectives to be achieved already determines what data will be relevant for evaluation, and what procedures for data collection and evaluation need to be established. An evaluation refers not only to final results, but to the whole planning and implementation process.

To be able to perform a policy evaluation, a data collection plan is included in the impact assessment step to support, at least:

  • multidimensional qualitative and quantitative impact assessments;

  • observation of direct and/or indirect effects—for instance at the urban level;

  • participatory evaluation—i.e. shared with, and possibly affected by, the very same stakeholders involved in the policy implementation process.

During the impact assessment step exploratory, descriptive, diagnostic, and prescriptive analysis methods can be useful, next to reporting and monitoring activities.

In the city of Pilsen, the Sustainable Urban Mobility Plan (SUMP) is currently being implemented. It is a mobility plan, up to the year 2025, which includes 82 measures for better mobility in the city. In the PoliVisu project we created tools for visualising the state of traffic before, during and after the implementation of SUMP measures. These visualisations help to evaluate the impacts of the measures.
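A comparison of the state of traffic before, during and after a measure can be reduced to a few lines. The sketch below is illustrative only: the daily counts are invented, and the names `phase_means` and `relative_change` are hypothetical rather than part of the PoliVisu tools. Note also that a plain comparison of averages does not separate the effect of a measure from seasonal or other external influences.

```python
from statistics import mean

# Hypothetical daily vehicle counts on one street, labelled by phase.
COUNTS = [
    ("before", 1200), ("before", 1260), ("before", 1140),
    ("during", 900), ("during", 960),
    ("after", 1000), ("after", 1040), ("after", 960),
]


def phase_means(counts):
    """Average the counts per implementation phase."""
    by_phase = {}
    for phase, value in counts:
        by_phase.setdefault(phase, []).append(value)
    return {phase: mean(values) for phase, values in by_phase.items()}


def relative_change(means, baseline="before", target="after"):
    """Change of the target phase relative to the baseline, as a fraction."""
    return (means[target] - means[baseline]) / means[baseline]


if __name__ == "__main__":
    means = phase_means(COUNTS)
    print(means)
    print(f"after vs before: {relative_change(means):+.1%}")
```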

Thanks to the analysis carried out in the impact assessment step, it is possible to discover the policy results and to evaluate how successful the policy was, based on expectations set during the policy design stage. This approach allows us to critically revise the contents of the policy measures, as well as to reconsider the nature of the problem itself. It is possible to obtain new insights on the characteristics of the problem, on its evolution over time and on possible new strategic responses that could be deployed to tackle it.

The last step, problem restructuring, represents a moment of retrospective reflection in which the descriptive and diagnostic analysis from the data can contribute to reconsidering the initial problem. A new definition of the problem will consequently lead to the development of a new data supported policy making process, and to the definition of a new data analysis question that will guide the data related activities.

In this way, the processes of data supported policy making can be configured as a continuous set of experimental activities implemented dynamically by public administrations for the continuous discovery of policy problems and testing of possible solutions.

3 Policy-Oriented Data Activities

In a data-supported policy making process, most effort is spent on identifying or developing tools that analyse and visualise a well-defined type of data in order to support the decision process. In cases where the required data is already known, and the information question is well advanced, this approach can be effective. However, in the policy making context most policy problems still require a translation to a more specific information question, and relevant data is often not easily available or identifiable. Also, data visualisations might need frequent adaptations to a varying audience and to the specific story the policy maker wants to communicate. On top of this need for flexibility, policy makers often have to respond urgently to problematic circumstances. The combination of all these requirements calls for a different approach to the data activities supporting the policy making process.

Figure 6.5 provides a practical framework, developed from the experiences of the PoliVisu project, in which policy makers and their support teams operate. It describes a policy-oriented approach to data analysis and data visualisation. This approach suggests a close collaboration between data-literate policy makers and data specialists. The iterative nature of the collaboration aims to ensure that the data analysis is well customized to support the policy question.

Fig. 6.5 Policy-oriented data activities framework

3.1 Differentiating Roles and Competences

The framework considers the different roles, tasks and competencies involved in the process. The main actors are the policy maker, the data analyst or data scientist, the data engineer and the developer. In small organisations, multiple roles can be covered by one single actor, but given the specific competencies associated with each role, this is not recommended for larger organisations and more complex policy questions.

The policy maker's main activities and competencies are related to the policy making process. This policy-oriented data analysis framework describes data activities for each of the stages of the policy making process as described in the policy making model.

Although data is not the main focus, some data literacy is demanded from the policy maker. In a data-supported organisation, a decision maker or policy maker must be familiar with the possibilities and restrictions of using data. On the one hand, they must be able to clearly identify and formulate the information request for data analysts or data scientists. On the other hand, they must be able to interpret data results and visualisations correctly. Depending on the data literacy of the decision makers, they can be closely accompanied by a researcher, since researchers are often skilled in translating business or policy problems into information and analysis questions.

Data analysts or data scientists are charged with the analysis of the data and the visualisation of the results. In short, their task is to make data useful for the policy maker. A close collaboration of the data analyst or data scientist with the policy maker is essential to clearly define and understand the data analysis question. Data visualisations play a key role in the conversations between policy makers and data specialists. To perform their tasks, data analysts and data scientists need access to data and to platforms for data analysis and data visualisation. These will be provided by data engineers and developers.

Data engineers are responsible for providing access to data, the main resource for data analysts and data scientists. To enable quick and flexible responses to policy information questions, data engineers must organise quick and generic access to (raw) data for exploration and analysis. Some of the data will be explored but not used in the final data analyses or visualisations. If the final analysis result is considered useful, the data access and data streams will need to be automated in production data pipelines so that they are frequently updated and available to the end user, the policy maker, via the visualisation. Organisational approaches and methodologies to deal with these challenges go beyond the scope of this framework. Although the data engineer has a specific role, it is often combined with the other IT-oriented role in the process: the developer. Besides the provision of tools and platforms, the developer can contribute to the automation of analyses.

3.2 Balancing Flexibility and Usability

The required velocity and flexibility of policy-oriented data analyses do not allow a conventional development lead time for every new data visualisation. Similar to data engineers, developers need to provide generic and flexible platforms for analysis and visualisation. Two kinds of tools can be distinguished based on the target users and the trade-off between flexibility and versatility on one side, and usability and required expertise on the other side.

On the one hand, data analysts and data scientists need flexible and versatile analysis platforms. Since they are considered data experts, the interfaces can be built for advanced users and might even be code-based. These platforms include a wide range of analysis and visualisation methods, but graphical customization options are often limited. Since data scientists are expected to deal with large datasets, computational performance is an important requirement for these analysis platforms.

On the other hand, the second type of tool required for policy-oriented data analysis still provides a wide range of visualisation options, allows some data manipulation, and deals with multiple data types. It combines this flexibility with easy-to-use interfaces and visual customization options. Data scientists and data analysts can use these tools to communicate analysis results to policy makers via dynamic visualisations. Less technically skilled data analysts and policy makers themselves will use these generic tools to explore data and analysis results, discover insights and build basic dynamic visualisations. These self-service tools give data analysts and policy makers considerable independence to respond quickly to policy information questions without the need for development skills or for IT projects that consume budget and time.

3.3 Transforming Iterations into Experimental Drivers

The policy-oriented data analysis framework shows common analysis steps and analysis types from a data analyst and data scientist viewpoint. The framework proposes an iterative approach to enhance flexibility and agile refinement of the analysis to fit the policy questions.

Each analysis iteration starts with defining or refining the data analysis question and the appropriate analysis type and method that will be used in the next iteration. The next step is the identification and collection of input data. It is highly recommended that a quality assessment is conducted on the collected data to ensure or estimate the reliability of the results that will be used for policy making.

In the pilot of Ghent, the goal is to identify the location of student residences in order to assess the pressure on the housing market. Within the city administration, different departments used several information sources on residential students. These datasets were gathered and cleaned, and their quality was evaluated. The quality assessment revealed that the quality of the data was not sufficient and that the information necessary to measure the real impact was not present.
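A first quality assessment of collected data often starts with two simple measures: how complete each field is and how many rows are duplicated. The following is a minimal sketch under stated assumptions; the records and field names are invented, `quality_report` is a hypothetical helper, and it does not reproduce the assessment carried out in any pilot.

```python
# Hypothetical records merged from several departmental registers.
RECORDS = [
    {"student_id": "a1", "address": "Street 1", "year": 2019},
    {"student_id": "a2", "address": None, "year": 2019},
    {"student_id": "a1", "address": "Street 1", "year": 2019},  # exact duplicate
    {"student_id": "a3", "address": "Street 7", "year": None},
]


def quality_report(records, fields):
    """Share of filled values per field, plus the number of duplicate rows."""
    total = len(records)
    if total == 0:
        raise ValueError("no records to assess")
    completeness = {
        field: sum(1 for r in records if r.get(field) not in (None, "")) / total
        for field in fields
    }
    # Rows are duplicates when they agree on every assessed field.
    distinct = {tuple(r.get(field) for field in fields) for r in records}
    return {
        "rows": total,
        "completeness": completeness,
        "duplicates": total - len(distinct),
    }


if __name__ == "__main__":
    print(quality_report(RECORDS, ["student_id", "address", "year"]))
```

Completeness and duplication say nothing about whether the values are correct or whether the dataset covers the whole target population; those checks usually require domain knowledge and reference data.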

When starting with a new data analysis question or new data, it is recommended to perform an exploratory analysis to ‘get to know’ the data. The purpose of the exploratory analysis is to gain insight into data characteristics, to assess the potential of the data to answer the policy information question and to get ideas for the main analysis. Data analysts or data scientists explore the data using basic visualisations and summary statistics. This exploration should not take too long but it is crucial for the efficiency of the next analysis steps. Since multiple data sources might be explored, this iteration can be repeated multiple times for one policy information question.
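The summary statistics and basic visualisations mentioned above need very little tooling. As an illustration only, with invented hourly counts and the hypothetical helper names `summarise` and `histogram_counts`, an exploratory first look might be sketched as:

```python
from statistics import mean, median

# Hypothetical hourly vehicle counts from one detector.
HOURLY_COUNTS = [120, 95, 80, 110, 340, 610, 720, 580, 400, 390]


def summarise(values):
    """Basic summary statistics for a first look at a numeric variable."""
    return {
        "n": len(values),
        "min": min(values),
        "median": median(values),
        "mean": mean(values),
        "max": max(values),
    }


def histogram_counts(values, bins=4):
    """Count values per equal-width bin between the minimum and the maximum."""
    low, high = min(values), max(values)
    step = (high - low) / bins or 1  # fall back to 1 when all values are equal
    counts = [0] * bins
    for value in values:
        # The maximum would land in bin number `bins`; keep it in the last bin.
        index = min(int((value - low) / step), bins - 1)
        counts[index] += 1
    return counts


if __name__ == "__main__":
    print(summarise(HOURLY_COUNTS))
    for i, count in enumerate(histogram_counts(HOURLY_COUNTS)):
        print(f"bin {i}: {'#' * count}")
```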

In the Pilsen pilot, a traffic model is constructed to assess and monitor traffic flows thanks to a combination of modelling data and live traffic sensor data. A lot of possible data sources can be entered into the model, such as geo-time series of detectors and their interpolation, Police (Municipal/ČR) events and car accidents. These datasets have been explored and samples of the datasets have been used for preliminary visualisations.

Before starting the actual data analysis, the collected data often needs some manipulation. Data cleaning is about detecting and correcting unexpected, incorrect, inconsistent, or irrelevant data. The quality of the input data will determine the usability of every analysis result, no matter how complex the algorithm that is used: good-quality data always beats fancy algorithms. Data scientists spend a considerable part of their time, up to 60%, on cleaning and formatting data. It is clear that access to clean and structured data can save a lot of time, and that proper cleaning is essential for reliable results to support decision making.
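The categories of problem named above (incorrect, inconsistent, irrelevant or duplicated data) each translate into a small rule. The sketch below assumes a hypothetical layout of traffic count rows with `sensor`, `timestamp` and `count` fields; the plausibility limit and the function name `clean_counts` are likewise illustrative assumptions.

```python
# Hypothetical raw rows as they might arrive from a CSV export (all strings).
RAW = [
    {"sensor": " S1 ", "timestamp": "2019-03-01T08:00", "count": "412"},
    {"sensor": "s1", "timestamp": "2019-03-01T08:00", "count": "412"},  # duplicate
    {"sensor": "S2", "timestamp": "2019-03-01T08:00", "count": "n/a"},  # unparseable
    {"sensor": "S2", "timestamp": "2019-03-01T09:00", "count": "-3"},   # implausible
    {"sensor": "S3", "timestamp": "", "count": "77"},                   # missing time
    {"sensor": "S3", "timestamp": "2019-03-01T09:00", "count": "77"},
]


def clean_counts(rows, max_plausible=5000):
    """Normalise rows and drop those that cannot be repaired."""
    cleaned, seen = [], set()
    for row in rows:
        # Inconsistent spelling: trim whitespace and lower-case the sensor id.
        sensor = (row.get("sensor") or "").strip().lower()
        timestamp = (row.get("timestamp") or "").strip()
        try:
            count = int(row.get("count"))
        except (TypeError, ValueError):
            continue  # incorrect: the count is not a whole number
        if not sensor or not timestamp:
            continue  # incomplete: a key field is missing
        if not 0 <= count <= max_plausible:
            continue  # unexpected: outside the assumed plausible range
        if (sensor, timestamp) in seen:
            continue  # duplicate observation
        seen.add((sensor, timestamp))
        cleaned.append({"sensor": sensor, "timestamp": timestamp, "count": count})
    return cleaned


if __name__ == "__main__":
    for row in clean_counts(RAW):
        print(row)
```

Dropping rows silently is acceptable in a sketch; in practice the number of rows removed by each rule is worth reporting, since it feeds back into the quality assessment.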

Feature engineering is the process of creating new input features by combining or categorising the existing features of the raw data. Which features will be created depends on the analysis the data will be used for. This step requires a good understanding of the data definitions and involves domain expertise. This means assumptions made by the data analyst or data scientist in this phase should be carefully discussed with the policy makers.

After completing the data cleaning and feature engineering steps, the raw data has been transformed into an analytical base table.
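To make the step from cleaned observations to an analytical base table concrete, the sketch below derives three features from a timestamp. It is an illustration under assumptions: the row layout is invented, `engineer_features` and `build_base_table` are hypothetical names, and the definition of peak hours is exactly the kind of assumption that should be agreed with policy makers.

```python
from datetime import datetime

# Assumption for illustration: peak hours are 07:00-09:59 and 16:00-18:59.
PEAK_HOURS = set(range(7, 10)) | set(range(16, 19))

# Hypothetical cleaned observations.
CLEANED = [
    {"sensor": "s1", "timestamp": "2019-03-01T08:00", "count": 412},
    {"sensor": "s1", "timestamp": "2019-03-02T13:00", "count": 150},
]


def engineer_features(row):
    """Add features derived from the timestamp to one cleaned observation."""
    moment = datetime.fromisoformat(row["timestamp"])
    return {
        **row,
        "hour": moment.hour,
        "is_weekend": moment.weekday() >= 5,  # Monday is 0, so 5 and 6 are the weekend
        "is_peak": moment.hour in PEAK_HOURS,
    }


def build_base_table(rows):
    """One row per observation, with raw fields and engineered features."""
    return [engineer_features(row) for row in rows]


if __name__ == "__main__":
    for row in build_base_table(CLEANED):
        print(row)
```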

In the Ghent pilot, data related to the policy problem was not available from administrative data sources. Therefore, new possible data sources were explored, such as telecom data and Wi-Fi sniffing data. To effectively understand the data that would be delivered by the telecom provider several meetings were organised with the data scientists from the company. This allowed the company to fully understand the analysis question, to construct a plan together and for the members of the local administration to understand the data results they would be receiving.

In Flanders, the Federal police road accident data has been used to map road accidents to specific locations. PoliVisu was able to map 87% of the road accidents from the last five years consistently. Several meetings with specialists from the police and the traffic safety institutes led to an interactive map that includes the location of schools. New data sources, combined with interactive co-creation sessions, have prompted further data manipulation experiments with ANPR data, which gives more insight into traffic intensity and driving speed. The final result will be a traffic safety map instead of a road accident map.

The analytical base table can now be used in analysis algorithms to derive new information from the data. The different types of analyses are discussed in paragraph 6.2. At the end of the analysis iteration, the results are visualised and interpreted.

The iterative approach in this model suggests not getting stuck on the design of the perfect data analysis question. Instead, a first analysis iteration is executed with the basic analysis ideas. Sometimes the first results can already serve the policy maker or can be used to redirect the analysis question. Gradually, more complexity can be added to the analysis, learning from the previous iterations. The increasing complexity can be related to the policy question and the number of variables that need to be taken into account. Analysis iterations might also gradually add complexity to the analysis method that is used. It is common to start with descriptive analysis, evolve to diagnostic and predictive analysis, and finally develop prescriptive analysis models to drive decisions.

The goal of the pilot of Issy-les-Moulineaux was to achieve a shift in behaviour concerning car use through a communication campaign. In the first, simpler, analysis iteration it was shown that only 27% of the local population uses the car, and that the congestion problem is mainly caused by traffic passing through. This allowed us to adapt the implementation plan and perform more specific and complex analyses in the second iteration.

4 Conclusions

Although data seems to be everywhere nowadays, finding suitable, good-quality data is often the first obstacle to be overcome in data-supported policy making. Turning the data into relevant insights is the next big challenge. Data visualisations and analyses can provide these insights if the policy question is well defined and correctly interpreted. Different data analysis types will be used depending on the phase in the policy making process. Performing the data analysis and creating data visualisations in an iterative way enables the data analysis to be adapted to the needs of the policy maker while gradually increasing the complexity. The introduction of an explicit collaboration between the data-literate policy maker and the data experts during these iterations will ensure that the data response properly fits the policy question. Data visualisations with intermediate results will support this collaboration. The use of flexible data platforms and generic tools for data access, analysis and visualisation can provide the versatility and velocity requested by policy makers. The pilot experiences in the PoliVisu project permitted the development of a data-supported policy making model and a practical framework for policy-oriented data activities.