
1 Introduction

Analyses aimed at spatial planning for resilience, which pursue the objectives of sustainability and vulnerability reduction, require ever more data of different origins, to be integrated into models, scenarios and prefigurations that help interpret complex and closely interrelated phenomena (Voghera and La Riccia 2018).

The quality of the analyses themselves depends on the available data, which is why the rise of open data, open source software and new types of licences can be read as an attempt to prevent the monopolization of data by a small number of actors, in order to promote the dissemination of data and the possibility of gaining knowledge from them.

However, the availability of truly open data sometimes appears limited and suffers from a lack of accessibility and systematicity that would facilitate user access and consultation. The objective of this contribution is to recompose the theoretical framework underlying the concepts of open data, open source and Creative Commons and the reasons why they can support planning, in order to provide the reader with a synthetic scheme useful for orientation in the data-search phase of analyses for territorial planning. It also aims to analyse critically whether the availability of open data is sufficient to allow an in-depth and comprehensive analysis of a territory, and within what margins it is still necessary to resort to commercial or non-open data, emphasizing the need for a paradigm shift towards the open model.

2 Background: Open Data in the Big Data Era and Its Relevance in Spatial Analysis

2.1 Data as a Common Good

Everything is related to everything else, but near things are more related than distant things. (Tobler 1970)

Data are the basis of a large number of human activities and are increasingly considered, not only in the collective consciousness but also at the legal level, a fundamental element for carrying them out. We could define them as the oil of contemporary civilization, since they are an objective starting element for the construction of information, which is instead subjective, being the result of subsequent elaborations on the “raw” data.

The use of data to derive information is fundamental in the public sector in support of administration, in the private sector for starting up and strengthening economic activities, and for the conscious participation of citizens in public debate (Voghera and La Riccia 2018).

In this sense, access to and availability of data can be considered engines of stimulus and development for the economy, with particular reference to the service sector. In this regard, there is a tendency to consider the whole Internet as a global public good.

Perhaps this is also the reason why a type of licence widely used in the digital world, the “Creative Commons” licence,Footnote 1 set up by Lawrence Lessig and widely engaged in the promotion of the commons, expressly refers to those “common goods” that, in the field of natural resources, were the subject of the debate between Hardin (1968) and Ostrom (1990). Hardin, speaking of a “tragedy”, argued that if individuals rely only on themselves, and not on the relationship between society and the individual, they end up treating shared resources, and even other people, as means to be exploited, a process that world population growth only perpetuates; Ostrom instead highlighted the nature of these goods, neither public nor private, but collective and self-managed by the users themselves.

2.2 Volume of Data and Information Extraction: Big Data

The term data-driven is often used not only for companies but also for markets, indicating the propensity to rely on data to make decisions that are as objective as possible. To speak of data-driven entities, it is necessary to generate and have at one’s disposal large quantities of data, acquired in such a way as to guarantee their validity and adherence to objective reality. Regarding the first aspect, the amount of data available, it is easy to identify a strong correlation between their availability and the technical capacity to acquire and store them. In this sense, storage and computing capabilities are in turn linked to the improvement of hardware performance, well described by the empirical assumption of Moore’s first law (1965):

The complexity of a microcircuit, measured for example by the number of transistors per chip, doubles every 18 months (and then quadruples every 3 years).
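
As a purely illustrative calculation, not drawn from Moore’s original paper, the assumption of a doubling every 18 months corresponds to a growth factor of 2^(t/1.5) after t years; the short Python sketch below makes the order of magnitude explicit.

```python
# Illustrative only: growth factor implied by a doubling every 18 months (1.5 years).
def growth_factor(years: float, doubling_period: float = 1.5) -> float:
    """Multiplicative increase in complexity after `years`."""
    return 2 ** (years / doubling_period)

for horizon in (1.5, 3, 10, 20):
    print(f"after {horizon:>4} years: x{growth_factor(horizon):,.0f}")
```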

The increase in hardware performance, coupled with a drop in device prices, has led to an increase in the number of players who can create and manage increasingly large datasets, leading to an exponential growth in the amount of data produced annually throughout the planet.Footnote 2

We can consider the increased availability of data and, consequently, of information as one of the most obvious manifestations of the technological revolution we are experiencing. In recent years, the term “big data” has thus become popular. Often misused, this expression generally indicates any collection of data so extensive in volume, velocity and variety that special technologies are needed to extract information from it (De Mauro et al. 2016). In a nutshell, a human operator without tools would not have the ability to extract information from this data.

Examples of big data are the logs of accesses to a website, the profiles of a social network, the list of transactions made by customers of a large online store together with their personal information, and the set of surveys made by a satellite: all datasets that would be unusable, or very slow to process, without the help of dedicated algorithms that allow useful information to be obtained. Crossing data within a dataset or between different datasets, in turn, makes it possible to automatically discover unexpected correlations, which are then validated or discarded by a human operator on the basis of the well-known principle that “correlation does not imply causation”.

The amount of data available is so large that most of what is collected is never analysed, generating a real data gap. These data are known as “dark data”, a term coined by Gartner Inc.Footnote 3 to indicate data that are collected and processed but never used. Exploiting these volumes of data is one of the factors that led to the development of data mining, the process of extracting non-trivial, previously unknown and potentially useful information from the available data; the related professional figure is the “data scientist”, a term whose diffusion is credited to Dhanurjay “DJ” Patil, a computer scientist and Chief Data Scientist at the US Office of Science and Technology Policy. It is interesting to note, as evidence of what was written at the beginning about data as the oil of contemporary civilization, that the term mining almost seems to consider data an external entity, much like a mineral deposit generated naturally underground. It almost seems to recognize that the ability to acquire data has far exceeded the ability to keep track of it and to obtain information from it without a specific “exploration”: a mining, indeed.

2.3 Where Is Big Data Coming From?

Who generates data? As we have seen above, data are acquired in a more or less systematic way by many actors, from public administrations to companies or non-profit organizations. Basically, any organization produces data.

To give some examples, every query made on a search engine generates data about who made that search. Similarly, the “path” of pages visited during navigation generates a series of data that, if properly interpreted, allow those who own them to reconstruct the evolution of users’ intentions and choices: information that can then become precious for determining, for example, which advertisements to display in the spaces of the websites hosting them.

In concrete terms, and using limited free tools, from a simple comparison between the words ArcGIS and QGIS run through Google Trends,Footnote 4 we can observe the diachronic trend and the territorial areas most involved in users’ searches for these two words, and reconstruct the evolution of interests and the possibility of generating value in terms of requested services.
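
As a hedged illustration of how such a comparison can also be reproduced programmatically, the sketch below uses the unofficial pytrends Python wrapper (not mentioned above; Google Trends offers no official API, so availability, quotas and result normalization are not guaranteed).

```python
# Sketch: compare search interest for "ArcGIS" vs "QGIS" via the unofficial pytrends wrapper.
# pip install pytrends
from pytrends.request import TrendReq

pytrends = TrendReq(hl="en-US", tz=0)
pytrends.build_payload(kw_list=["ArcGIS", "QGIS"], timeframe="2010-01-01 2021-12-31")

interest_over_time = pytrends.interest_over_time()   # diachronic trend (relative values, 0-100)
interest_by_region = pytrends.interest_by_region()   # territorial distribution by country

print(interest_over_time.tail())
print(interest_by_region.sort_values("QGIS", ascending=False).head(10))
```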

Another example of big data relates to data acquired by satellites. As of April 2020, there were 2666 satellites in active orbit around our planet, of which 1440 had commercial purposes and 436 were governmental; 339 were for military use, 133 civilian and 138 mixed (Space Foundation 2021).

Each of these satellites has a specific purpose, ranging from the maintenance of GPS systems to telephony and direct observation of the Earth. With regard to the latter, it is sufficient to mention some of the most famous government satellites to realize the amount of data generated: the American Landsat and the European ERS and Sentinel acquire hundreds of megabytes every day in their orbits around the planet, generating data useful for monitoring the changes taking place on the surface. On a commercial level, services such as QuickBird, IKONOS and WorldView offer customers high and very high resolution satellite images.

2.4 Data Availability and Access: Some Critical Issues

This data revolution (Voghera and La Riccia 2018), however, raises some relevant issues that need to be laid out in order to fully understand the meaning of the whole “paradigm” of open access data and open source software. We could summarize these issues in four expressions: ownership, access, usability (or reusability) and privacy.

It seems quite logical to assume that the owner of a given dataset is the one who generated it. A private operator that, after obtaining the appropriate governmental authorizations, acquires images with its satellite has every right to use them as it sees fit and, eventually, to profit from them. The problem from this point of view, however, is the disparity in the possibility of generating and crossing data. In this sense, we can use the case of Google as a useful example. The Californian company owns the most popular search engine in the world: in essence, it has the ability to process millions of queries from its users and to cross-reference them in order to obtain useful information. In 2005, the company launched the Google Maps service, a webGIS that covers a large part of the Earth and allows users to view geographical maps.

Over time, the service has expanded considerably, and today it allows users not only to view the basic cartographic layers, but also to build itineraries and to identify stores and other economic activities added by the businesses themselves through the Google Places service. The company, strengthened by its leadership position among search engines, has introduced the possibility for users to rate and review the places they visit, contributing to define the overall image that a given economic activity has for Google users and de facto conditioning the choices of its possible future customers. By further crossing this information with that made available by the Street View service, Google can offer its customers, in this case the merchants, a series of cognitive tools useful for optimizing the possibilities offered by the “economic positioning” of their business, built on information generated free of charge by Google users themselves.

It is also true, however, that business intelligence, especially when directed at geo-marketing and distilling information from “location analysis”, is not shared for free but, like all of Google’s advanced services, is subject to precise pricing for, in this case, the merchants involved.

The program was further strengthened in 2015 with the creation of the Google Local Guides service which, in exchange for some benefits from the company such as free cloud space, allows the company to obtain additional information about the places visited: joining the program means that users may be asked to answer specific questions about a place’s accessibility, level of crowding and services offered.

This ability to acquire user-generated data for free, whether unintentional (such as browsing histories) or voluntary (content posted on a social network or ratings of a place), puts large tech companies at a significant advantage over potential new competitors and sometimes over governments themselves. Access to the information generated by these data is, in fact, access to proprietary information, granted by the provider company for a financial consideration or in exchange for data, information or a licence to use it (such as photos posted on Instagram).

This modus operandi has generated considerable debate around the issue of privacy, prompting governments and supranational institutions to legislate on the subject. The most important example in Europe is the General Data Protection Regulation 2016/679 (GDPR),Footnote 5 published in the Official Journal of the European Union on May 4, 2016, and applied since May 25, 2018. The regulation harmonizes national regulations and shifts the focus from a proprietary view of data to one based on user control, encouraging free movement of data and the right to know the nature and use of personal information in the hands of third parties. The regulation applies to both automatically and non-automatically generated data and effectively requires informed consent before one’s data are handed over.

While the GDPR has attempted to remedy some asymmetries in data generation and transfer, for example by giving users the option to avoid having their browsing data tracked for marketing purposes by denying consent to a website’s use of cookies, this regulation also poses generalized difficulties in the acquisition and cross-referencing of data, even by non-profit entities such as universities and research institutes, and does not solve the problem of data access.

In a society where the use of data is increasingly fundamental to the development of activities and to the knowledge of reality, this issue is crucial. The ownership of information and the possibility of accessing it are, and will increasingly be, a necessity for those who wish to start or maintain an economic activity, and even more so for study and research purposes, perhaps aimed at supporting administrative action.

If data are therefore seen as necessary for the development of society itself, the inhomogeneity in their distribution and access must be remedied. Initiatives aimed at improving this aspect are often characterized by the use of the open data “paradigm”. The Open Knowledge FoundationFootnote 6 has elaborated the Open Definition:

Open data and content can be freely used, modified, and shared by anyone for any purpose.Footnote 7

The availability of freely accessible data is based on the idea of increasing transparency, releasing social and economic value and increasing community participation and engagement. Among the most useful data in this sense are of course spatial data, i.e. any data that can be associated with a location in space. Not so much to ensure open access to spatial data as to promote the sharing of a common infrastructure, the European Union has issued the INSPIRE Directive,Footnote 8 which requires member states to “systematize” interchangeable geographic data, metadata and services in order to facilitate access and reuse. These data are usually derived from satellite acquisitions, performed as seen above via the Copernicus systemFootnote 9 and the Sentinel satellites, and subsequently reprocessed by the European Space Agency (ESA) and other institutions.

In parallel, many volunteers and non-profit organizations have been engaged in building datasets “from below”, according to a participatory and “active citizenship” paradigm. The most important example in this sense is the OpenStreetMap project.Footnote 10

The idea of open access is not limited to the data itself, but also concerns processing capacity. We therefore speak of open source, meaning all free software and algorithms whose source code is freely accessible and modifiable. It goes without saying, in fact, that without the ability to process information at no cost (or at least without significant costs), access to the data itself would become almost useless. The community that has built and maintains the QGIS spatial data processing software is committed to this. In this regard, it is also worth mentioning the communities of volunteers who implement programming languages, a fundamental element both for the development of software and for more advanced reading and analysis operations: among them the vast communities of R, Python and PostgreSQL, and the countless code “packages” that, once installed, allow special operations to be performed in GIS software, such as WhiteBox Tools. The “open” community is growing and has greatly influenced the way geospatial data are acquired, processed, analysed and visualized (Coetzee et al. 2020).
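
To give a concrete idea of this fully open toolchain, the minimal sketch below derives a slope layer from an open digital terrain model using the WhiteBox Tools Python frontend; the file names are placeholders for locally downloaded open data, and equivalent results can be obtained directly from QGIS or R.

```python
# Sketch: derive a slope raster from an open DTM with the open-source WhiteBox Tools.
# pip install whitebox ; file names are placeholders for locally downloaded open data.
import whitebox

wbt = whitebox.WhiteboxTools()
wbt.slope(dem="dtm_piemonte.tif", output="slope_degrees.tif")
print("Slope raster written to slope_degrees.tif")
```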

This set of factors, namely the availability of public data and the large community of volunteers who carry out “open” projects, together with the improvement of computational performance at lower costs, has greatly expanded the ability to analyse the territory, not only by experts and academics but by anyone who is “familiar” with these computer tools. At this point, costs become necessary, or simply likely, only if one chooses to use proprietary software, needs external technical support, or requires computational power that is not locally available and must therefore be obtained from external providers of cloud computing services.

The question now is whether the available data alone are sufficient to meet the needs of increasingly accurate analyses and studies of the territory, which are fundamental in planning if we decide to adopt a contemporary paradigm aimed at developing the resilience of a territory. Are we really at that point? Are we at an optimal level that can still be improved, or are we below a warning threshold under which analyses risk losing quality, forcing us to rely on proprietary platforms, data and services?

To be able to work in this sense, or rather to ensure that territories are planned in such a way as to be aware of their vulnerabilities, ready to deal with risk and able to adapt to change, it is essential to have an increasingly in-depth knowledge of phenomena at various scales, of their relationships and of how they change over time. It is therefore necessary to be able to build models of the territory in which multiple pieces of information of various origins can be crossed to appreciate its complexity and network of relationships. The ability to account for multiple elements in parallel is, for example, a feature of modern multivariate analysis, but this methodology needs to rest on robust, large and reliable data systems.
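
As a hedged illustration of what accounting for multiple elements in parallel can look like in practice, the sketch below applies a standard principal component analysis to a hypothetical table of territorial indicators; the file and column contents are invented for the example and do not come from the study described here.

```python
# Sketch: principal component analysis on a hypothetical table of territorial indicators.
# pip install pandas scikit-learn ; the input file and its columns are illustrative only.
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

indicators = pd.read_csv("territorial_indicators.csv")   # e.g. density, soil sealing, slope, flood exposure
X = StandardScaler().fit_transform(indicators.select_dtypes("number"))

pca = PCA(n_components=2)
scores = pca.fit_transform(X)
print("Explained variance ratio:", pca.explained_variance_ratio_)
```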

3 Availability and Access to GIS-Based Data on the Metropolitan City of Turin

One of the main tasks of spatial analysis for planning is to provide data and information on the territory under study and planning. Data retrieval and the construction of information play a fundamental role in the process of interpreting the territory itself, which in turn allows robust, place-based strategic or planning tools to be implemented.

If the intention is to adopt a territorial resilience approach, it is also necessary to use a multidisciplinary approach that responds to the technical challenge of building new knowledge tools that can support the development of strategies, plans and actions aimed at reducing local vulnerabilities (Beltramino et al. 2022). The arrival of the COVID-19 pandemic has also opened a window of reflection on what further data can be sought.

Trying to respond to these stimuli, and seeking to prioritize free access to data, we have hypothesized a study of the metropolitan area north of Turin, searching the available databases for data that could provide valid analytical support to this objective.

3.1 Satellite Images

A first element whose availability should be checked is satellite imagery in raster format. As pointed out in the first part of this paper, many public bodies provide good quality images useful for carrying out a series of supporting analyses. In particular, it is possible to download Sentinel images from the Copernicus portal, as well as imagery through the USGS and European Space Imaging search engines. Other satellite images, available for a fee, are provided by the Italian Military Geographical Institute (IGM) and by commercial companies. Satellite image analysis is useful for calculating some important indicators on soil consumption, terrain slope or physical phenomena (erosion, landslides, etc.).
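
As an example of how such freely downloadable imagery can feed these indicators, the sketch below computes a simple NDVI (a common vegetation index) from the red and near-infrared bands of a Sentinel-2 scene using the open rasterio library; the band file names are placeholders for data obtained from the Copernicus portal.

```python
# Sketch: NDVI from two Sentinel-2 bands (B04 = red, B08 = near infrared).
# pip install rasterio numpy ; band file names are placeholders for downloaded Copernicus data.
import numpy as np
import rasterio

with rasterio.open("T32TLQ_B04_10m.jp2") as red_src, rasterio.open("T32TLQ_B08_10m.jp2") as nir_src:
    red = red_src.read(1).astype("float32")
    nir = nir_src.read(1).astype("float32")
    profile = red_src.profile

ndvi = (nir - red) / np.maximum(nir + red, 1e-6)   # avoid division by zero on empty pixels

profile.update(driver="GTiff", dtype="float32", count=1)
with rasterio.open("ndvi.tif", "w", **profile) as dst:
    dst.write(ndvi, 1)
```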

3.2 Physical and Basic Cartography

We could define the basic cartography as the set of elements necessary to provide an immediate “eye” on a given territory: land type, contour lines, buildings, roads and other elements. As far as Piedmont is concerned, the data source is BDTRE, distributed under a Creative Commons BY 2.5 licence, progressively being replaced by version 4.0. Built in implementation of the 2007 INSPIRE Directive, with the Regional Law of 5 February 2014 it has become the reference cartographic base for all public and private entities that interact with the regional administration. BDTRE offers numerous information layers in the traditional shapefile format and in the newer geodatabase format.
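
A minimal sketch of how these layers can be read with open tools is shown below; the path and layer names are placeholders, since they depend on the package downloaded from the regional geoportal.

```python
# Sketch: read a BDTRE layer downloaded from the Piedmont geoportal with geopandas.
# pip install geopandas ; path and layer names are placeholders.
import geopandas as gpd

# Shapefile distribution...
buildings = gpd.read_file("bdtre/edificato.shp")

# ...or a layer inside the geodatabase distribution (layer name is illustrative).
# buildings = gpd.read_file("bdtre.gdb", layer="edificio")

print(buildings.crs, len(buildings), "features")
```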

Another unofficial data source is the OpenStreetMap project. Other basic cartography can be found through the national geoportal and other sites of research centres and governmental institutes.

3.3 Demographic Data

The analysis of demographic data is usually carried out in Italy on the basis of data provided by the censuses of the Italian National Institute of Statistics (ISTAT),Footnote 11 freely accessible and downloadable from the Institute’s portal under a CC BY 3.0 IT licence: at the moment, datasets of territorial bases and census variables from the population censuses and from the censuses of industry and services are available. The censuses available in this way are those of 2011, 2001 and 1991, and the minimum territorial unit is the census section. In some cases it is difficult to extract information from ISTAT demographic data because of inconsistencies between the spatial bases provided as shapefiles and the tabular files containing the socio-economic information; moreover, the ten-year interval between censuses can make the data outdated for the type of analysis for which they are needed. In this sense, the entry into operation of the new permanent census could be of great importance in improving the quality and quantity of the studies that can be carried out.
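
A hedged sketch of the typical join between the census-section geometries and the tabular census variables is shown below; the file names are indicative, the section identifier is assumed to be the SEZ2011 code, and the explicit cast of the key is precisely a workaround for the inconsistencies mentioned above.

```python
# Sketch: join ISTAT 2011 census sections (shapefile) with the tabular census variables.
# pip install geopandas pandas ; file names are indicative of the ISTAT downloads.
import geopandas as gpd
import pandas as pd

sections = gpd.read_file("R01_11_WGS84.shp")                          # census-section geometries (Piedmont)
variables = pd.read_csv("R01_indicatori_2011_sezioni.csv", sep=";")   # socio-economic variables

# Harmonize the join key, which is sometimes read as float or string in one of the two sources.
sections["SEZ2011"] = sections["SEZ2011"].astype("int64")
variables["SEZ2011"] = variables["SEZ2011"].astype("int64")

joined = sections.merge(variables, on="SEZ2011", how="left")
print(len(joined), "sections,", joined.isna().any(axis=1).sum(), "with missing variables")
```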

3.4 Vehicular Flows

Data on vehicle flows have for years been collected through direct measurements at the entrance and exit of a given road section. With the spread of “black boxes”, the GPS devices used to reconstruct the dynamics of possible accidents and to calibrate car insurance rates, and of other tracking devices, monitoring this phenomenon has become easier, but the data are not immediately accessible.

In Piedmont, this kind of information is held by 5 T,Footnote 12 an in-house company with entirely public ownership.

3.5 Soft Mobility

Researching data on soft mobility, understood as the set of journeys made on foot, by bicycle or by other human-powered vehicles, is as interesting as it is difficult. First of all, how are these movements measured? For journeys on foot or by bicycle, the answer comes from the world of sport and from the widespread diffusion of smartwatches, running watches, cyclocomputers and similar devices, or directly from smartphones. For several years these devices have been accompanied by applications that, after registration by the user, provide free of charge both a track of the user’s movements (recorded by the wearable device and then downloaded to the app, or directly by the app installed on the phone) and statistics on the activities, such as metres travelled, average speed, an estimate of the watts produced and so on. Users can also share this information on specialized social networks, such as Strava.Footnote 13

As can easily be imagined, these user data accumulate into huge datasets of aggregated information on lifestyles, habits and movements. In order to support spatial planning, access to these datasets would be particularly important to obtain information on, for example, the use of green areas, the use of cycle paths, or the identification of routes that exist de facto but have never been established by the public authority.

These datasets are privately owned, but some services allow professionals and researchers to access them. In particular, there are two Strava tools, the Strava Global Heatmap and Strava Metro. The Strava Global Heatmap is a webGIS that represents the “travel density” of a given path among the users who upload their activities to the social network. In short, it consists of an OpenStreetMap cartographic base (which has gradually replaced Google Maps) with a terrain “style” rendered through Mapbox, on which users’ activities appear in aggregate form, thematized by colour intensity. The more intense the colour, the more that stretch has been travelled by Strava users. Strava’s Heatmap is used by OpenStreetMap contributors to improve the map and is available in JOSM by loading it as a TMS layer.
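
As a hedged sketch of how a TMS/XYZ source of this kind can be added programmatically in the QGIS Python console (the tile URL below is a placeholder: the authenticated Strava Heatmap endpoint must be obtained from one’s own account, respecting its terms of use):

```python
# Sketch (QGIS Python console): add an XYZ/TMS tile source as a raster layer.
# The URL is a placeholder; the real Strava Heatmap endpoint requires authentication.
from urllib.parse import quote
from qgis.core import QgsProject, QgsRasterLayer

tile_url = "https://example.com/strava-heatmap/{z}/{x}/{y}.png"   # placeholder endpoint
uri = f"type=xyz&url={quote(tile_url, safe='')}&zmin=0&zmax=16"

layer = QgsRasterLayer(uri, "Heatmap (placeholder)", "wms")
if layer.isValid():
    QgsProject.instance().addMapLayer(layer)
else:
    print("Layer could not be loaded")
```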

However, the Heatmap does not allow any further analysis, even with the help of colour-based raster analysis tools, so the use of Strava Metro becomes necessary. Metro is a service created in 2016 as a support tool for city planners and other experts, and provides datasets that can be used for the study of flows, road safety and project evaluation. Access, however, is subject to a request to the company, with long waiting times and uncertain outcomes.

Other systems for estimating pedestrian flows are based on estimating the number of movements in relation to the areas people frequent (green areas, administrative offices, etc.).

3.6 Energy Consumption

The analysis of energy consumption can be considered an important indicator useful for outlining strategies for the territory and the city, such as retrofitting, plant modernization and the strengthening of self-production. A quick search for Piedmont on the website of the Agenzia per l’Italia DigitaleFootnote 14 yields little data, mainly related to the electricity consumption of school buildings in the Metropolitan City of Turin and to gas consumption on a municipal basis.

The consumption data relating to users are the property of the companies that manage the contracts, such as IRENFootnote 15 in Turin. The data, once requested and obtained, are provided as .csv files with annual and monthly consumption. The rows are not georeferenced but associated with an address, and require geocoding operations to obtain a shapefile that can be used in a GIS environment.
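
A hedged sketch of this geocoding step is shown below, using the open Nominatim service through the geopy library; the file and column names are placeholders, and the throttling reflects Nominatim’s strict usage policy.

```python
# Sketch: geocode address-based consumption records and export them as a point shapefile.
# pip install pandas geopandas geopy ; file and column names are placeholders.
import pandas as pd
import geopandas as gpd
from geopy.geocoders import Nominatim
from geopy.extra.rate_limiter import RateLimiter

df = pd.read_csv("consumi_iren.csv", sep=";")              # assumed columns: address, annual_kwh, ...

geolocator = Nominatim(user_agent="turin-energy-study")    # identify the client, as required by the policy
geocode = RateLimiter(geolocator.geocode, min_delay_seconds=1)

df["location"] = (df["address"] + ", Torino, Italy").apply(geocode)
df = df.dropna(subset=["location"])
df["lon"] = df["location"].apply(lambda loc: loc.longitude)
df["lat"] = df["location"].apply(lambda loc: loc.latitude)

points = gpd.GeoDataFrame(df.drop(columns="location"),
                          geometry=gpd.points_from_xy(df["lon"], df["lat"]), crs="EPSG:4326")
points.to_file("consumi_iren_points.shp")
```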

3.7 Risk and Hazards

A large number of public, open access datasets are available for risk and hazard analysis. For example, it is possible to estimate fire risk by calculating the IPSI indicator, which crosses land cover data (e.g. vegetation types), the slope of the land and the construction characteristics of buildings. While an on-site investigation may be required for the last point, the remaining data (DTM, land cover maps such as Corine Land Cover) are easily accessible. Flood hazard can be estimated with the zones made available by the Hydrogeological Asset Plans, as can the hazard posed by landslides. Other interesting datasets can be found on portals such as that of ARPAFootnote 16 for air quality monitoring.
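
As a hedged illustration of this kind of raster crossing, the sketch below combines a slope raster and a Corine Land Cover raster into a rough susceptibility score; the weights are invented for the example (they are not the official IPSI coefficients), the file names are placeholders, and the two rasters are assumed to be co-registered on the same grid.

```python
# Sketch: cross a slope raster with Corine Land Cover into a purely illustrative
# fire-susceptibility score (weights are invented, NOT the official IPSI coefficients).
# pip install rasterio numpy ; rasters are assumed co-registered on the same grid.
import numpy as np
import rasterio

with rasterio.open("slope_degrees.tif") as src:
    slope = src.read(1).astype("float32")
    profile = src.profile
with rasterio.open("corine_land_cover.tif") as src:
    clc = src.read(1)

# Illustrative weights by Corine class group: forests, shrub/herbaceous, everything else.
veg_weight = np.where(np.isin(clc, [311, 312, 313]), 1.0,
              np.where(np.isin(clc, [321, 322, 323, 324]), 0.7, 0.1)).astype("float32")

score = veg_weight * np.clip(slope / 45.0, 0.0, 1.0)   # steeper vegetated slopes score higher

profile.update(driver="GTiff", dtype="float32", count=1)
with rasterio.open("fire_susceptibility_demo.tif", "w", **profile) as dst:
    dst.write(score, 1)
```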

3.8 Health Data

The search for health information is particularly difficult, with data that are in public hands but hard to access. A practical example is access to data on the spread of COVID-19 infections. Publicly accessible data are available only at the municipal scale, making any kind of analysis at a more detailed scale complex. The problem in this case is mainly related to privacy and to the set of rules that protect the individual.

3.9 Presence of Services

Defining the word “services” is complex, but in general all the elements of public interest (schools and parks, but also churches, sports facilities, monuments, hospitals and health facilities) are quite simple to find through the regional geoportal. These data can normally be identified within BDTRE, or by downloading other shapefiles from regional or metropolitan-level geoportals.

However, if we use the term “services” in a broader sense, including, for example, commercial activities and merchants or non-profit organizations, the theme becomes more complex.

In this case, one can rely on the resources made available by open data and by commercial projects. The aforementioned OpenStreetMap is able to provide user-built data on commercial activities, classifying them according to the type of service offered. If you work in a QGIS environment, it can be very helpful to install an OSM plugin (such as QuickOSM), which with a simple query allows entire layers of open data to be downloaded for the territory under study.
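
Outside QGIS, the same OpenStreetMap data can also be queried directly from the Overpass API; the sketch below retrieves shop nodes within a bounding box roughly covering the area north of Turin (the bounding box and the tag choice are purely illustrative).

```python
# Sketch: query OpenStreetMap through the Overpass API for shop nodes in a bounding box.
# pip install requests ; bounding box (south, west, north, east) is illustrative.
import requests

bbox = "45.10,7.60,45.25,7.80"     # rough area north of Turin
query = f"""
[out:json][timeout:60];
node["shop"]({bbox});
out body;
"""

response = requests.post("https://overpass-api.de/api/interpreter", data={"data": query})
response.raise_for_status()
elements = response.json()["elements"]
print(len(elements), "shop nodes found")
if elements:
    print(elements[0]["tags"])
```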

Among the commercial services it is worth mentioning Google Maps, which, as already mentioned above, is also able to provide user-generated information and to correlate it with searches made on its search engine. Maps information can be accessed for a fee via the Google Maps Platform service on Google Cloud.
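
As a hedged sketch (the Google Maps Platform products and endpoints evolve over time; an API key is required and usage is billed), a nearby search for commercial activities could look like this:

```python
# Sketch: "nearby search" on the Google Maps Platform Places API (paid, API key required).
# Endpoint and parameters refer to the classic Places web service and may change over time.
import requests

API_KEY = "YOUR_API_KEY"                       # placeholder
params = {
    "location": "45.12,7.70",                  # lat,lng north of Turin (illustrative)
    "radius": 1000,                            # metres
    "type": "store",
    "key": API_KEY,
}
resp = requests.get("https://maps.googleapis.com/maps/api/place/nearbysearch/json", params=params)
resp.raise_for_status()
for place in resp.json().get("results", [])[:5]:
    print(place.get("name"), "-", place.get("vicinity"))
```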

4 Conclusions

The development of a comprehensive city-wide model that integrates data from various sources to provide valuable information in support of initial planning does not necessarily require the use of proprietary or restricted data.

The integration of basic, flow and risk information into a GIS database would make it possible to build an effective decision support system and should be seen as the arrival point for the development of new analysis methodologies.

However, the possibility of accessing the data should be further improved, making it easier to consult and download data of public origin, at least for research or for support to public administrations. The sectorization that characterizes many branches of public administration also affects data integration, thus conflicting with the very aims of planning, which seeks to operate according to a multilevel and multi-sector logic.

Integration in this regard should be pursued by strengthening the infrastructure that links datasets of different origins, ensuring its public nature and the widespread adoption of the open access paradigm, with rights allocations that allow the re-use of data also for commercial purposes. The dissemination of integrated public infrastructures providing “free” data could be of interest not only for the study and monitoring of the territory but also for the development of new economic opportunities, bypassing, in a certain sense, the entry costs created by the need to use data that can otherwise be accessed only for a fee.

From an almost “futuristic” perspective, the dissemination of true Digital Twins, digital city models that can provide updated and freely accessible data at multiple levels of authorization, would be desirable. Thinking from this perspective could be the only way to avoid the risk of a data monopoly by a few actors, with consequences, for example, on competition. In the background remains a serious issue related to privacy, and it will increasingly be necessary to develop new models capable of reaching a reasonable compromise between the need to acquire and access data and the protection of the individual. In this sense, a strong public role in the data acquisition process itself could be an element of greater stability and reliability.