Introduction: framework of the project

In recent years, an interesting issue in the field of geographical information systems (GIS) at the level of local municipalities has been the implementation of large archives of shared cartographic and administrative/land registry data, to be used as the cornerstone of the City GIS.

The archives must contain data of many different kinds, used in fields as diverse as urban management, government, and planning. The data must be integrated both within a single administration and with other administrations, since nowadays they are stored and manipulated digitally, using information systems and database technology. Unfortunately, digital archives are frequently not structured according to a DB model, so that only simple selections or visualizations are possible. By contrast, a system working on an organized DB structure makes it possible to set up elaborate queries involving many layers. Moreover, even when such structures are present, they are often not standardized, so the systems are not interoperable.

A fundamental condition for building a system that works on a heterogeneous archive of data (in which cartographic data play a basic role) and offers a high level of integration and interoperability with other such systems is that the data be not only correct (which can be considered a prerequisite) but also correctly structured according to the rules of database design and implementation, which are widely used in computer science and data processing, and in particular the rules that apply to relational databases.

At another level, we must consider that the concept of cartography has largely and rapidly evolved in the past decade. Cartography has not lost its basic role as a means for transferring geo-referenced knowledge, but it has also acquired new capabilities, exploiting the possibilities offered by recent information technology.

With the rise first of automatic and digital cartography and now of geographical databases, the world of geographical information has become more and more closely tied to the world of computer science, from which it derives data archiving structures and representation rules. More and more often, geographic information is available only in digital format, while spatial information in analogue format (i.e., maps drawn on paper) is nowadays rarely used.

Finally, it is worth remarking that local authorities in Regione Lombardia can no longer disregard the GIS issue, since cartographic production in the form of DBT (topologically structured GIS database) and the realization of GIS for the towns within the territory of the region are explicitly required by regional law L.R. 12/2005 (Section 3), which describes the new tools to be used for the coordination and integration of digital information in support of the knowledge of the territory and of land management and planning.

In addition, Regione Lombardia intends to stimulate the production of DBT by co-financing the town administrations that decide to carry out such projects, provided that specific criteria and conditions are fulfilled and that the project is submitted to Regione Lombardia for approval.

The Regione Lombardia law and financing policy cited above encouraged the local authorities to take charge of issues related to DBT and GIS, and in particular to set up a GIS if they are not already equipped with one, and to update the cartography of their territory at the scale of 1:2,000/1:5,000 if the existing cartography is no longer up to date (Carrion et al. 2007a).

In the production of DBT and GIS, several issues must be faced. Among them, particular care must be devoted to the following:

  • the self-consistency of cartographic data and, where needed, the transformation of spatial data into the UTM WGS84 reference system (which has recently been defined as a standard in Italy);

  • the integration in a unique database of information of different types and origins, such as cartographic data and data provided by administrative and cadastral services;

  • the interoperability with other GIS, meaning the capability of different systems to share data for different applications.

Designing a relational database: conceptual and logical model

To proceed soundly in building a GIS (as in building any other information system), particular care must be paid to the design of the database, which forms the foundation of the whole system.

The structure of the database must follow the rules established by the theory of database design, which is extensively described and discussed in the literature (see, e.g., Atzeni et al. 1999).

In designing (or “modeling”) a database, one of several different models may be chosen; however, the one most widely used nowadays in the field of GIS is the relational model of data (Codd 1970; Chen 1976).

The idea of the relational model was introduced for the first time by Edgar Frank “Ted” Codd (1923–2003), who developed it at the beginning of the 1970s while working as a computer scientist for the IBM Research Laboratory in San Jose (California). This model is based on the concept of “relationship” between the different pieces of information archived in the database.

The relational model allows for remarkable “flexibility” from the point of view of the database users, while providing “a means of describing data with its natural structure only”, that is, “without superimposing any additional structure for machine representation purposes” (Codd 1970). The connections between the data are not fixed permanently by the database designer, but may be suitably exploited for different kinds of data analysis and queries.

The relational model also provides a basis for reaching the independence of data from their machine representation and organization, which is a desirable quality for users of large shared databases.

Four successive steps identify the different levels of information modeling in a database (Jones 1997; Bertino et al. 1997), and they are usually followed in database design:

  • the external model is used to describe (usually, in natural language) the relevant information to be archived in the database of the information system;

  • the conceptual model presents a concise, schematic, and univocal description of the organization of the data to be archived in the database;

  • the logical model is used to define the structure of the “relations” (i.e., the tables) which will contain the data;

  • the internal (or physical) model contains the description of information in terms of computer storage units (e.g., bytes, blocks, addresses).

In this paper, we deal in particular with the two central steps (the conceptual and the logical model), which play a fundamental role in GIS database design. Properly completing these steps allows data management software (DBMS, Database Management System) to manipulate and retrieve data from the archive, so that the database shows the following important characteristics:

  • minimum data redundancy;

  • controlled access to data;

  • easy organization of data, from the point of view of the system users;

  • data security;

  • accuracy, consistency, reliability of data.

In other words, data integrity is the goal pursued in this process, especially when a GIS environment is being built: GIS data are derived from many different sources and applied to numerous activities.
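As a small illustration of the “minimum data redundancy” characteristic listed above, the following sketch stores a road name once in a lookup table and lets several road elements refer to it through a key. It uses Python's built-in sqlite3 module only for convenience; the table names, column names, and sample values are assumptions made for this example and are not taken from the CAAM database.

```python
import sqlite3

# Illustrative schema only: table names, column names and values are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE road_name (
    name_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL UNIQUE
);
CREATE TABLE road_element (
    element_id INTEGER PRIMARY KEY,
    name_id    INTEGER NOT NULL REFERENCES road_name(name_id),
    length_m   REAL
);
INSERT INTO road_name VALUES (1, 'Via Roma');
INSERT INTO road_element VALUES (101, 1, 120.0), (102, 1, 85.5), (103, 1, 240.0);
""")

# Renaming the road touches a single row, however many elements carry that name.
cur = conn.execute("UPDATE road_name SET name = 'Via Garibaldi' WHERE name_id = 1")
rows_changed = cur.rowcount

# Every road element now reports the new name through the join.
names_seen = {
    row[0]
    for row in conn.execute(
        "SELECT n.name FROM road_element AS e JOIN road_name AS n USING (name_id)"
    )
}
```

Because the name is stored in one place only, the three road elements cannot end up carrying inconsistent spellings of it, which is the practical meaning of low redundancy in this context.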

Coming to the conceptual model, we have already defined it as a concise, schematic, and univocal description of the data structure. When the relational model is chosen for database design, the description is usually achieved by means of a well-known model, namely the Entity Relationship Model, ERM (or Entity Relationship Diagram, ERD), which is particularly suited to representing the database content, data types, and relations (connections) between data. The Entity Relationship Model was first proposed in 1976 by Peter P. Chen of the Massachusetts Institute of Technology (Chen 1976).

The logical model describes the structure and content of the tables (called “relations” in a relational database) in which the data will be archived. In effect, it is the transformation of the conceptual model, by means of mathematical tools, into a computer-oriented model that allows for automatic data manipulation.

Of course, the chosen conceptual model orients the database designer towards a particular choice of logical model. In the case of a relational database, the ERM is used as a basis for arranging a suitable set of “keys” (or codes) that implement the connections (relations) among the data of the archive.
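The role of the keys can be sketched with a minimal example in which a 1:N relationship between two entities is implemented by a primary key and a foreign key. The sketch uses Python's built-in sqlite3 module; the two tables, their columns, and the Town to Land parcel pairing are hypothetical and serve only to illustrate the mechanism.

```python
import sqlite3

# Hypothetical two-entity schema: names and columns are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE town (
    town_id INTEGER PRIMARY KEY,          -- primary key: unique identifier
    name    TEXT NOT NULL
);
CREATE TABLE land_parcel (
    parcel_id INTEGER PRIMARY KEY,
    town_id   INTEGER NOT NULL REFERENCES town(town_id),  -- foreign key (N side)
    sheet     TEXT
);
""")

conn.execute("INSERT INTO town VALUES (1, 'Cesano Maderno')")
conn.execute("INSERT INTO land_parcel VALUES (10, 1, 'A')")


def insert_orphan_parcel(connection):
    """Try to store a parcel whose town does not exist; return True if rejected."""
    try:
        connection.execute("INSERT INTO land_parcel VALUES (11, 99, 'B')")
    except sqlite3.IntegrityError:
        return True
    return False


orphan_rejected = insert_orphan_parcel(conn)
parcel_count = conn.execute("SELECT COUNT(*) FROM land_parcel").fetchone()[0]
```

The DBMS itself refuses the parcel that points to a non-existent town, so the consistency of the connection does not depend on the care of the individual user.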

Finally, it must be remarked that, although applications involving spatial data represent a particular case of database, due to the presence of geo-referenced information, the modeling of these data should follow the basic rules of database design (see for example Worboys et al. 1990).

A database for the GIS of a consortium of towns in Regione Lombardia

The database model that we present in this paper was designed for the GIS of the CAAM, a consortium of towns in Regione Lombardia.

The CAAM GIS, started in 1997, is a valuable tool supporting the consortium’s daily activities (particularly those carried out by the Department for the Promotion of the Territory and by the Unified Desk for Productive Activities). It proves fundamental especially in all the activities involving cartographic data, such as the analysis of special projects in which the consortium takes part (an example is the project for the development of the cellular network for mobile phones in the CAAM territory).

Finally, the CAAM GIS represents a considerable economic and strategic asset, owing to its large archive of data and to the different applications that it supports. As a consequence, over the years the Consortium has considered it fundamental to invest time and resources in maintaining, improving, and updating the system, in order to make its use more and more efficient.

However, in past years the updating activities were carried out in an inhomogeneous way, mainly because not all the towns in the consortium had their own digital cartography; moreover, the towns’ data were organized in different ways and formats and had been collected with different instruments and techniques (this was the case, for example, for the “Piano Regolatore Generale”, PRG, that is, the Town Master Plan of each town). In 1999, some towns already had their land-planning instruments available in digital format, but many still worked with Town Master Plans drawn in the traditional way (that is, manually). As a consequence, in some cases the data for the GIS digital archive only had to be acquired and then made internally consistent, while in most cases the Town Master Plans had to be at least partly digitized (an operation performed by external companies, with no real possibility of checking the result of the work).

The first analysis of the state of the system showed that the data acquisition operations described above had produced a conspicuous but unstructured archive of digital data, with about 450 files in vector format and 150 files in raster format. Most of the raster data consisted of tiles of:

  • the Regional Base Map at the scales of 1:10,000 and 1:50,000,

  • photogrammetric surveys,

  • orthophotos,

  • the “Piano Territoriale di Coordinamento Provinciale” (Province Master Plan).

Vector data were represented by a large number of layers including:

  • photogrammetric surveys of the territory,

  • data on disused industrial areas,

  • data on vacant areas,

  • data on areas subject to the CAAM geo-marketing policy.

However, this large digital dataset had not been designed and organized according to the typical rules of DBMS: the data (both spatial and non-spatial) of the CAAM GIS had simply been stored and accumulated over time (see Fig. 1 for a graphic illustration), depending on the momentary needs of the consortium, thus making up a stratified archive that could not satisfy the users’ complex queries.

Fig. 1 The pre-existing CAAM digital archive was a simple collection of a large number of layers

No relationships had been defined between data, nor had definite procedures been established for data storage or updating. Moreover, metadata were not present.

The conceptual model for the CAAM database (re-organization of the existing database)

Considering the situation described above, the existing CAAM digital data archive first had to be re-organized (Carrion et al. 2007b). This was done by designing a database according to the rules of the relational model of data, for which a few “basic” vector entities were defined. They are:

  • Town,

  • Land parcel,

  • Old CAAM Building,

  • Zone of Town Master Plan,

  • Area subject to geo-marketing policy,

  • Area depending on the Unified Desk for Productive Activities,

  • Document produced by the Unified Desk for Productive Activities.

Regarding the Old CAAM Building, it represents the geometric entity corresponding to thematic and cadastral data stored in the pre-existing digital archive.

The conceptual model was designed in the form of an Entity Relationship Diagram, and a primary key was defined as a unique identifier for each entity.

Relationships were defined between pairs of entities (Land parcel, Zone of Town Master Plan, Area depending on the Unified Desk for Productive Activities, Document produced by the Unified Desk for Productive Activities, and Old CAAM Building), all with 1:N mapping.

Another relationship introduced in the model (and representing a connection between data which was not implemented in the old data archive) is the one between Document produced by the Unified Desk for Productive Activities and Land parcel.

Figure 2 shows the ERD for the re-organized relational database, with primary keys and mappings of the relationships.

Fig. 2 ERD (conceptual model) for the CAAM GIS database re-organization; for each entity, the primary key is reported; mapping values of the relationships are also reported

The conceptual model for the entities of the DBT (new data acquisition)

The ongoing production of the DBT, promoted by Regione Lombardia, will cover the territory of nearly all the towns participating in the consortium (plus a much larger surrounding area) and will result in the new acquisition or updating of a remarkable amount of cartographic information, for which the surveying and restitution operations are being performed in compliance with the technical specifications issued by “IntesaGIS”. Thus, from the cartographic point of view, the geographic data will be homogeneous and up to date.

The new spatial digital information will in any case replace the corresponding spatial information archived so far in the CAAM database. A very large number of data layers will be produced in the course of the DBT acquisition; among them, the following “basic” entities for the purposes of the CAAM GIS have been identified:

  • Building volume unit,

  • New DBT Building,

  • Road element,

  • External access to building,

  • Internal access to building.

The New DBT Building entity represents the geometric and thematic data which will be provided to the CAAM Consortium after the new survey, which should be completed by the end of 2009.

The conceptual model was again designed by applying the relational database design rules and the ERD formalism. It is shown in Fig. 3, where the primary keys and the relationship mappings between the entities are also reported.

Fig. 3 ERD (conceptual model) for the new DBT; for each entity, the primary key is reported; mapping values of the relationships are also reported

Conceptual and logical model for the integrated CAAM database

The ERD describing the integration of the newly acquired DBT data with the existing CAAM data is shown in Fig. 4. It incorporates all the basic entities listed above, that is: Town, Land parcel, Old CAAM Building, Zone of Town Master Plan, Area subject to geo-marketing policy, Area depending on the Unified Desk for Productive Activities, Document produced by the Unified Desk for Productive Activities, Building volume unit, New DBT Building, Road element, External access to building, Internal access to building, plus an entity used to structure a table which simply contains the name of each road element.

Fig. 4 ERD (conceptual model) for the integration of the CAAM GIS database with the new DBT; for each entity the primary key is reported; mapping values of the relationships are also reported

As one can see from the figure, the connection (that is the integration) between the two parts of the database (the new DBT and the existing CAAM database) is represented by the relationship between the entities Land parcel (existing CAAM database) and New DBT Building. In this way, a unique database is obtained, and thus a unique GIS.
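The effect of this linking relationship can be sketched with a query that crosses from the new DBT data to the existing CAAM data through the parcel key. The sketch below uses Python's built-in sqlite3 module rather than the GeoDataBase environment adopted in the project; table names, column names, sample rows, and the direction of the mapping (one parcel, many buildings) are assumptions made for illustration and are not read from Fig. 4.

```python
import sqlite3

# Hypothetical tables and columns; the 1:N direction (one parcel, many
# buildings) is an assumption made for this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE land_parcel (
    parcel_id     INTEGER PRIMARY KEY,
    cadastral_ref TEXT                  -- thematic data from the existing archive
);
CREATE TABLE new_dbt_building (
    building_id INTEGER PRIMARY KEY,
    parcel_id   INTEGER REFERENCES land_parcel(parcel_id),
    use_type    TEXT                    -- thematic data from the new survey
);
INSERT INTO land_parcel VALUES (1, 'sheet 5 / 120'), (2, 'sheet 5 / 121');
INSERT INTO new_dbt_building VALUES
    (10, 1, 'industrial'), (11, 1, 'residential'), (12, 2, 'industrial');
""")


def industrial_buildings_with_parcel(connection):
    """Join the two parts of the database through the shared parcel key."""
    query = """
        SELECT b.building_id, p.cadastral_ref
        FROM new_dbt_building AS b
        JOIN land_parcel AS p ON p.parcel_id = b.parcel_id
        WHERE b.use_type = 'industrial'
        ORDER BY b.building_id
    """
    return connection.execute(query).fetchall()


result = industrial_buildings_with_parcel(conn)
```

A single query thus returns attributes coming from both sources, which is what makes the two parts behave as one database.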

One could wonder why the integrated database contains two entities (namely Old CAAM Building and New DBT Building) which represent the same real-world object. This is indeed the case from the geometric point of view, but the thematic data are different and come from different sources (Cadastre and Base Map). In any case, this redundancy is to be maintained only during the transitional period before the new integrated DB becomes operational; after that, only the newest data will be kept in the system.

Finally, in Fig. 5, the logical model of the integrated database is reported, according to a formalism that helps to understand in a schematic and intuitive way the connections between the entities contained in the information system.

Fig. 5 Logical model for the integration of the CAAM GIS database with the new DBT

To complete the description of the integration of the data in a unique database, some remarks are in order, regarding the data format which was used in the project.

For the CAAM database re-organization, a file-based format (the ESRI® shapefile format in our case) was used, that is, a structure in which the GIS data layers are organized in files and directories. This is quite a simple solution, and the system works well with it, especially if the users work with local applications (software and data residing on a single computer).

However, a dataset organized in such a file-based way may prove impractical when several users must access the same dataset for different applications.

Usually, in this case, the dataset can be duplicated so that each user operates on their own copy or, when the files containing spatial data are very large, they can be subdivided into smaller datasets, each covering a smaller geographical area. Both solutions have limits and drawbacks; for example, separate copies must be kept aligned with one another.

To overcome problems of this kind, the ESRI® GeoDataBase data format implemented in ArcGIS was used for the integration of the CAAM database with the new DBT data. This format does not rely on a file-based data structure but archives data in a single central database (each archived object being an element of this central storage area). It offers remarkable advantages, first of all from the point of view of interoperability, but also for all the operations on specific data elements. In fact, all the components of GIS information (spatial and thematic-alphanumeric data) are integrated and “kept together” in the GeoDataBase format.

Results of the project and future developments

Applying the rules of relational database design to the re-organization and enlargement of the CAAM GIS data archive made it possible to obtain an integrated database containing consistent data, which can be used automatically and efficiently to perform selections and answer queries on data representing both spatial and alphanumeric (thematic) information.

The database implementation has been performed in the ESRI® ArcGIS environment, exploiting the GeoDataBase data format.

The project regarding the CAAM spatial database re-organization and integration with the new DBT data has been completed for the whole territory of the Municipality of Cesano Maderno (one of the 12 towns participating in the CAAM consortium). This operation required the Cesano Maderno Land Registry data to be transformed from CAD to GIS format. In the future, we expect that the Land Registry data will be acquired directly in a DBT format or in a format easily convertible into DBT, such as the cxf format.

The correctness of the implementation of the model designed for the integration of the CAAM GIS archive with the DBT data was verified on a test area belonging to the territory of the Municipality of Cesano Maderno, with a size of 0.5 km² (that is, 5% of the whole territory).

At this point, the database of the CAAM GIS can be queried and used in an integrated way, which makes it possible to perform complex queries involving different layers. This is a helpful tool which can be exploited in several fields of interest, not only for the purposes of the consortium or of the offices of the single municipalities. Examples of possible applications are listed below:

  • analysis of business or industrial settlement opportunities (an activity carried out by the geo-marketing division of CAAM):

    • selection of industrial buildings and streets lying within a given distance from a main road (see Fig. 6 for an example);

    • selection of the roads on which such buildings are located;

    • identification of the zones of the Town Master Plan on which such buildings are located (see Fig. 7 for an example);

    Fig. 6 An example of a spatial selection on the CAAM integrated database: selection (cyan blue) of the industrial buildings and streets lying within 200 m from a chosen element of the street graph (yellow)

    Fig. 7 An example of a spatial query on the CAAM integrated database: visualization of the zones of the Town Master Plan on which the buildings selected in Fig. 6 are located (the letters are the abbreviations of the designated use of the zones)

  • 3D building visualization for analysis and simulations (for example, impact of new buildings or new mobile phone antennas);

  • management of spatial information related to the realization of major infrastructures (such as the “Autostrada Pedemontana”):

    • spatial location of road construction sites;

    • monitoring of the work progress;

    • network analysis on the road graph, for the search of optimal routes when local traffic needs to be diverted due to the works.
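The last application in the list, the search for an alternative route when a road element is closed by works, can be sketched with a standard shortest-path search. The example below applies Dijkstra's algorithm to a toy road graph in plain Python; it is not the network analysis tool used in the project, and the element identifiers, node labels, and lengths are invented for illustration.

```python
import heapq


def build_graph(elements):
    """elements: iterable of (element_id, node_a, node_b, length_m)."""
    graph = {}
    for element_id, a, b, length in elements:
        graph.setdefault(a, []).append((b, length, element_id))
        graph.setdefault(b, []).append((a, length, element_id))
    return graph


def shortest_route(graph, source, target, closed=frozenset()):
    """Dijkstra's algorithm on an undirected road graph.

    closed: ids of road elements that cannot be used (e.g. road works).
    Returns (total_length, [nodes]) or (None, []) if no route exists.
    """
    best = {source: 0.0}
    previous = {}
    queue = [(0.0, source)]
    while queue:
        dist, node = heapq.heappop(queue)
        if node == target:
            break
        if dist > best.get(node, float("inf")):
            continue  # stale queue entry, a shorter way was already found
        for neighbour, length, element_id in graph.get(node, []):
            if element_id in closed:
                continue
            candidate = dist + length
            if candidate < best.get(neighbour, float("inf")):
                best[neighbour] = candidate
                previous[neighbour] = node
                heapq.heappush(queue, (candidate, neighbour))
    if target not in best:
        return None, []
    path = [target]
    while path[-1] != source:
        path.append(previous[path[-1]])
    return best[target], path[::-1]


# Toy road graph: element ids, end nodes and lengths are invented.
elements = [
    (1, "A", "B", 100.0),
    (2, "B", "D", 100.0),
    (3, "A", "C", 150.0),
    (4, "C", "D", 150.0),
]
graph = build_graph(elements)
normal = shortest_route(graph, "A", "D")
detour = shortest_route(graph, "A", "D", closed={2})
```

With the toy data, the normal route runs through B with a length of 200 m, while closing element 2 forces the route through C with a length of 300 m. In a real road graph the edge list would be derived from the Road element layer.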

All the analyses listed above can be performed through queries, which the system can answer thanks to the presence of the foreign keys in the relational tables, as shown in the logical model (see again Fig. 5).
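A combined selection of the kind shown in Figs. 6 and 7 can be sketched as follows: a distance test selects the industrial buildings within 200 m of a road element, and a foreign key then gives the Town Master Plan zone of each selected building. This is a simplified stand-in for the GIS operations, written with Python's built-in math and sqlite3 modules. Buildings are reduced to a representative point in projected coordinates, the road element to a single straight segment, and all table names, zone codes, and coordinates are hypothetical, as is the direct building-to-zone key.

```python
import math
import sqlite3


def distance_to_segment(px, py, ax, ay, bx, by):
    """Planar distance from point P to segment AB (projected coordinates, metres)."""
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        return math.hypot(px - ax, py - ay)
    # Parameter of the orthogonal projection of P on AB, clamped to the segment.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


# Hypothetical tables: a building carries a foreign key to its PRG zone.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE prg_zone (zone_id INTEGER PRIMARY KEY, code TEXT);
CREATE TABLE building (
    building_id INTEGER PRIMARY KEY,
    zone_id     INTEGER REFERENCES prg_zone(zone_id),
    use_type    TEXT,
    x REAL, y REAL                      -- representative point, metres
);
INSERT INTO prg_zone VALUES (1, 'D1'), (2, 'B2');
INSERT INTO building VALUES
    (10, 1, 'industrial',   50.0, 150.0),
    (11, 1, 'industrial',  400.0, 500.0),
    (12, 2, 'residential',  20.0,  30.0),
    (13, 2, 'industrial', 1100.0, 120.0);
""")


def zones_of_industrial_buildings_near(connection, segment, max_dist=200.0):
    """Return (building_id, zone code) for industrial buildings near the segment."""
    ax, ay, bx, by = segment
    rows = connection.execute(
        "SELECT b.building_id, z.code, b.x, b.y "
        "FROM building AS b JOIN prg_zone AS z ON z.zone_id = b.zone_id "
        "WHERE b.use_type = 'industrial' ORDER BY b.building_id"
    ).fetchall()
    return [
        (building_id, code)
        for building_id, code, x, y in rows
        if distance_to_segment(x, y, ax, ay, bx, by) <= max_dist
    ]


selected = zones_of_industrial_buildings_near(conn, (0.0, 0.0, 1000.0, 0.0))
```

In this toy dataset, buildings 10 and 13 are selected, while building 11 lies 500 m from the segment and building 12 is not industrial. A GIS would evaluate the distance on the full geometries, but the way the foreign key carries the selection from the building layer to the zone layer is the same.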

Moreover, by exploiting provincial and regional spatial databases that could be integrated with the CAAM database, it will be possible to perform data analysis on a larger territorial scale.

Of course, in order to obtain optimal results and to be able to perform data analysis on the whole territory covered by CAAM, it will be necessary to extend the re-organization of the CAAM spatial database to all the municipalities participating in the consortium (this activity is currently being carried out), and afterwards to integrate this database with the DBT now being produced. In this way, a complete and up-to-date spatial information system will be obtained for the activities of the consortium and for its whole territory. Moreover, each municipality will have access to the portion of the database containing data on its own territory and administration, so a system to control data access will have to be implemented.

Another problem which has been identified but not yet completely solved concerns the discontinuity of the territory being surveyed for the production of the DBT: not all the towns in the consortium have yet started a project for the production of the updated DBT and applied to Regione Lombardia for co-financing, so they still use maps coming from different surveying and restitution processes. This means that the cartographic data of these towns will in any case not be consistent with the data of the new DBT (at least as regards the epoch of data updating, and possibly also data formats and consistency): checks will have to be carried out on the “old data” to establish their real level of accuracy and updating before they are integrated into the consortium database.

Conclusions

The rules of database modeling have been applied to the design of a GIS for urban data management for a consortium of towns in the province of Milano. This kind of approach focuses on the data as the main and most valuable component of the information system. This means that, however important the decisions about the technological components (hardware and software) may be, the analysis of the requirements of the system must above all lead to a careful design of its database, according to the computer science literature on the subject. In fact, it is the database which will be used to support the applications requested by the users.

In the specific case presented here, the “database-oriented approach” has been fruitfully applied to an existing GIS data archive which had previously been formed simply by collecting data, without studying the structure according to which they were to be organized. Consequently, a database structure (to be supported by a proper DBMS) has now been created which makes it possible to perform data queries and selections according to the rules of relational databases.

An additional problem which has been faced and solved is the integration of the existing data with the newly acquired ones (meaning new entities to be included in the database): the result is a single structure, in which “old” and “new” entities can coexist thanks to the use of primary and foreign keys which implement the relationships between entities.

Finally, we wish to remark that the particular (though matter-of-fact) feature of the data modeling problem studied here is that the conceptual and logical models (the main steps in database modeling) involve geo-referenced entities, which is what makes a GIS peculiar with respect to any other information system and makes the structures of most GIS data not so simple to manage.

The positive results obtained by applying the described GIS design procedure encourage us to exploit this approach in the development of future projects regarding GIS matters.