1 Preliminary Discussion

In an ongoing project we are investigating the extraction, enrichment, publication and reuse of Linked Open Data (LOD) [1, 2] for the Municipality of Catania (MoC), Italy, by applying the latest semantic technologies and software components [3]. The main motivation of the work is to experiment with social eGovernment systems aimed at optimizing the performance of the Public Administration (PA) of the MoC in providing intelligent ICT services to citizens and businesses, and at supporting the external evaluation of the PA by detecting community trust. The work falls within the spirit of the Smart Cities initiatives of the European Commission, which aim at bringing together cities, industry and citizens to improve urban life through more sustainable integrated solutions. Although the methodology has been designed for the case study of the city hall of Catania, the approach is general and can be replicated for any PA worldwide. One of the main development objectives of the project is to conceive, design and prototype applications for the MoC in selected areas of experimentation, such as online social services and health, traffic management and transport, and urban decor. To identify and collect the data and processes required by these applications, meetings were held with the leadership of the Directorate of Information Systems Service of the MoC.

One field of experimentation focuses specifically on the management of mobility, i.e. road traffic and public transport. Within this context, the scenario calls for the development of a prototype mobile application implementing a real-time system that reports on the state of roads in urban areas, to support sustainable mobility and, in particular, to aid the response to urban emergencies, from small-scale accidents to more serious disasters. The system aims at connecting drivers to one another, helping people create local driving communities that work together to improve the quality of everyone’s daily driving. That might mean helping them avoid the frustration of sitting in traffic, advising them on unexpected accidents or other traps, or just shaving five minutes off their regular commute by showing them new routes they never even knew about. Most importantly, the application may play an extremely important role in emergency logistics. Response to an emergency incident requires careful planning and professional execution of plans when and if an emergency occurs [4]. During these events there is a need to rapidly find the nearest hospitals, to obtain the best ways out of the emergency zones, or to compute the optimal path connecting two suburbs for redirecting road traffic. Technically, the system should be able to find the best path between source and destination not only in a static environment but, above all, in a dynamic one. That is, user feedback places obstacles or inaccessible zones on the map, arising from accidents or emergency events, and the system responds in real time by producing the optimal path that avoids these forbidden zones.
After typing in their destination address, users just drive with the application open on their phone to passively contribute traffic and other road data, but they can also take a more active role by sharing road reports on accidents, advising on unexpected traps, or any other hazards along the way, helping to give other users in the area real-time information about what’s to come [5]. For the realization of the app for our case study, it is necessary to process the data and diagrams in the Geographic Information System of the MoC, referred to as SIT: “Sistema Informativo Territoriale” [6]. Therefore it was decided, by mutual agreement with the chief officers and experts of the city hall of Catania, to process the data in order to make them open, interoperable and compatible with the principles of Linked Open Data.

The paper is structured as follows. Background on the state of the art in the use of LOD for PA, often referred to as Linked eGovernment Data [7], is reported in Sect. 2. Techniques and tools used to deal with LOD for the MoC are introduced in Sect. 3; the extracted ontology is described in Sect. 3.3 and the means used to consume the published data in Sect. 3.5. Section 4 ends the paper with conclusions and directions for future research.

2 Linked eGovernment Data

LOD are currently bootstrapping the Web of Data by converting into RDF and publishing existing datasets that are available to the general public under open licenses [1, 2]. LOD offer the possibility of using data across domains or organisations for purposes such as statistics, analysis, maps and publications. These major changes in technology and society are also affecting the way politics and administration are conducted, and the relationship between politicians, public servants and citizens. Transparency, participation and collaboration are the main issues in the integration of citizens into the paradigm of Open Government [8]. Because PAs hold large amounts of data that could be made accessible for the purposes of the LOD movement, research on the opening process, data reengineering, linking, formalisation and consumption is of primary interest [9].

The Digital Administration Code incorporates a wide range of best practices in the usage of Linked eGovernment Data, which can be summarised as: portals for the supply of Linked eGovernment Data sets; portals providing raw LOD data sets for PAs along with technical tools or developer kits for understanding, interpreting, or processing the provided data; existing portals acting as showrooms of best practices for Linked eGovernment Data; and mobile apps for smartphones using LOD for PAs [7].

The main thrust on the publication of LOD for PA is coming from big initiatives in the United States (data.gov) [10, 11] and the United Kingdom (data.gov.uk) [12], both providing thousands of raw sets of LOD within their portals, but there are also some other experiences and notable initiatives that are in line with the international state of the art. In Germany, one of the first examples for a LOD portal is the one from the state of Baden-Württemberg (opendata.service-bw.de), divided into three main parts: LOD, applications, and tools. In addition to their potentials, Linked eGovernment Data can provide great benefits in the matter of accountability, as shown in the LOD portal example of Kenya (opendata.go.ke).

In addition, LOD have been published in Italy by the city hall of Florence (Footnote 1), the Agency for Digital Italy (Footnote 2), the Piedmont region (Footnote 3) and the Chamber of Deputies (Footnote 4). Besides these initiatives, another notable one for the Italian PA is “data.cnr.it” [13, 14], the open data project of the National Research Council (CNR), designed and maintained by the Semantic Technology Laboratory of ISTC-CNR, and shared with the Information Systems Office unit of CNR.

3 Extraction of Linked eGovernment Data for the MoC

In this section we present the methodology used for the extraction and publication of LOD for the Municipality of Catania. The methods are based on the standards of the W3C (Footnote 5), on good international practices, on the guidelines issued by the Agency for Digital Italy [15, 16] and by the Italian Index of Public Administration (Footnote 6), as well as on the in-depth experience of the research participants in this field, in particular in the development of the “data.cnr.it” [13, 14] portal.

3.1 Scenario Analysis

During the phase of selection of the source data, a thorough analysis of the reference domain was made. Thanks to the close interaction with the PA experts of the MoC, the Geographic Information System, SIT [6], was identified as the source dataset for the enrichment and publication of data. The SIT is a data warehouse used for reporting and data analysis, and consisting of databases, hardware, software, and technicians, which manages, develops and integrates information of the province of Catania based on a geographical space [6]. The various territorial levels (hydrography, topography, buildings, infrastructure, technological networks, administrative boundaries and land, ...) form the geo-localised common part of the information flow of the MoC, according to which all the constituent parts are related to each other.

The SIT is designed to contain all the available data of the PA in Catania for the purpose of in-depth knowledge of the local area. Basically it contains three types of data: register base, registry office, and toponymy, provided in the form of Shape-based files [17] for each data record, i.e. files with extensions: .dbf, .shp, .shx, .sbn, .sbx, .xml. Through the consultation platform on the web it is possible to display the following information: basic cartography; ortho-photos; road graph; buildings with a breakdown by main body of some areas of the city; cadastral sections; data from the 1991 and 2001 censuses of the population; last Master Plan; gas network on-going works; resident population in selected areas (municipalities, entire street, polygonal, circular area); total population, distributed into bow street, house number, etc.; breakdown of the population by municipality, blocks, nationality, gender, family components, age, marital status, etc.; extraction and search of resident persons, and their location on the bow streets; competence areas of pharmacies; location and alphanumeric information of: municipality, hospitals, universities, schools, pharmacies, post offices, emergency areas, public safety, fire departments, public green areas, public community centres, institutions for minors and orphanages. The SIT also includes maps containing geo-referenced information related to: sub-services (electricity-gas-water pipes); data on stoppage areas; occupation stalls; stalls for disadvantaged people; occupation of public land; public transport fleet; management and working state of the fleet; data on lines and stops of public transport; accident traffic data; road signs and markings; maintenance state of roads and sidewalks; management of roadway construction; data of the municipal police; the accounting of the Municipality.
Note that the information contained in the SIT is in Italian, and therefore the produced Linked Open Data will be in Italian too (although the whole generation process is completely language-independent).

Fig. 1. Example of a geo-localised entity of “pharmacies”.

3.2 Geo-Data Modelling and Reengineering

To reengineer the dataset according to the target conceptual model we used Tabels (Footnote 7), a software tool developed by the research foundation CTIC which, using the GeoTools libraries (Footnote 8), is capable of transforming the information encoded in shape files into RDF representations. From the shape files supplied for each data record (in particular, the files with extensions .dbf and .shp), Tabels produces RDF triples related to the designed ontology, which is described in more detail in Sect. 3.3. On the one hand, the characteristics of the table are stored as an RDF representation; on the other hand, the spatial geometry is modelled in the standard KML representation [18]. At this stage we are mapping to existing vocabularies suitable for geo-data, in particular NeoGeo (Footnote 9). The geometric coordinates in KML are expressed in the Gauss-Boaga (or Rome 40) geodetic reference system. By means of conversion tools publicly available online (e.g. http://www.ultrasoft3d.net/Conversione_Coordinate.aspx), it is possible to produce the coordinates of latitude, longitude and altitude in meters in the WGS84 geodetic system [19]. In particular, applying Tabels to each pair of .dbf and .shp files of the data tables produces a set of RDF triples, stored in a repository together with the other geometric resources held on a public server. For example, from the information stored in the database of the SIT representing an entity of “pharmacies” (Fig. 1), Tabels produces the related RDF triples, shown in Fig. 2, and the file with the geometric KML coordinates (Fig. 3).

Fig. 2. RDF triples produced by Tabels for the example of entity in “pharmacies”.

Fig. 3. KML coordinates produced by Tabels for the example of entity in “pharmacies”.

Tabels is able to import shape files as well as common file formats such as XLS or CSV. It then automatically generates a transformation program from the input data files. The generated program transforms each row of the input data into a new instance of an ad hoc RDF class. In addition, each value in a column of the input tables is converted into a new triple whose subject is that instance, whose predicate is a property based on the name of the column header, and whose object is the value of the cell as an rdfs:Literal. It is worth noting that the automatically generated transformation program is a SPARQL-based script that is completely customisable by the user. It is thus possible to change classes, names and associated properties, and to annotate them appropriately. Once the transformation program is defined, running Tabels generates the corresponding RDF output, which we make publicly available online through a dedicated SPARQL endpoint. In addition, information about each resource in the ontology data can be obtained through HTTP content negotiation, which makes the resources accessible, for example, through a browser or as a REST web service. Data consumption is described in more detail in Sect. 3.5.
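The row-to-instance and cell-to-triple mapping described above can be illustrated with a minimal Python sketch. It is not the Tabels transformation program (which is a SPARQL-based script); the namespaces, the function names and the row-numbering scheme for subjects are assumptions introduced only for this illustration, and the output is serialised as plain N-Triples lines.

```python
from urllib.parse import quote

# Hypothetical namespaces, chosen only for this illustration.
DATA_NS = "http://example.org/data/"
ONT_NS = "http://example.org/ont/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def escape_literal(value):
    """Escape a cell value for use inside an N-Triples string literal."""
    text = str(value)
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def rows_to_ntriples(table, header, rows):
    """Map each row to an instance of a class named after the table,
    and each cell to a triple whose predicate is based on the column header."""
    class_iri = ONT_NS + quote(table)
    lines = []
    for index, row in enumerate(rows, start=1):
        # Assumption: subjects are numbered by row position.
        subject = f"{DATA_NS}{quote(table)}/{index}"
        lines.append(f"<{subject}> <{RDF_TYPE}> <{class_iri}> .")
        for column, value in zip(header, row):
            predicate = ONT_NS + quote(column)
            lines.append(
                f'<{subject}> <{predicate}> "{escape_literal(value)}" .'
            )
    return lines


if __name__ == "__main__":
    # Invented sample row, not taken from the SIT.
    for line in rows_to_ntriples("Farmacie", ["Name"], [["Farmacia Esempio"]]):
        print(line)
```

Every cell is emitted as a plain literal here, mirroring the default behaviour described in the text; turning selected columns into links between resources is left to the later customisation step.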

3.3 Resulting Ontology for the SIT

Starting from the definition of the tables of the SIT, a first version of OWL ontology was developed. This provides classes and properties representing the database entities of the SIT, and is publicly available at the following URI:

http://ontologydesignpatterns.org/ont/prisma/ontology.owl

having the namespace (i.e. the default address of the entities in the ontology):

http://www.ontologydesignpatterns.org/ont/prisma/.

The creation process of this ontology was divided into two main phases and followed good practice in formal representation, naming, and semantic assumptions in use in the domain of the Semantic Web and Linked Open Data [15, 16]. In the first phase, the entire structure of the tables was converted into a draft OWL ontology, where each table (i.e. each entity type described by the supplied data) is represented by a class and each field of the table by a data property. This translation was carried out fully automatically from the sources provided in XML format (extension .shp.xml) by means of an XSLT transformation. Note that fields with the same name but belonging to different tables have been given distinct properties. For example, the fields “Name” of the tables “Nursing Homes” (“Case Riposo”) and “Pharmacies” (“Farmacie”) have been translated into two different data properties, respectively “Name-of-CATANIA.SDO_NursingHomes” and “Name-of-CATANIA.SDO_Pharmacies”.
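The naming scheme of this first phase can be sketched in a few lines of Python. This is only an illustration of the table-to-class and field-to-property convention; the actual translation was performed with an XSLT transformation, and the function names and the dictionary-based input are assumptions made for the example.

```python
def draft_property_name(field, table):
    """Name of the draft data property for a field of a given table,
    following the "<Field>-of-<Table>" pattern quoted in the text."""
    return f"{field}-of-{table}"


def build_draft_ontology(tables):
    """Return (classes, data_properties) for a mapping table -> field names.

    Each table becomes a class; each field becomes a data property whose
    domain is the class of its table, so equally named fields of different
    tables stay distinct.
    """
    classes = sorted(tables)
    data_properties = {}
    for table, fields in tables.items():
        for field in fields:
            data_properties[draft_property_name(field, table)] = table
    return classes, data_properties


if __name__ == "__main__":
    schema = {
        "CATANIA.SDO_NursingHomes": ["Name"],
        "CATANIA.SDO_Pharmacies": ["Name"],
    }
    classes, properties = build_draft_ontology(schema)
    for name, domain in sorted(properties.items()):
        print(f"{name}  (domain: {domain})")
```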

From this interim draft ontology and from the available data, a first version of the ontology in OWL was produced. At this stage we followed the suggestions of the W3C Organization Ontology (Footnote 10), a set of guidelines for generating, publishing and consuming LOD for organizational structures. In this respect we named the graph nodes as URIs and pursued the following principles:

  • The name of all the classes was taken to the singular (e.g., from “Pharmacies” to “Pharmacy”);

  • The names of the data properties were aligned when they were clearly showing the same semantics. For example, the properties

    “Name-of-CATANIA.SDO_NursingHomes” and

    “Name-of-CATANIA.SDO_Pharmacies” ended in the same property “name”, assigned to “NursingHome” and “Pharmacy” as domain or entity class;

  • The data properties that seemed to refer to individuals of other classes, probably acting as foreign keys in the database, were transformed into object properties. For example, the property “MUNI-of-CATANIA.SDO_NursingHomes” became “municipality” in order to connect individuals of class “NursingHome” with individuals of class “Municipality”;

  • The data properties having values clearly assigned to some resources were transformed into object properties and their values were reified as individuals of specially created classes.

All changes made to the intermediate draft ontology to obtain the first version of the ontology have been documented in the form of SPARQL CONSTRUCT queries. This allowed us to create a simple script that converts the data extracted by Tabels to make them fully compliant with the final expected ontology, produced as output in RDF format.
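To make the effect of these rewriting rules concrete, the following Python sketch applies them to a small list of triples. It only mimics what the SPARQL CONSTRUCT conversion achieves and is not the script used in the project: the tuple-based triple representation, the mapping tables and the way identifiers of new individuals are built are all assumptions made for this example.

```python
# Draft class name -> final (singular) class name.
CLASS_MAP = {
    "NursingHomes": "NursingHome",
    "Pharmacies": "Pharmacy",
}

# Draft property -> (final property, target class or None).
# A target class means the value is turned into an individual of that
# class and the property becomes an object property.
PROPERTY_MAP = {
    "Name-of-CATANIA.SDO_NursingHomes": ("name", None),
    "Name-of-CATANIA.SDO_Pharmacies": ("name", None),
    "MUNI-of-CATANIA.SDO_NursingHomes": ("municipality", "Municipality"),
}


def align(triples):
    """Rewrite draft (subject, predicate, object) triples to the final vocabulary."""
    aligned = []
    for subject, predicate, obj in triples:
        if predicate == "rdf:type":
            aligned.append((subject, predicate, CLASS_MAP.get(obj, obj)))
        elif predicate in PROPERTY_MAP:
            final_property, target_class = PROPERTY_MAP[predicate]
            if target_class is None:
                # Aligned data property: the literal value is kept as is.
                aligned.append((subject, final_property, obj))
            else:
                # Hypothetical identifier scheme for the new individual.
                individual = f"{target_class}/{obj}"
                aligned.append((individual, "rdf:type", target_class))
                aligned.append((subject, final_property, individual))
        else:
            aligned.append((subject, predicate, obj))
    return aligned
```

For instance, a draft triple stating that a nursing home has the value "3" for “MUNI-of-CATANIA.SDO_NursingHomes” would yield an individual of class “Municipality” and a “municipality” link from the nursing home to it.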

3.4 Example of Conversion from the Geo-Data to the Final Ontology

In this section we want to focus on the phase of transformation from shape files to the final RDF ontology by reporting an example. Consider as reference the data record “Traffic Lights” (“Semafori”). The SQL schema of this table includes the fields:

  • ObjectID - unique number incremented sequentially;

  • Shape - type Geometry that represents the coordinates defining the geometric characteristics of the entity;

  • Id - Identification number of type Double;

  • name - String type name of the entity;

  • Sde_SDE_se - integer number;

  • Se_ANNO_CAD_DATA - blob representing the date.

Fig. 4. A view on the transformation program used by Tabels to convert the shape files to RDF for the table “Traffic Lights” (“Semafori”).

Fig. 5. Top panel (a): RDF/Turtle produced by the transformation program of Tabels for a single entity of the table “Traffic Lights” (“Semafori”). Bottom panel (b): Corresponding final RDF/Turtle ontology obtained through SPARQL CONSTRUCT conversion to fully match the designed ontology.

When the .shp and .dbf files are passed to Tabels, it generates the transformation program, i.e. the SPARQL-based script used to import the data (see Fig. 4). As already mentioned, the script can be edited to suit custom requirements. Once the changes to the transformation program are complete, it can be saved and run, which generates the RDF triples from the table data given as input. Figure 5(a) shows the RDF/Turtle produced by Tabels, using the methodology already described, for a single “Traffic Light” entity as an example. Figure 5(b) shows the corresponding final ontology of this entity, obtained by converting the data extracted by Tabels through SPARQL CONSTRUCT in order to fully match the designed ontology.

This example further shows the ability and simplicity of the proposed methodology to capture the complex structure of a non-structured database, allowing rapid analysis, retrieval, and conversion of the data into a structured RDF format, and their publication in the form of Linked Open Data.

3.5 Data Consumption

The produced ontology consists of 854,221 triples and can be publicly queried by selecting the RDF graph called <prisma> on the dedicated SPARQL endpoint accessible at http://wit.istc.cnr.it:8894/sparql. Queries can be entered in the text area of the SPARQL query interface. The SPARQL endpoint is also accessible as a REST web service, whose synopsis is:

  • URL ⇒ http://wit.istc.cnr.it:8894/sparql

  • Method ⇒ GET

  • Parameters ⇒ query (mandatory)

  • MIME type supported output ⇒ text/html; text/rdf+n3; application/xml; application/json; application/rdf+xml.
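As an illustration of this synopsis, the following Python sketch builds a GET request with the mandatory query parameter, using only the standard library. The function names are hypothetical, and requesting the output format through the HTTP Accept header is an assumption (the synopsis lists the supported MIME types but does not state how one is selected). Graph selection is not shown.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

SPARQL_ENDPOINT = "http://wit.istc.cnr.it:8894/sparql"


def build_sparql_request(query, accept="application/json"):
    """Build a GET request carrying the SPARQL query in the 'query' parameter."""
    url = f"{SPARQL_ENDPOINT}?{urlencode({'query': query})}"
    # Assumption: the output MIME type is negotiated via the Accept header.
    return Request(url, headers={"Accept": accept}, method="GET")


def run_sparql_query(query, accept="application/json", timeout=30):
    """Send the request and return the response body as text.

    This performs a network call and depends on the endpoint being reachable.
    """
    request = build_sparql_request(query, accept)
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

A call such as `run_sparql_query("SELECT * WHERE { ?s ?p ?o } LIMIT 5")` would then return the raw response body, provided the endpoint is online.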

Data are also accessible through content negotiation. The reference namespace for the ontology is http://www.ontologydesignpatterns.org/ont/prisma/, which is identified by the prefix prisma-ont. The namespace associated with the data is instead http://www.ontologydesignpatterns.org/data/prisma/, which is identified by the prefix prisma. These two namespaces allow content negotiation for the ontology and for the associated data, respectively. The negotiation can be done either via a web browser (in this case the MIME type of the output is always text/html), or by making HTTP REST requests to one of the two namespaces. The synopsis of the REST requests to the web service associated with the namespace identified by the prefix prisma-ont is the following:

  • URL ⇒ http://www.ontologydesignpatterns.org/ont/prisma/

  • Method ⇒ GET

  • Parameters ⇒ ID of the ontology object (mandatory PATH parameter)

  • MIME type supported output ⇒ text/html; text/rdf+n3; text/turtle; text/owl-functional; text/owl-manchester; application/owl+xml; application/rdf+xml; application/rdf+json.

Instead, the synopsis of the REST requests to the web service associated with the namespace identified by the prefix prisma is the following:

  • URL ⇒ http://www.ontologydesignpatterns.org/data/prisma/

  • Method ⇒ GET

  • Parameters ⇒ ID of the ontology object (mandatory PATH parameter)

  • MIME type supported output ⇒ text/html; text/rdf+n3; text/turtle; text/owl-functional; text/owl-manchester; application/owl+xml; application/rdf+xml; application/rdf+json.
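The content negotiation requests described by the two synopses can be sketched as follows in Python, again with the standard library only. The function names are hypothetical, and two points are assumptions made for the example: that the ID is appended directly to the namespace as a path segment, and that the desired MIME type is sent in the HTTP Accept header, as is usual for content negotiation.

```python
from urllib.parse import quote
from urllib.request import Request, urlopen

ONTOLOGY_NS = "http://www.ontologydesignpatterns.org/ont/prisma/"
DATA_NS = "http://www.ontologydesignpatterns.org/data/prisma/"

# Output MIME types listed in the synopses above.
SUPPORTED_MIME_TYPES = (
    "text/html", "text/rdf+n3", "text/turtle", "text/owl-functional",
    "text/owl-manchester", "application/owl+xml", "application/rdf+xml",
    "application/rdf+json",
)


def build_negotiation_request(namespace, local_id, accept="text/turtle"):
    """Build a GET request for one resource, asking for a given serialisation."""
    if accept not in SUPPORTED_MIME_TYPES:
        raise ValueError(f"unsupported MIME type: {accept}")
    # Assumption: the resource URL is the namespace followed by the ID.
    url = namespace + quote(local_id)
    return Request(url, headers={"Accept": accept}, method="GET")


def fetch_resource(namespace, local_id, accept="text/turtle", timeout=30):
    """Retrieve the description of a resource (network call)."""
    request = build_negotiation_request(namespace, local_id, accept)
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

For example, `build_negotiation_request(ONTOLOGY_NS, "Pharmacy")` prepares a request for the Turtle description of the class “Pharmacy” mentioned in Sect. 3.3, assuming its identifier in the namespace is exactly that local name.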

4 Conclusion

This paper presents an application of Linked Open Data for PA. The methodology was implemented by following the standards of the W3C, good international practices, the guidelines issued by the Agency for Digital Italy and the Italian Index of Public Administration, as well as the in-depth experience of the research participants in the field. The method was applied to the case study of the PA of the MoC, in particular to the data stored in its Geographic Information System, SIT. By using tools and technologies for the extraction and publication of data, it was possible to produce an ontology of the SIT according to the paradigm of Linked Open Data. The data are publicly accessible to users through queries to a dedicated SPARQL endpoint, or alternatively through calls to dedicated REST web services.

In ongoing work, a mobile application based on this LOD and related to sustainable mobility and emergency vehicle routing is under development and will be released soon. This will support the real-time management of road traffic and public transport, informing citizens on the state of roads in urban areas, in particular during urban emergencies, from small accidents to more serious disasters, and redirecting road traffic by providing the best alternative routes to find ways out, the nearest hospitals or other locations of interest. The user will be able to contribute traffic and other road data, sharing road reports on accidents, advising on unexpected obstacles or inaccessible zones, or any other hazards along the way, helping to give other users in the area real-time information about what is currently happening. Once the mobile app based on these LOD is launched, user-centric tests and an experimental evaluation will be carried out. Our work is a concrete step in supporting the Municipality of Catania in moving towards the paradigm of Open Government and Linked Data, steering the city towards becoming a modern Smart City.