Advancements in GIS map copyright protection schemes - a critical review
Dramatic advancements in Geographic Information Systems (GIS) and computer technologies have resulted in the wide availability of knowledge-rich GIS digital vector maps that are easily accessible and downloadable through the world wide web. This has led to a need for copyright protection systems that protect the rights of the producers of these maps. Unlike the research on copyright protection for raster/image data types, the research for vector map data is less documented. This article surveys and classifies the GIS vector map copyright protection research papers published between 2000 and 2014, with the aims of providing a thorough understanding of the state of the art, addressing significant limitations of previous review articles, and outlining recommendations for future research directions.
Keywords: Geographic information system (GIS); Watermarking; Vector data; Digital map; Copyright protection
1 Introduction
Developments in computer technologies and Geographic Information Systems (GIS), i.e. computer-based systems for managing and displaying locational data related to positions on the Earth's surface, have increased the number of digital vector maps that are available on the world wide web. GIS vector maps are highly accurate, document attribute and topological information through the use of geometrical shapes, and are more compact in size than GIS raster maps such as satellite images. While these properties make GIS vector maps of high quality, their complexity and level of detail also mean that they incur a high production cost.
GIS vector maps are widely used in environmental, social and economic applications such as disaster management, navigation, infrastructure and utilities allocation, and business planning. They are also used in military/security applications. Due to the value of these maps, their protection is necessary not only to prevent attackers gaining economic advantage (by using them without paying copyright fees), but also to prevent their unethical use in situations related to national and international security.
To prevent GIS vector maps from being illegally modified and exchanged, different copyright techniques have been used, which fall mainly into two categories: encryption and information hiding. Encryption is part of a cryptographic system whose purpose is to protect the content of a message/file. Information hiding is used in several sub-disciplines, of which the most important are steganography and watermarking. In steganography, the purpose of information hiding is to keep the existence of the information secret, while in watermarking, the purpose is to make the hidden information imperceptible. The interested reader can find a more detailed distinction between these fields in . Of these approaches, watermarking is the most popular for marking the copyright of GIS vector maps.
GIS maps in raster data format (e.g. images) have received more attention than digital GIS vector maps in watermarking research, e.g. [1, 178, 205]; however, due to the importance of vector maps, research on watermarking this type of map has increased in the last decade. This article surveys and classifies the GIS vector map watermarking research articles published between 2000 and 2014, with the aims of providing a thorough understanding of the state of the art, addressing significant limitations of previous review articles, highlighting the key differences between images and GIS vector maps, and giving recommendations for future research directions that the research community should address.
The rest of the paper is organized as follows. The next section gives an overview and critical appraisal of previous review papers in the field of vector map watermarking. Section 3 explains the methodology used for collecting the research articles that have been reviewed in this paper. Section 4 describes GIS vector data models, entities and formats. Section 5 classifies existing GIS watermarking methods and gives an overview of the distribution of published papers according to the categories of this classification. Section 6 discusses the limitations of the current approaches and outlines directions for future GIS watermarking research. Finally, Section 7 concludes this paper.
2 Previous reviews
In 2002, Lopez  presented a review article with 43 references covering work published until 2000 to analyze the state-of-the-art of digital watermarking research including images, vector, text and databases data formats. He highlighted some key differences between the watermarking research and other research work such as cryptography and steganography. He also reported some legal aspects in watermarking research for the United States and Europe regions.
In 2003, Chang et al.  reviewed digital image watermarking research by utilizing 26 references published until 2002 to highlight possible ways of extending some image watermarking techniques to the context of 2D/3D vector map watermarking research.
In 2006, Niu et al. used 28 references published until 2004 to distinguish the features of vector map data from raster/image data.
In 2007, Niu et al. used 23 references published until 2004 to outline some key features of GIS vector maps, reviewed the state-of-the-art of the vector map watermarking research and classified this research into three sub-research areas: robust watermarking, reversible data hiding and fragile watermarking.
In 2008, Li et al.  used 23 references published until 2006 to summarize the status and prospects of watermarking research in GIS vector maps in terms of the basic concept, watermark generation, real-time detection and embedding strategies.
In 2009, Zheng et al.  used 30 references published until 2007 to classify digital map copyright protection schemes, and to propose some directions for further research. This review covered only the watermark embedding process.
In 2010, Zheng et al.  used 31 references published until 2007 and discussed some types of embedding techniques in the context of vector graphics. They highlighted some merits and drawbacks of a given set of image-based techniques with the purpose of suggesting some adaptations of these techniques for the vector map watermarking research context.
In 2013, two review articles are found in the literature. The first one  used 26 references published until 2012 to review a set of digital vector map watermarking techniques, and to define some possible attacks for removing the embedded watermark. The second review  used 27 references published until 2012 to classify the map watermarking components into two modules: embedding location selection module and integrity decision module. Neither of the two reviews covered the entire watermarking process.
All previous review articles paid attention either to differentiating vector map data from raster image data or to adapting some image-based watermarking techniques to the context of GIS vector maps. Nevertheless, they suffer from two major drawbacks: (a) they do not cover the entire watermarking process and (b) they do not outline their search method, nor give an indication of their coverage in relation to the total number of published articles.
The watermarking system is composed of three main components: embedding, evaluation and extraction. A comprehensive overview of our current knowledge of the digital map watermarking research progress can only be obtained by reviewing all three components of the process. In addition, without a documented search method used for the selection of the articles to review, we cannot be confident about the coverage and relevance of the reviewed research.
In this survey, we attempt to address these two major drawbacks of the current surveys by considering the three components of the watermarking system, and providing details of the methodology used for selecting the articles included in this review work. This survey article covers 215 articles published between 2000 and 2014, thus being more comprehensive than any of the previous surveys on the subject.
3 Search methodology
The search for relevant publications was performed using the following electronic libraries and databases: (i) Springer Digital Library, (ii) IEEE Xplore Digital Library, (iii) ACM digital library, (iv) Google Scholar, and (v) Elsevier Digital Library.
The search was limited to articles that have been published in English in the period between 1 January 2000 and 31 December 2014. It was done using a Boolean search containing the following terms: “GIS watermark” OR “zero watermark” OR “2D map watermark” OR “copyright protection” OR “vector data” OR “geospatial watermark” OR “vector data” OR “graph watermarking”.
Initially, any article containing the search terms was considered as a potential candidate for inclusion in the database of GIS map watermarking publications. To supplement the automated search, a manual search was also done. The manual procedure involved searching the reference sections of the papers identified by the automated search. Any relevant references within those articles were followed up. The inclusion criterion for the review was any theoretical or applied work concerning an integration of GIS vector data and watermarking/copyright protection methods.
A number of papers were identified in the search as title-only papers without access to the full text [12, 57, 99, 136, 202, 218]. These were included in the count of published papers, but were not included in the classifications discussed in Section 5.
4 GIS vector data structure
Points – point entities are used to define a single location of an object; they are used to represent real-world objects, such as bus stops, traffic lights and street lights.
Polylines – line entities define linear objects; they can range from two-point lines to complex strings that have many vertices; lines are used to represent real-world objects, such as rivers and roads.
Polygons – polygon entities define area-based objects; they can range from rectangles to multi-sided shapes with many vertices; polygons are used to represent real-world objects, such as lakes, shopping areas, buildings and city boundaries.
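The three entity types above can be illustrated with a minimal Python sketch. The class names, field names and validation rules are assumptions made for illustration only; they do not correspond to any particular GIS format or library.

```python
from dataclasses import dataclass
from typing import List, Tuple

Coordinate = Tuple[float, float]  # hypothetical (x, y) pair


@dataclass
class Point:
    """A single location, e.g. a bus stop."""
    position: Coordinate


@dataclass
class Polyline:
    """A linear object, e.g. a road, given as an ordered list of vertices."""
    vertices: List[Coordinate]

    def __post_init__(self):
        if len(self.vertices) < 2:
            raise ValueError("a polyline needs at least two vertices")


@dataclass
class Polygon:
    """An area object, e.g. a lake; the ring is closed implicitly."""
    ring: List[Coordinate]

    def __post_init__(self):
        if len(self.ring) < 3:
            raise ValueError("a polygon needs at least three vertices")


bus_stop = Point((10.5, 20.25))
road = Polyline([(0.0, 0.0), (5.0, 1.0), (9.0, 4.0)])
lake = Polygon([(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)])
```

In every case the geometry reduces to vertex coordinates, which is why the watermarking schemes reviewed later operate on vertices or on relations between them.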
ESRI (Environmental Systems Research Institute) shape file. The ESRI shape file  has become an industry standard in geospatial data format due to its compatibility, to some extent, with recently released GIS software products.
CAD (Computer Aided Design) drawing. CAD drawings are used in many disciplines such as engineering, architecture, surveying, and mapping to define real-world objects in the context of geographic information systems. DXF (Drawing Interchange File)  files are a popular format for storing and exchanging vector-based spatial information.
The attribute data describes the properties of map entities through links to the location data. Attributes can be, for example, names or matching addresses. The best-known example of a GIS attribute data format is the ESRI database file, which is associated with the ESRI shape file and needs to have the same prefix as the shape file.
Last but not least, in the GIS context, the index data describes a file structure, such as total file length, for either spatial or attribute data. The ESRI index file  is the best known example of index files.
Vector data versus image/raster data
Vector data | Image/raster data
------------|------------------
Use points and lines to represent features | Represented as 2-dimensional array of brightness values for pixels
Resolution is determined by coordinate precision | Resolution is determined by pixel size
Efficiently represents sparse data | Efficiently represents dense data
Spatial relations exist | Spatial relations do not exist
Efficient storage of sparse data | Requires large amounts of storage space
Small redundancy to hide watermark | Considerable redundancy to hide watermark
Explicit representation of linear features | Deals poorly with linear features
The main challenges for watermarking vector map data are related to the embedding locations and the evaluation of the watermarking approach. Selecting embedding locations is a crucial issue in the watermarking field because of the small redundancy to hide the watermark due to the need to preserve the coordinates precision of points/vertices. The evaluation of the watermarking approach includes several challenges, such as the preservation of map quality and the robustness to attacks, which will be discussed in detail in Section 5.2.
5 Digital map copyright protection algorithms
List of the used terms in the watermarking research
Aims to utilise some key characters of the host data in generating the watermark data.
Attempts to shape the watermark according to some local characteristics of the original data.
Refers to the use of more than one watermark to be embedded in the host data.
Aims to achieve a good balance between the embedding process and the quality of the watermarked data, and aims to restore the original data after watermark extraction.
Refers to the field of applying watermarking techniques to the data of image type.
The process of adding the watermark bits directly to the value of the coordinates
5.1 Watermark embedding module
The embedding module involves hiding the watermark bits inside the original map content without affecting the visual quality of the host map. The secret key (see Fig. 1 in Section 1) should be used to enforce security and to prevent unauthorized parties from recovering and manipulating the watermark. This module involves both the embedding domains and the embedding strategies, which are discussed in the following subsections.
A digital watermark can be embedded in one of two domains: the space domain or the transform domain. In the space domain, the watermark is embedded directly by modifying the values of the vertex coordinates. In the transform domain, the watermark is embedded not by directly modifying the coordinates of the vertices, but by modifying their transform coefficients. Space and transform domains are discussed in Sections 5.1.1 and 5.1.2, respectively.
5.1.1 Space-domain approaches
Table 4: Digital map watermarking schemes in the space domain
As shown in Table 4, the most popular approaches are the topological relations (35.5 %) and the Cartesian coordinates (40 %); 17.25 % of the papers use blocks, while the least popular approach is the use of polar coordinates or angles (7.25 %).
The topological relations embedding approaches insert the watermark into topological relations of the map (e.g. the distance between map vertices) instead of the vertices' coordinate values, which gives the advantage of preserving GIS data quality against rotation and translation attacks [48, 181]; details about these and other attacks are given in Section 5.2.2. The mean (average) distance length is the best-known example of a topological relations embedding space, e.g. [4, 48, 181].
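To illustrate why a distance-based carrier survives rotation and translation, the following minimal sketch computes the mean segment length of a polyline before and after a rigid motion. It does not embed a watermark; the function names and the sample polyline are hypothetical and are not taken from any of the cited schemes.

```python
import math


def mean_segment_length(vertices):
    """Average distance between consecutive vertices of a polyline."""
    lengths = [math.dist(a, b) for a, b in zip(vertices, vertices[1:])]
    return sum(lengths) / len(lengths)


def rotate_translate(vertices, angle, dx, dy):
    """Rotate every vertex about the origin by `angle` radians, then shift it."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y + dx, s * x + c * y + dy) for x, y in vertices]


line = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0), (6.0, 10.0)]
moved = rotate_translate(line, math.radians(30), 100.0, -50.0)

# The coordinates change completely, but the distance-based feature does not.
assert math.isclose(mean_segment_length(line), mean_segment_length(moved))
```

Any value embedded in such a feature is therefore unaffected by these two attacks, whereas a value written into the raw coordinates is not. Note that the feature still changes under scaling, which multiplies every distance by the scale factor.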
The Cartesian coordinates embedding approaches directly use the vertices' coordinate values for inserting the watermark. Most of these approaches utilize a specified digit place after the decimal point in the vertex coordinate value for adding the watermark bits; this is also defined as additive watermarking and is related to the Least Significant Bit embedding strategy (see Section 5.1.3).
The blocks-based embedding approaches divide the vector map into a number of parts (blocks), which helps in achieving more robustness against noise and simplification attacks. These approaches can maintain the fidelity of the watermarked vector map to some extent, and can localize the watermark bits within a given block.
The polar coordinates embedding approaches use another form of the vertices' coordinate values for directly embedding the watermark. These approaches, like the Cartesian coordinates-based approaches, achieve good robustness to attacks such as translation, rotation and equal scaling [102, 174].
The advantages of space-domain schemes are: (a) simplicity; (b) low computational complexity; (c) potential for high capacity of the watermark (i.e. the size of the watermark). The main disadvantage of space-domain schemes is their vulnerability to certain attacks, i.e. low robustness.
5.1.2 Transform-domain approaches
Table 5: Digital map watermarking schemes in the transform domain
The wavelet transform (WT) decomposes the digital vector map into different bands and levels. The wavelet-based method is robust against noise, rotation and scaling.
The Fourier transform (FT) offers the possibility of controlling the frequencies of the host vector map, which helps in selecting adequate positions for embedding the watermark bits into the vector map so as to reach the best compromise between invisibility and robustness. The main advantage of FT is its invariance to some geometric attacks such as translation, scaling and rotation [56, 91].
The cosine transform (CT) separates the vector map into parts of different frequency with respect to the visual quality of the vector map. The basic characteristic of CT is the high concentration of energy in the low-frequency coefficients, at a relatively low computational cost [98, 191].
As shown in Table 5, the WT approach is the most popular approach used in 41 % of the papers. CT is the second most popular at 36 %, while FT is the least popular with 23 % of papers reporting the use of this approach.
Transform-domain approaches are robust against geometric attacks such as rotation, translation and scaling; however, they have the disadvantages of being hard to implement and of having high computational complexity.
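The invariance of the Fourier transform to geometric attacks can be checked numerically. The sketch below is an illustration under stated assumptions rather than any published scheme: it treats each vertex (x, y) as the complex number x + iy, applies a naive discrete Fourier transform, and compares coefficient magnitudes before and after a rotation plus translation. All function names are hypothetical.

```python
import cmath
import math


def dft(z):
    """Naive discrete Fourier transform of a complex sequence."""
    n = len(z)
    return [
        sum(z[k] * cmath.exp(-2j * cmath.pi * m * k / n) for k in range(n))
        for m in range(n)
    ]


def magnitudes(vertices):
    """Magnitudes of the DFT coefficients of the vertex sequence x + iy."""
    return [abs(c) for c in dft([complex(x, y) for x, y in vertices])]


def rigid_motion(vertices, angle, dx, dy):
    """Rotate about the origin by `angle` radians, then translate."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y + dx, s * x + c * y + dy) for x, y in vertices]


shape = [(0.0, 0.0), (2.0, 0.0), (3.0, 1.5), (2.0, 3.0), (0.0, 2.0)]
attacked = rigid_motion(shape, math.radians(40), 7.0, -3.0)

before = magnitudes(shape)
after = magnitudes(attacked)

# Translation only alters coefficient 0; rotation only alters the phases.
unchanged = all(
    math.isclose(a, b, abs_tol=1e-9) for a, b in zip(before[1:], after[1:])
)
```

Uniform scaling multiplies every magnitude by the same factor, so ratios between magnitudes also survive it. A watermark carried by the magnitudes of the non-zero-frequency coefficients would therefore be unaffected by these attacks, at the cost of the extra transform and inverse transform steps.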
5.1.3 Embedding strategies
The significant bits embedding strategy refers to the process of selecting appropriate digits within the vertex coordinate value for inserting the watermark bit. This approach represents 43 % of the published papers, and can be used in two different ways: least significant bits (LSB) (30 %) or most significant bits (MSB) (13 %).
LSB deals with the digits after the decimal point, and can be a useful hiding strategy in terms of simplicity, invisibility, low computational time and the large number of watermark bits it allows. LSB, however, is vulnerable to geometric distortion. LSB is mostly used in space-domain schemes, with the exception of one scheme that used an LSB strategy in the wavelet transform domain.
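As a rough illustration of LSB-style embedding in a decimal digit place, the sketch below forces the parity of a coordinate, rounded to a chosen number of decimal places, to equal the watermark bit. This is a simplified variant written for this example, not the method of any cited paper; `embed_bit`, `extract_bit` and the `place` parameter are hypothetical names, and digits beyond the chosen place are discarded.

```python
def embed_bit(value, bit, place):
    """Hide `bit` in the parity of `value` rounded to `place` decimal digits.

    The coordinate changes by at most 1.5 units of the chosen digit place.
    """
    scaled = round(value * 10 ** place)
    if scaled % 2 != bit:
        scaled += 1
    return scaled / 10 ** place


def extract_bit(value, place):
    """Blind extraction: only the watermarked value and `place` are needed."""
    return round(value * 10 ** place) % 2


coords = [12.345678, 98.765432, -3.141592, 0.000049]
bits = [1, 0, 0, 1]
marked = [embed_bit(c, b, 5) for c, b in zip(coords, bits)]
recovered = [extract_bit(m, 5) for m in marked]
```

The example makes the trade-off visible: the modification is tiny and cheap to compute, but any operation that changes the low-order digits, such as rotation or added noise, destroys the embedded bits.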
Some existing schemes used the MSB strategy that deals with the digits before the decimal point to control the modification of vertices’ coordinate according to the precision tolerance. More precisely, this approach should meet two conditions: small modifications of the coordinates should not change the shape, and two adjacent shapes should not share the same identifier.
Difference expansion is a method for inserting the watermark into any kind of high correlation data . Digital vector maps consist of a sequence of the coordinates of the vertices. Due to the density of the vertices, the positions of two adjacent vertices are usually very close and the differences between their coordinates are very small. Consequently, the sequence of vertices’ coordinates can also be considered high correlation data . Since higher correlation means lower distortions and higher capacity, the difference between two adjacent vertices is used as embedding space .
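The idea can be sketched with the classic integer difference expansion transform on a pair of values, as commonly attributed to Tian in the image watermarking literature; the reviewed map schemes may differ in detail. The sketch assumes that coordinates have already been scaled to integers (for example, millimetres), and it omits the check that the expanded difference stays within the map's precision tolerance.

```python
def de_embed(x, y, bit):
    """Embed one bit into the integer pair (x, y) by expanding their difference."""
    average = (x + y) // 2
    difference = x - y
    expanded = 2 * difference + bit
    return average + (expanded + 1) // 2, average - expanded // 2


def de_extract(x_marked, y_marked):
    """Recover the bit and restore the original pair exactly."""
    average = (x_marked + y_marked) // 2
    expanded = x_marked - y_marked
    bit = expanded % 2
    difference = expanded // 2
    return average + (difference + 1) // 2, average - difference // 2, bit


# Two adjacent vertices have close x coordinates, so the difference is small.
marked = de_embed(100, 98, 1)
restored = de_extract(*marked)
```

Because the difference is roughly doubled, the distortion grows with the distance between the two values, which is why highly correlated (closely spaced) vertices make good carriers. The exact restoration of the original pair is what makes the strategy suitable for reversible watermarking.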
The quantization modulation strategy is a nonlinear method used to hide the watermark and scale some map objects to derive the watermarked data . This embedding strategy offers a good performance in balancing the trade-off between watermark fidelity, robustness and capacity . An example of using the quantization modulation method is the watermark embedding according to odd-even index of map coordinates or topological relations [48, 119, 167].
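A minimal sketch of odd-even quantization is given below, assuming a scalar carrier (a coordinate or a distance) and a hypothetical `step` parameter that sets the quantization cell size. The carrier is moved to the centre of a cell whose index parity equals the watermark bit; this is an illustration of the general principle, not the procedure of any cited scheme.

```python
import math


def qim_embed(value, bit, step):
    """Move `value` to the centre of a quantization cell with parity `bit`."""
    index = math.floor(value / step)
    if index % 2 != bit:
        index += 1
    return (index + 0.5) * step


def qim_extract(value, step):
    """Blind extraction: the parity of the cell that contains `value`."""
    return math.floor(value / step) % 2


values = [12.34, -7.89, 0.02]
bits = [0, 1, 1]
step = 0.1
marked = [qim_embed(v, b, step) for v, b in zip(values, bits)]

# Noise smaller than half a step leaves the cell, and thus the bit, unchanged.
noisy = [m + 0.04 for m in marked]
recovered = [qim_extract(n, step) for n in noisy]
```

The step size controls the trade-off mentioned above: a larger step tolerates more noise (robustness) but moves the carrier further from its original value (fidelity), by up to 1.5 steps in this sketch.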
5.2 Watermarking evaluation module
The evaluation module assesses the quality of the watermarking approach by measuring several aspects: (a) the quality of the map after the insertion of the watermark (fidelity); (b) the resistance of the watermarked map to attacks (robustness); (c) the coverage of the watermark (capacity); (d) the computational complexity of the approach (complexity) and (e) the security of the watermark locations within the map (security). These aspects are discussed in the following subsections.
5.2.1 Fidelity
Fidelity is defined as the relative similarity between the non-watermarked host object and the one after the watermarking operation  and refers to the perceptual similarity between the watermarked data and its original data . The fidelity issue is a crucial problem in the digital maps watermarking research, as the watermarked maps need to preserve their quality.
Table 7: List of published articles according to the fidelity metrics
The root mean square error (RMSE) metric is used in 48 % of the published research, while the peak signal-to-noise ratio (PSNR) metric is used in 20 %. The bit error rate (BER) metric is used in 12 % of the research approaches, and the normalized correlation (NC) metric in 8 %. The least popular metrics in the published literature are CR (7 %), LR (3 %) and HV shift (2 %).
Most of these metrics are borrowed from image watermarking and are based on theories of signal processing. These are not necessarily the most appropriate metrics for measuring the quality of the watermarked map, as discussed in Section 6.
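For reference, RMSE as a fidelity measure can be sketched as follows. The exact definition varies between papers; this version assumes that the original and watermarked maps contain the same vertices in the same order and uses the Euclidean displacement of each vertex. The function name is hypothetical.

```python
import math


def rmse(original, watermarked):
    """Root mean square of the vertex displacements between two maps."""
    if not original or len(original) != len(watermarked):
        raise ValueError("both maps need the same, non-zero number of vertices")
    squared = [
        (x1 - x2) ** 2 + (y1 - y2) ** 2
        for (x1, y1), (x2, y2) in zip(original, watermarked)
    ]
    return math.sqrt(sum(squared) / len(squared))


original = [(0.0, 0.0), (1.0, 1.0)]
watermarked = [(3.0, 4.0), (1.0, 1.0)]
error = rmse(original, watermarked)
```

The value summarizes how far vertices moved on average, but says nothing about whether the movement changed the shape or topology of the map.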
5.2.2 Robustness
The digital watermark is robust if it withstands a designated manipulation on the vector map data [3, 177, 181]. Fragile watermarking allows the detection of any tampering with the vector map data [158, 159]; however, any small change in the watermark would make it undetectable. This approach has a wide range of applications such as authentication and integrity protection of the vector maps [204, 205]. Semi-fragile schemes allow the detection of malicious tampering with the vector map data [36, 118, 191]; in these schemes, the watermark is still detectable after non-malicious transformations, however, it is not detectable after malicious attacks.
A successful attack is one that removes the embedded watermark while preserving the validity of the vector map data. In the literature, the attacks are classified into two categories: (a) geometric attacks [30, 170, 181], and (b) signal operation attacks [158, 169].
List of published articles according to the robustness to a set of geometric attacks
List of published articles according to the robustness to a set of operational attacks
Table 11: List of published articles according to the robustness metrics
Many researchers use the same metrics for measuring both robustness and fidelity, as can be seen from the overlap between Tables 7 and 11: all metrics from Table 11 are also in Table 7, and several papers appear in both tables, indicating that the same metric is used for the two different purposes.
Among the robustness metrics, the NC metric is used in 47 % of the published research, the BER metric in 27 %, and the PSNR metric in 13 %. The least popular metrics in the published literature are CR (8 %) and RMSE (5 %).
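The two most frequently used robustness metrics compare the extracted watermark with the original one. The sketch below shows the bit error rate and one common variant of normalized correlation, in which bits are mapped to -1 and +1 before correlating; other papers use different normalizations, so the formulas here are assumptions for illustration.

```python
def bit_error_rate(original_bits, extracted_bits):
    """Fraction of watermark bits that differ after an attack."""
    if not original_bits or len(original_bits) != len(extracted_bits):
        raise ValueError("bit sequences need the same, non-zero length")
    errors = sum(a != b for a, b in zip(original_bits, extracted_bits))
    return errors / len(original_bits)


def normalized_correlation(original_bits, extracted_bits):
    """Correlation of the two sequences after mapping bits 0/1 to -1/+1."""
    if not original_bits or len(original_bits) != len(extracted_bits):
        raise ValueError("bit sequences need the same, non-zero length")
    products = [
        (2 * a - 1) * (2 * b - 1) for a, b in zip(original_bits, extracted_bits)
    ]
    return sum(products) / len(products)


embedded = [1, 0, 1, 1]
extracted = [1, 0, 0, 1]  # one bit flipped by a hypothetical attack
ber = bit_error_rate(embedded, extracted)
nc = normalized_correlation(embedded, extracted)
```

With this mapping the two metrics carry the same information (NC = 1 - 2 BER), which is consistent with their being used interchangeably in the literature.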
5.2.3 Capacity, complexity and security
The watermark capacity refers to the amount of embedded bits within the digital vector map [3, 17], or the total number of vertices that carry the watermark bits [4, 53, 98]. Computational complexity refers to a specific formula for measuring the embedding algorithm complexity . In other words, it stands for measuring the required time for implementing the watermark embedding approach [4, 25]. The security of a watermarking technique is defined as the level of unpredictability in identifying the watermark bits positions that are used to perform the watermark embedding process. A highly secure watermarking process would produce an output that does not contain any specific signatures that can be used to identify the watermark bits positions . The secure watermarking approach should have a secret key for the embedded bits locations in the vector map vertices, to make it more difficult for an attacker to trace the distribution of the embedded watermark bits .
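The role of the secret key in hiding the embedding locations can be sketched as a keyed selection of vertex indices. This is an illustrative assumption rather than a published scheme: the key is hashed into a seed for Python's `random.Random`, which is deterministic but not cryptographically secure, so a real system would need a keyed cryptographic generator instead. The function name and parameters are hypothetical.

```python
import hashlib
import random


def select_positions(secret_key, vertex_count, bit_count):
    """Pick `bit_count` distinct vertex indices determined by the secret key."""
    digest = hashlib.sha256(secret_key.encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest, "big"))
    return rng.sample(range(vertex_count), bit_count)


owner_positions = select_positions("owner-key", 1000, 16)
same_again = select_positions("owner-key", 1000, 16)
other_positions = select_positions("attacker-guess", 1000, 16)
```

The owner can regenerate the same positions at extraction time from the key alone, while someone without the key sees no fixed pattern (such as every tenth vertex) that would reveal where the bits are stored.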
List of published papers according to the evaluation metrics
5.3 Watermark extraction module
List of published papers according to the classification of extraction methods
Blind/public approaches mean that the original map is not needed in the watermark extraction process, and this category represents 86 % of published work. Semi-blind approaches refer to those approaches that do not use the original map, but use the original watermark in the watermark extraction process, and represent 3.5 % of published work. Non-blind/private approaches mean that the original host data is needed in the watermark extraction process, and represent 10.5 % of published work.
6 Overview and directions for future work
In this paper we reviewed the state-of-the-art of GIS vector maps copyright protection, with a focus on watermarking as the most popular approach to mark the copyright of GIS vector maps. The relevant work in this area has been organised according to the three modules of watermarking systems: embedding, evaluation and extraction. In the following, for each of these a brief overview is given and directions for future work are outlined.
6.1 The embedding module
The embedding module involves hiding the watermark inside the original map. The embedding can be done through a space-domain or a transform-domain scheme. The advantages of space-domain schemes are: (a) simplicity; (b) low computational complexity; (c) potential for high capacity of the watermark (i.e. the size of the watermark). The main disadvantage of space-domain schemes is their vulnerability to certain attacks, i.e. low robustness. While transform-domain schemes are robust against geometric attacks such as rotation, translation and scaling, they have the disadvantages of being hard to implement and of having high computational complexity. In the transform domain, capacity is harder to control than in the space domain, making it difficult to experiment with different levels of capacity and observe their influence on other aspects such as fidelity and robustness.
Several aspects related to the embedding module need to be addressed by the research community: (a) which attacks are relevant for vector data to satisfy the robustness of the watermarked map? (b) the trade-off between capacity and fidelity, and their implications when choosing embedding locations. These are tightly related to the evaluation module and are discussed below because they have an influence on the choice of embedding locations.
Several types of attacks can distort the watermarked map by distorting either the watermark (which would prevent the establishment of the rightful owner) or the map (which would prevent it from being useful). There are two broad categories of attacks: geometric and signal operations attacks. The transformations done through geometric attacks (e.g. rotation, translation) can be easily reversed on vector data with minimal data loss, as has already been pointed out. Consequently, the focus should be on the signal operations attacks (e.g. simplification, noise addition, interpolation). To allow a fair comparison between different approaches proposed for watermarking GIS vector data, a common framework for reporting the robustness to these types of attacks should be developed.
Capacity and fidelity are two important metrics in the evaluation of the watermarking approach. Capacity is about the coverage of the watermark and is thus related to robustness, while fidelity is about the quality of the map after inserting the watermark. These two metrics need to be balanced: the higher the capacity, the more noise is introduced in the map, which means lower fidelity. Low fidelity means that the watermarked map is not usable because some of the properties of the map are lost, especially with regard to the precision of points/vertices. The precision of vector data is one of the aspects that make vector maps most valuable, especially for applications where precision is key, such as military operations. Consequently, the balance between these two metrics is very important and should be reported in watermarking research on vector map data.
6.2 The evaluation module
The evaluation module is the most challenging module in GIS map watermarking research due to the lack of appropriate metrics to define the quality of the watermarking approach. This module involves the use of metrics for judging the quality aspects of the given approach.
The current research on GIS map watermarking suffers from the lack of appropriate metrics, and it is mainly focused on the error analysis quality aspect that has been borrowed from the research on image watermarking. Huang et al.  outline the need for considering some topological aspects in addition to the error analysis. However, the problem of measuring the quality of the GIS map watermarking approach has not been addressed yet, except for the introduction of a metric that checks for unwanted intersections between lines introduced by the watermark, which is referred to as the intersection test [63, 64].
The topological aspects are important for vector map data because the insertion of the watermark or some attacks may introduce changes in the shapes of the map (distortions), which may violate the constraints of the vector format, for example by introducing overlaps or gaps between polygons. An error metric can only measure the difference between the original map and the watermarked map in terms of the “noise” introduced by the watermark, without an indication of the presence of distortions. In fact, a watermarking approach with a higher error but little distortion is more useful than a watermarking approach with low error and large distortion. Consequently, a metric that indicates the level of distortion is needed.
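One simple distortion indicator, in the spirit of the intersection test mentioned above but not a reproduction of it, counts the places where non-adjacent segments of a polyline cross each other. The sketch below detects proper crossings only (touching endpoints and collinear overlaps are ignored), and its function names are hypothetical.

```python
def orientation(p, q, r):
    """Sign of the turn p -> q -> r: 1 counter-clockwise, -1 clockwise, 0 collinear."""
    cross = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (cross > 0) - (cross < 0)


def segments_cross(a, b, c, d):
    """True if segment ab and segment cd cross at a single interior point."""
    return (
        orientation(a, b, c) * orientation(a, b, d) < 0
        and orientation(c, d, a) * orientation(c, d, b) < 0
    )


def count_crossings(polyline):
    """Number of crossings between non-adjacent segments of a polyline."""
    segments = list(zip(polyline, polyline[1:]))
    crossings = 0
    for i in range(len(segments)):
        # Skip the next segment: it shares a vertex with segment i.
        for j in range(i + 2, len(segments)):
            if segments_cross(*segments[i], *segments[j]):
                crossings += 1
    return crossings


clean = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
distorted = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (1.0, -1.0)]
```

Here only one vertex was moved, so an error metric over all vertices may stay moderate, yet the distorted line now crosses itself. Counting such crossings before and after embedding gives a distortion signal that error metrics alone cannot provide.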
As pointed out previously, some of the metrics used to measure fidelity are also used to measure the robustness of the watermark to attacks. As the watermark does not have the topological properties of the vector data, the metrics borrowed from image watermarking are suitable for this purpose.
Furthermore, as pointed out previously when discussing the embedding aspect, there is a need to define a specific set of attacks in order to address the robustness issue. Two examples of operations that could be used for this purpose are: (a) merging two adjacent polygons into one polygon and (b) cutting a polygon by a neighbouring polygon.
Another issue tightly related to the evaluation module is that there is no benchmark data that could help the researchers to reliably compare different approaches without the need to re-implement them for this purpose.
This issue can be addressed within the GIS map watermarking research community by using free map data that is available on the Internet. For instance, some map data are freely available from the Map Library and DIVA-GIS websites.
6.3 The extraction module
In relation to the extraction module, most of the current approaches seek blind extraction, especially for the application of copyright protection, where the original map is not available at the detector side. This is in line with the practical applicability of these approaches, since blind extraction removes the need to obtain the original map.
6.4 Milestones for future work
Although there are significant issues to be addressed in the field of vector map watermarking, we can learn from other communities by focusing on the most important aspects that would advance the field. For example, the audio watermarking research community started with benchmark datasets and developed appropriate metrics that enable comparison between different approaches [23, 155].
use of freely available datasets; this would enable replication of research and reliable comparisons between different approaches;
development of appropriate metrics for judging map quality, as well as reporting results on both capacity and fidelity;
definition of relevant attacks and reporting robustness metrics for each of the relevant attacks.
These would enable comparisons between different approaches, which, in turn, would allow the emergence of promising techniques that can then be further refined. In this way, the research community would be able to judge the potential of different approaches and build on each other’s work, in a unified effort to advance the field.
7 Conclusions
This paper is a review of the state-of-the-art with respect to GIS vector map watermarking; it covers the most relevant work in this area from 2000 to 2014. The distinct features of GIS map data compared with other general multimedia data were highlighted and discussed in terms of their implication for watermarking approaches.
The published papers were classified according to the three main components of watermarking systems, i.e. embedding, evaluation and extraction. Within the embedding module, the different approaches were classified according to embedding domains (space and transform) and embedding strategies. Within the evaluation module, the papers were classified according to the different metrics used for measuring fidelity and robustness. In addition, papers were classified according to whether they looked into other important aspects such as capacity, security and complexity. With respect to the extraction module, the papers were classified in accordance with the use of one of three possible approaches: blind, semi-blind and non-blind watermarking.
Finally, the paper discussed several directions for further advancing this field of research and enabling a more robust evaluation of watermarking approaches by: (a) the use of freely available benchmark datasets; (b) defining appropriate metrics for map quality and reporting metrics of both capacity and fidelity; and (c) defining a set of relevant attacks to be used consistently when reporting robustness metrics.