GeoInformatica, Volume 15, Issue 4, pp 769–803

Editing and versioning for high performance network models in a multiuser environment

Authors

  • Petko Bakalov
    • Environmental Systems Research Institute
  • Erik Hoel
    • Environmental Systems Research Institute
  • Wee-Liang Heng
    • Environmental Systems Research Institute
  • Sudhakar Menon
    • Environmental Systems Research Institute
    • University of California

DOI: 10.1007/s10707-011-0126-7

Cite this article as:
Bakalov, P., Hoel, E., Heng, W. et al. Geoinformatica (2011) 15: 769. doi:10.1007/s10707-011-0126-7

Abstract

Network data models are frequently used as a mechanism to describe the connectivity between spatial features in GIS applications. Real-life network models are dynamic in nature since spatial features can be periodically modified to reflect changes in the real world objects that they model. Such updates may change the connectivity relations with the other features in the model. In order to perform analysis the connectivity must be reestablished. Existing editing frameworks are not suitable for a dynamic environment, since they require network connectivity to be reconstructed from scratch. Another requirement for GIS network models is to provide support for a multiuser environment, where users are simultaneously creating and updating large amounts of geographic information. The system must support edit sessions that typically span a number of days or weeks, the facility to undo or redo changes made to the data, and the ability to develop models and alternative application designs without affecting the published database. The row-locking mechanisms adopted by many DBMSs are prohibitively restrictive for many common workflows. To deal with long-lasting transactions, a solution based on versioning is thus preferable. In this paper we provide a unified solution to the problems of dynamic editing and versioning of network models. We first propose an efficient algorithm that incrementally maintains connectivity within a dynamic network. Our solution is based on the notion of dirty areas and dirty objects (i.e., regions or elements containing edits that have not been reflected in the network connectivity index). The dirty areas and objects are identified and marked during the editing of the network feature data; they are subsequently cleaned and connectivity is rebuilt. Furthermore, to improve performance, we propose a ‘hyperedge’ extension to the basic network model.
A hyperedge drastically decreases the number of edge elements accessed during solve time on large networks; this in turn leads to faster solve operations. We show how our connectivity maintenance algorithms can support the hyperedge enhanced model. We then propose a new network model versioning scheme that utilizes the dirty areas/objects of the connectivity rebuild algorithm. Our scheme uses flexible reconciling rules that allow the definition of a resolving mechanism between conflicting edits according to user needs. Moreover, the utilization of dirty areas/objects minimizes the overhead of tracking the editing history. The unified editing and versioning solution has been implemented and tested within ESRI’s ArcGIS system.

Keywords

Versioning, Network models, Transportation networks

1 Introduction

Network data models have a long history as an efficient way to describe the topological connectivity information among spatial features (objects with geographic representation) in geographic information systems [11, 14, 17, 18, 25]. At an abstract level, the network model can be viewed as a graph whose elements explicitly represent the connectivity information about the features in the database. The presence of an edge in the graph depicts the information that the two features represented by the junctions are connected and vice versa. Variations of network models have been implemented in existing operational systems such as ARC/INFO [21] and TransCAD [5]. Because of the large volume of data frequently found in these networks, the model is typically persisted inside a centralized database server. Using the connectivity information stored on the server side, those systems can then be utilized to solve a wide range of problems, typical for the transportation or utility network domains (e.g., finding the shortest path between points of interest, finding optimal resource allocation, determining the maximal flow of a resource, and other graph theoretic operations).

A common weakness in previously proposed network model designs is that they do not consider dynamic modifications in the network. Such modifications occur often in many real life scenarios where spatial features can be frequently modified (e.g., features are updated, deleted or inserted). Even a single feature update can change a significant number of connectivity relations among other features, and connectivity then needs to be reestablished. To address this problem, all existing approaches reestablish the connectivity of the whole network from scratch. While the correct result is eventually produced, this process is very time consuming (networks are typically very large, e.g., 50 million linear features in a continent-wide transportation network) and can be prohibitive for many applications (for example, networks supporting on-line navigation queries or location-based services). A new and effective mechanism to maintain the correctness of the network model in a dynamic environment is thus needed. We term this the dynamic editing problem.

Another requirement for an efficient GIS network model is to efficiently support a concurrent multiuser environment that creates and updates large amounts of geographic data. In scenarios where these users are required to edit the same data at the same time, the system must provide an editing environment that supports concurrent multiuser modifications without creating multiple instances of the data. In contrast to traditional DBMSs, this editing environment must also support edit sessions that typically span a number of days or weeks (e.g., large engineering projects requiring significant interactive editing and revision), the facility to undo or redo changes made to the data, and the ability to develop models and alternative application designs without affecting the published database.

Concurrency control in traditional database management systems is addressed through the use of transactions and the two-phase locking protocol. This is efficient for short-lived edit operations that are typically completed in a few seconds. It is not well suited, however, to the type of editing tasks required when updating geographic data. For a GIS multiuser environment, the standard DBMS row-locking mechanisms would be prohibitively restrictive for many common workflows. Instead, to deal with long-lasting transactions, a solution based on the use of multiple versions has been proposed [16, 26]. A version can be logically viewed as an alternative, independent, persistent view of the database that does not involve creating a copy of the actual data. Since there is an independent view for every user, a versioned database can support multiple concurrent editors. In addition, with versioning we can support many other GIS scenarios such as:
  • Modeling “what if” scenarios. The versioning mechanism allows end users to explore different alternatives (versions) during a design phase.

  • Workflow management. Typically the design process goes through multiple steps organized in a workflow process where the output of one step is an input for another. The versioning scheme allows users to save intermediate results during the design process.

  • Historical queries. The versioning scheme allows the preservation of different states of the data which later can be re-visited and re-examined if necessary.

However, existing database versioning approaches cannot easily manage the specifics of the geographical data such as topological network relations, the presence of connectivity among the stored elements, and traversability. We would thus like to invent an efficient versioning scheme for a network model.

In this paper we propose a unified solution to both the dynamic editing and versioning problems. We start by introducing an incremental algorithm for maintaining the correctness of the connectivity information in the presence of modifications. In this algorithm, the users are allowed to rebuild the portions of the network model affected by the dynamic modifications, using the notions of dirty areas and dirty objects (a similar mechanism has also been applied to our topological data model [13]). A dirty area is a region inside the network's spatial extent where the network features have been modified but the correctness of their connectivity information has not been verified. The network data model is assumed correct only when it is free of dirty areas. A dirty area is incrementally reduced by a process called rebuilding. The rebuilding may happen over the entire dirty area in the network, or it may affect only portions of it. The end user specifies which portions of the dirty area should be cleaned, and the rebuild process analyzes and re-establishes the connectivity information there. Allowing users to rebuild only portions of the dirty area is a practical requirement in scenarios involving very big seamless networks. The user may clean only those portions of the network extent that are of interest to a given application or query, thus avoiding the costly total rebuild.

Effectively dirty areas can be viewed as a mechanism to support transaction functionality over complex network data (graphs etc.). The database starts with a consistent state (i.e. without any dirty areas), then the updates are applied, the dirty areas are cleaned and eventually the database returns to a consistent state when all dirty areas are cleaned (the “end” of the transaction). The connectivity rebuilding algorithm has been implemented in ArcGIS version 10 and provides an effective solution to maintain dynamic network models in an incremental manner.

Furthermore, we introduce an additional feature in the network model that greatly improves performance, namely the notion of hyperedges. Consider a solve operation that computes a nationwide coast to coast shortest route. The running time of the operation is drastically affected by the large number of edges (streets, roads, etc.) the shortest path algorithm has to examine. One way to reduce this is by exploiting the natural hierarchical nature of the transportation system (freeways, highways, etc.). With hyperedges users can merge together multiple compatible features and greatly improve performance. Nevertheless, the incremental rebuild algorithm needs to be modified so that it can also support hyperedges efficiently.

We then propose a novel versioning scheme for network models that utilizes the concept of dirty areas/objects and the connectivity rebuild algorithm to allow multiple simultaneous edits. Versioning of network models is different from version control over simple spatial data (“simple” meaning data that is geometrically unrelated to other data—i.e., no topological structuring). While the same basic principles are still in operation, resolving conflicts between features that are related to other features, as with network models, is different. This is because of the specific internal behavior of the network and the requirement that the connectivity information (also known as connectivity index) in the model should be kept consistent all the time.

Our dynamic editing approach was first presented in [1] and our work on versioning in [34]. This paper combines them in a common framework that effectively provides concurrency control in a multiuser GIS environment. The unified solution has been implemented and successfully tested in ESRI’s ArcGIS. Moreover, the inclusion of hyperedges and their effects on the editing system are novel.

The rest of the paper is organized as follows: Section 2 provides a brief description of the network model including its logical structure and physical design; the notion of hyperedges is also introduced. Section 3 presents the algorithm used for connectivity establishment and an in-depth description of our rebuilding algorithm, including the effect of hyperedge support. Section 4 discusses versioning of spatial databases in general. Section 5 addresses our proposed extensions of these general techniques to the support of versioned network models. Section 6 discusses our implementation experiences, and Section 7 concludes the paper.

2 The network model

A network model is described as a graph (named connectivity graph) that maintains the connectivity information about spatial features with line or point geometry. The basic elements of a network model are edges, junctions and turns. Features with point geometry are represented with junction elements inside the graph, while lines are represented as one or more edge elements between pairs of junction elements. Figure 1 depicts a typical network model [14] as employed in our system. It is composed of spatial features and network elements. Similar designs have been used in many research or commercial implementations [7, 12, 15, 22, 24, 32]. Network elements are used only to describe the connectivity information for the spatial features they are representing; they do not carry any geometrical properties.
https://static-content.springer.com/image/art%3A10.1007%2Fs10707-011-0126-7/MediaObjects/10707_2011_126_Fig1_HTML.gif
Fig. 1 Network model: features and network elements (which represent connectivity)

Most of the systems that utilize network models have client-server architectures. Because of their very large data size (e.g., many tens of millions of features for some nationwide or continent-wide transportation networks), the network models are usually located on a centralized server, persisted either in RDBMS tables or in a file system. Typically the analysis is done within a GIS server (that acts as a client to the database) or within a thick client [4, 23, 27].

2.1 Traversability

While the connectivity elements (edges and junctions) allow the user to express connections, they are not sufficient for expressing specific restrictions from the real world (for example, no left turn, or, no u-turn allowed at an intersection) [3, 29, 31]. Turn restrictions are used for this purpose. Turn restrictions present a problem to most network models. The presence of turns can greatly impact the movement (or traversability) through a network [20]. A common way to model turns within a network is with a turn table [30]. A turn table represents each explicitly specified turn restriction (or penalty) as a row with references to the associated two edges. Turn tables may be augmented with an impedance attribute if the turns may also represent delays or impedances. When traversing the network, the turn table is queried as necessary. An alternative approach is to employ a transition matrix that represents possible transitions at an intersection [10].

A maneuver is a turn that spans three or more edges. Maneuvers are used to model turning movements at complex street intersections within transportation networks. Consider the following intersection formed by a dual carriageway (i.e., a street where each travel direction is represented as a separate line feature) and a two-way street in Fig. 2.
https://static-content.springer.com/image/art%3A10.1007%2Fs10707-011-0126-7/MediaObjects/10707_2011_126_Fig2_HTML.gif
Fig. 2 Example of a three part maneuver e1-e2-e3 at an intersection with a dual carriageway

To restrict the u-turn from edge e1 to edge e3, we need a maneuver composed of the edges e1, e2 and e3 in sequence. The maneuver cannot be synthesized from the two overlapping turns e1-e2 and e2-e3, since restricting the e1-e2 turn also incorrectly restricts the left turn specified by the sequence e1-e2-e4.
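The difference between a maneuver and a pair of overlapping turns can be illustrated with a minimal Python sketch. The edge names follow Fig. 2; the `is_allowed` helper and the representation of restrictions as tuples of edge identifiers are assumptions made for illustration and are not taken from the system described here.

```python
def is_allowed(path, restrictions):
    """Return True if no restricted edge sequence occurs contiguously in path.

    Hypothetical helper: each restriction is a tuple of edge ids.
    """
    for restricted in restrictions:
        n = len(restricted)
        for i in range(len(path) - n + 1):
            if tuple(path[i:i + n]) == tuple(restricted):
                return False
    return True


# The u-turn restricted by a single three-edge maneuver.
maneuver = [("e1", "e2", "e3")]
# An attempt to synthesize the same restriction from two overlapping turns.
overlapping_turns = [("e1", "e2"), ("e2", "e3")]

u_turn = ["e1", "e2", "e3"]
left_turn = ["e1", "e2", "e4"]

print(is_allowed(u_turn, maneuver))              # u-turn is blocked
print(is_allowed(left_turn, maneuver))           # left turn remains legal
print(is_allowed(left_turn, overlapping_turns))  # left turn is wrongly blocked
```

With the maneuver, only the full sequence e1-e2-e3 is rejected; with the two overlapping turns, any path containing e1-e2 is rejected, including the legitimate left turn.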

To introduce turn restrictions, a network model can have, in addition to the edge and junction elements, special network elements called turns (see Fig. 1). Just as edges are defined as a relation between junctions, turns are defined as a relation between edges. A turn element is anchored to a specific junction (the junction where the turn starts) and controls the movement along a sequence of edges, expressed as pairs (firstEdgeId, lastEdgeId).

2.2 Connectivity

One essential requirement for the network models is the ability to represent multimodal networks. An example of a multimodal network appears in transportation systems where various transportation modes (roads, bus lines, etc.) are linked. To satisfy this requirement, we use connectivity groups [14], where each connectivity group represents the set of features associated with a given mode of transportation. For example, in a transportation system with two modes (e.g., road network and railway system), there will be two connectivity groups of features.

When considering connectivity, there are two general connectivity policies that we support. The first policy, termed end point connectivity, is generally based upon spatial coincidence of the endpoints of line features and other point features. This leads to a 1:1 mapping between the features participating in the network and the network elements used to represent the network connectivity. This approach works reasonably well for simpler planar network datasets (e.g., TIGER/Line).

However, with nonplanar datasets (commonly available from commercial data vendors like Tele Atlas or NAVTEQ), which use long linear features to model elements of transportation systems such as highways, it is useful to allow network connectivity pathways along the line features. This leads to the mid span connectivity policy. The familiar one-to-one mapping between linear features and edge elements must then be generalized to a one-to-many mapping.

The connectivity is established locally for each group using either the endpoint or the midspan connectivity policy (see Fig. 3). To create connections between two connectivity groups we allow point features to participate in more than one connectivity group. As shown in Fig. 3, if a point feature (e.g., a railway station) participates in both connectivity groups, it will connect them in its role as a junction element in the graph.
https://static-content.springer.com/image/art%3A10.1007%2Fs10707-011-0126-7/MediaObjects/10707_2011_126_Fig3_HTML.gif
Fig. 3 Multimodal network example
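The role of a shared point feature in linking two connectivity groups can be sketched in Python as follows. This is an illustrative simplification, not the implementation described in [14]: it assumes end point connectivity only, exact coordinate matching, and hypothetical helper names (`build_graph`, `reachable`). Junctions are keyed by (group, coordinate), so coincident endpoints in different groups stay disconnected unless a point feature at that location participates in both groups.

```python
from collections import defaultdict, deque


def build_graph(lines, shared_points):
    """Build an adjacency map from (group, from_xy, to_xy) line records.

    shared_points: coordinates of point features that participate in
    every connectivity group (an assumption of this sketch).
    """
    def junction(group, xy):
        # A shared point yields one junction common to all groups.
        return ("shared", xy) if xy in shared_points else (group, xy)

    adjacency = defaultdict(set)
    for group, a, b in lines:
        ja, jb = junction(group, a), junction(group, b)
        adjacency[ja].add(jb)
        adjacency[jb].add(ja)
    return adjacency


def reachable(adjacency, start):
    """Breadth-first search returning all junctions reachable from start."""
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbor in adjacency[current]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


# A road and a railway line whose endpoints coincide at (1, 0).
lines = [("road", (0, 0), (1, 0)), ("rail", (1, 0), (2, 0))]
separate = build_graph(lines, shared_points=set())
linked = build_graph(lines, shared_points={(1, 0)})  # station at (1, 0)

print(("rail", (2, 0)) in reachable(separate, ("road", (0, 0))))
print(("rail", (2, 0)) in reachable(linked, ("road", (0, 0))))
```

Without the station the two modes are disjoint; once the station participates in both groups, the railway endpoint becomes reachable from the road network.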

2.3 Hyperedges

Consider the problem of finding a shortest route from Los Angeles to New York City on a nationwide USA street network. The problem is computationally prohibitive due to the size of the network (about 37 million edges for the TeleAtlas 7.2 data). One possible way to reduce the search space is by exploiting the natural hierarchical classification of the streets, e.g., interstate freeways, state highways, major roads and local streets. Even then, with an algorithm that favors traversing the interstates, the resultant route may have several thousand edges. The edge count of the route affects the time taken to find the route.

The crux of the computational inefficiency is the granularity of the interstate edge elements. Since interstates connect to ramps at relatively short intervals, there are many edge elements for each interstate. In addition, there is often a one-to-one association between features and edge elements, especially with planar data that originates from TIGER. As such, a single interstate freeway may be represented by several thousand features in the street data.

In order to solve this problem we propose a hyperedge extension to our standard network model, which relies on a re-interpretation of the mid span connectivity policy for networks with hierarchies. In particular, features model the highest level of the network that they belong to, and are appropriately merged. Lower-level network connectivity is derived from the mid-span intersection of these features.

Definition 1

A hyperedge is an edge element that spans regular edge elements and has a hierarchical rank higher than or equal to the hierarchical rank of the feature it models.

This pre-computation approach is different from the use of oracles [35] that yields an estimate of the network distance.

To better explain the hyperedge extension, consider the example in Fig. 4, of a small street network where the streets are classified into three hierarchy ranks (primary, secondary and tertiary), and which is based on endpoint connectivity. The diagram on the left shows the features in the source data, pattern-coded by their hierarchy rank. The diagram on the right depicts the derived connectivity network. Note that the two diagrams are “identical”, reflecting the one-to-one correspondence between features and network elements that result from end-point connectivity.
https://static-content.springer.com/image/art%3A10.1007%2Fs10707-011-0126-7/MediaObjects/10707_2011_126_Fig4_HTML.gif
Fig. 4 A hierarchical network

To add hyperedges into this network, we merge compatible features. We consider features to be compatible if they have the same attributes (like speed limit or number of lanes) from the transportation point of view. In our example we assume that the three horizontal primary line features are compatible, as are the three vertical secondary features. The resultant merged feature class suffices for defining a hyperedge network based on any-vertex connectivity. Each merged feature defines hyperedges at the appropriate highest level, while lower-level hyperedges are derived from parts of the merged feature. The merged features and the hyperedge network are shown on the left and right diagrams respectively in Fig. 5 below.
https://static-content.springer.com/image/art%3A10.1007%2Fs10707-011-0126-7/MediaObjects/10707_2011_126_Fig5_HTML.gif
Fig. 5 Hyperedge examples

In this example, feature l2 is represented in the hyperedge-enabled network model by three regular edges e2, e7 and e9 at the lowest level of the hierarchy and two hyperedges e10 and e11. Hyperedge e10 is placed on the second level of the hierarchy and covers the regular edge elements e7 and e9. Hyperedge e11 is placed on the highest level of the hierarchy and covers the regular edge element e1 and hyperedge e10. As this example shows, hyperedges can be nested: hyperedges on a higher hierarchical level cover hyperedges on the lower levels.

To motivate the need for hyperedges, we have tested the improvement in performance generated by their usage. For this set of experiments we use the shortest path network solver and we have performed tests using datasets of different sizes. Table 1 shows the average solver time with and without the hyperedges. As can be seen, the improvements in solver time generated by the usage of hyperedges are substantial when performing coast to coast solves on large network datasets. This is because for those datasets the hyperedges tend to be longer and as a result the savings generated by them are bigger (around 10 times). For smaller datasets the hyperedge elements are smaller, covering fewer regular edge elements, and thus they have a smaller effect on the solve time. In conclusion, this set of experiments shows that even though the number of hyperedges in the network model is relatively small (around 5%), they have a large impact on the solve time, making traversal operations on large networks about 10 times faster.
Table 1 Solver time for different network datasets

Dataset              Size (millions of features)   Avg. solve time (with hyperedges)   Avg. solve time (no hyperedges)
ArcGIS Online Data   0.05                          ≈ 0.05 s                            ≈ 0.07 s
Paris                0.5                           ≈ 0.12 s                            ≈ 0.69 s
United States        35.9                          ≈ 1 s                               ≈ 10.3 s
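The speedup factors implied by Table 1 can be checked with a few lines of Python. The timings are copied from the table; the dictionary layout and variable names are illustrative only.

```python
# Average solve times in seconds from Table 1: (with hyperedges, without hyperedges).
solve_times = {
    "ArcGIS Online Data": (0.05, 0.07),
    "Paris": (0.12, 0.69),
    "United States": (1.0, 10.3),
}

# Speedup = time without hyperedges / time with hyperedges.
speedup = {
    name: without / with_hyper
    for name, (with_hyper, without) in solve_times.items()
}

for name, factor in speedup.items():
    print(f"{name}: {factor:.2f}x")
```

The ratio grows with dataset size, from about 1.4 for the smallest dataset to about 10.3 for the United States network, which matches the "around 10 times" figure quoted for large networks.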

Hyperedges can be seen as edge elements that model the connectivity of a network where all the features with hierarchical rank smaller than the rank of the hyperedge have been deleted. A network without the hyperedge elements is the same as that for any-vertex connectivity, but with a different assignment of hierarchy ranks. In our example, hyperedge e11, which has the highest rank, models what the connectivity for feature l2 would be if all features with rank lower than the rank of e11 (features l3, l4 and l5) were removed from the model, i.e., if we modeled only the connectivity of the primary roads.

The generation of hyperedge elements and of the relationships between them and lower-level elements is accomplished by extending the junction and edge generation phases of the network rebuild algorithm. The details are discussed in Section 3.3 of this paper.

2.4 Physical implementation

In our network implementation the connectivity information is maintained as a set of adjacency pairs of the form <edgeId, junctionId>, stored inside the “junction table” (see Fig. 6). This approach is designed to answer the most common type of adjacency queries during the network analysis process. The junction table uses fixed-length records for direct access purposes; this implies a fixed number (four in our implementation) of adjacency pairs per record. If a junction has more than four connected edges, an overflow mechanism is applied.
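A minimal in-memory sketch of such a junction table is given below. The four-slot record size comes from the text; everything else is an assumption for illustration, including the class and method names, the reading of each pair as (incident edge, junction at the other end), and the overflow store, whose actual design is not described here.

```python
SLOTS_PER_RECORD = 4  # four adjacency pairs per fixed-length record, as stated in the text


class JunctionTable:
    """Hypothetical junction table with fixed-size records and an overflow area."""

    def __init__(self):
        self.records = {}   # junction id -> up to four (edge_id, junction_id) pairs
        self.overflow = {}  # junction id -> pairs that did not fit in the record

    def add_adjacency(self, junction_id, edge_id, adjacent_junction_id):
        pair = (edge_id, adjacent_junction_id)
        record = self.records.setdefault(junction_id, [])
        if len(record) < SLOTS_PER_RECORD:
            record.append(pair)
        else:
            # Assumed overflow handling: spill extra pairs to a side structure.
            self.overflow.setdefault(junction_id, []).append(pair)

    def adjacencies(self, junction_id):
        """Answer the adjacency query: all (edge, junction) pairs at a junction."""
        return self.records.get(junction_id, []) + self.overflow.get(junction_id, [])


table = JunctionTable()
for n in range(1, 6):  # a junction with five incident edges
    table.add_adjacency("j0", f"e{n}", f"j{n}")

print(len(table.records["j0"]), len(table.overflow["j0"]))
```

The fifth adjacency pair does not fit in the fixed-length record and lands in the overflow area, while `adjacencies` still returns all five pairs to the caller.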
https://static-content.springer.com/image/art%3A10.1007%2Fs10707-011-0126-7/MediaObjects/10707_2011_126_Fig6_HTML.gif
Fig. 6 Network tables example

In a similar way, the traversal process requires that at each junction we know all the turns anchored at that junction. This has influenced the way we implement the turn storage scheme. Information about turns is stored in the “turn table”, in the form of turn triplets <turnId, firstEdgeId, lastEdgeId>. If there are any turns anchored at a junction ji, the turn table will have a record with primary key ji that contains all the turns anchored at ji. This storage scheme can be easily optimized for the most commonly used client access patterns [28].
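The access pattern this storage scheme supports can be sketched as a dictionary keyed by junction identifier. The triplet layout follows the text; the sample identifiers and the helper names `turns_anchored_at` and `matching_turn` are hypothetical.

```python
# Turn table: junction id -> list of (turnId, firstEdgeId, lastEdgeId) triplets.
turn_table = {
    "j1": [("t1", "e1", "e3")],  # sample data, not from the paper
}


def turns_anchored_at(junction_id):
    """Fetch every turn anchored at a junction with a single keyed lookup."""
    return turn_table.get(junction_id, [])


def matching_turn(junction_id, first_edge_id, last_edge_id):
    """Return the id of the turn covering this movement, or None if there is none."""
    for turn_id, first, last in turns_anchored_at(junction_id):
        if (first, last) == (first_edge_id, last_edge_id):
            return turn_id
    return None


print(matching_turn("j1", "e1", "e3"))
print(matching_turn("j1", "e1", "e4"))
```

Because all turns anchored at a junction live in one record, a traversal that arrives at the junction needs only one lookup to learn which movements are affected.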

Note that the physical representation of hyperedges is similar to the one used for regular edges and they are stored in the connectivity tables in the same way. For example, the edge table will also have an (id, from-jn) entry for any hyperedge included in the model.

3 Maintaining connectivity in the network model

The dynamic maintenance of the network connectivity, discussed in the introduction, can be viewed as a two phase process [1]:
  • Initial establishment of connectivity when the network model is first defined, with the connectivity information being derived from the features participating in the network.

  • Incremental rebuilding of the connectivity index on a periodic basis after edits occur on the spatial features in the network.

Having an incremental solution is of significant practical value—the amortized cost of maintaining an incrementally rebuildable network is far less than an ordinary network that must be periodically rebuilt in its entirety (e.g., editing a subdivision and only rebuilding that portion of the nationwide network versus rebuilding the whole nationwide network).

The applications that will benefit from this incremental connectivity maintenance functionality are the ones where the features involved in the network model are modified frequently. As discussed above, many of those modifications can affect the connectivity information in the model. A hard requirement of the network model, however, is to maintain the correctness of the feature connectivity despite these frequent modifications.

The second group of applications that require incremental connectivity maintenance involves long-term editing transactions inside the network model. Long transactions usually occur during the design process of transportation networks and can cause serious inconvenience because of the table locks that they hold. The solution to this problem is to encapsulate these long transactions into versions in a versioned environment. Similar functionality can be found in the ArcGIS topology model [13]. However, in order to support versions inside the network model, it is necessary to support incremental rebuild.

In order to keep track of the modifications to the features that occur since the last full or partial rebuilding of the connectivity index, the network model employs the concept of dirty areas. Similarly, to track changes to elements without geometrical properties (e.g., turns), we use the concept of dirty objects.

Definition 2

A dirty area corresponds to the non-overlapping regions within the feature space where features participating in the network have been modified (added, deleted, or updated) but whose connectivity has not been re-established.

To simplify its computation and storage, a dirty area in our implementation is defined as a union of envelopes (e.g., bounding boxes) around the features that have been modified. It is possible however to use other shapes—the convex hull of the feature for example.

The size of the dirty area inside the network model can be reduced through a process of connectivity rebuilding. In order to ensure that the network is correct, the portion of the network encompassed in the dirty areas will need to be rebuilt. It is not necessary to build the entire space spanned by the dirty area at one time; instead, a subset of the dirty area can be built. If the dirty area is partially built, the original dirty area will be clipped by the extent of the region that is built.

In the initial state, the network model has no features, the underlying network has no elements, and the dirty area is empty. When edits are made to the features, or new features are loaded, the dirty area is modified (or a new dirty area is created in the case of a new feature) to encompass the extent of the feature envelope.
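The dirty area life cycle described above (mark on edit, clip on partial rebuild, clean when empty) can be sketched in Python. This is an illustration under simplifying assumptions: envelopes are axis-aligned rectangles kept in a plain list that may overlap (a real implementation would merge them into non-overlapping regions, as Definition 2 requires), and the class and function names are hypothetical.

```python
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def subtract(rect: Rect, cut: Rect) -> List[Rect]:
    """Return the parts of rect not covered by cut, as at most four rectangles."""
    x0, y0, x1, y1 = rect
    cx0, cy0, cx1, cy1 = cut
    # No overlap: the rectangle is unchanged.
    if cx0 >= x1 or cx1 <= x0 or cy0 >= y1 or cy1 <= y0:
        return [rect]
    pieces = []
    if cx0 > x0:  # strip to the left of the cut
        pieces.append((x0, y0, cx0, y1))
    if cx1 < x1:  # strip to the right of the cut
        pieces.append((cx1, y0, x1, y1))
    mx0, mx1 = max(x0, cx0), min(x1, cx1)  # horizontal span shared with the cut
    if cy0 > y0:  # strip below the cut
        pieces.append((mx0, y0, mx1, cy0))
    if cy1 < y1:  # strip above the cut
        pieces.append((mx0, cy1, mx1, y1))
    return pieces


class DirtyArea:
    def __init__(self):
        self.rects: List[Rect] = []  # empty in the initial state

    def mark(self, envelope: Rect) -> None:
        """Record the envelope of an added, deleted, or updated feature."""
        self.rects.append(envelope)

    def rebuild(self, region: Rect) -> None:
        """Clip the dirty area by the rebuilt region.

        Re-establishing connectivity inside the region is out of scope here.
        """
        self.rects = [p for r in self.rects for p in subtract(r, region)]

    def is_clean(self) -> bool:
        return not self.rects


area = DirtyArea()
area.mark((0, 0, 10, 10))      # an edited feature's envelope
area.rebuild((0, 0, 5, 10))    # partial rebuild of the left half
print(area.rects)
area.rebuild((0, 0, 10, 10))   # rebuild the remainder
print(area.is_clean())
```

After the partial rebuild only the right half of the envelope remains dirty; after the second rebuild the dirty area is empty and the network is considered correct again.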

The dirty area mechanism however cannot be applied directly to turn features in our model. The complexity comes from the fact that the turn features are defined as a relation between two or more line features and typically do not have geometrical properties. As depicted in Figs. 1 and 6, a record in the turn table consists of a turn identifier and a list of the line feature identifiers that participate in the turn. In order to cover network elements without geometrical properties, we extend our dirty area concept with the notion of dirty objects.

Definition 3

A dirty object is an object without geometrical properties (like turn features or traffic data records) whose modifications have not yet resulted in the incremental rebuilding of the network connectivity index.

Using the dirty areas and dirty objects, we can capture the dynamic behavior of network maintenance. It is this dynamic behavior that complicates and thus requires extra attention during the versioning process.

We will proceed with a description of the algorithm for initial establishment of connectivity followed by the incremental rebuild algorithm.

3.1 Initial establishment of connectivity

The algorithm for initial establishment of connectivity takes as input all the features (i.e., all representations of real-world objects) in the model. It assumes that the current network model is empty. The pseudo code is shown in algorithm 1.

The first step in the algorithm is to extract information about the vertices of all features participating in the network (lines 1–7). The extracted vertex coordinates and the feature identifier are stored in a temporary table called a Vertex Table.

The next step is to sort the content of the vertex table by coordinate values to group the coincident vertices together (line 8) and extract and analyze each group of coincident vertices according to the connectivity model specified by the user (line 9). Details on creating connectivity groups can be found in [14]. For each discovered connectivity group, a new junction element is created whose identifier is stored in the column jid in the vertex table for all vertices participating in that connectivity group (lines 10–13). The content of the vertex table then is resorted by feature identifier so that the vertices for each line feature are grouped together (line 15). The vertex table is scanned sequentially, and for each pair of adjacent vertices on the same line feature, a new edge is created (lines 16–21). The network model is established after all the records in the vertex table have been processed.
https://static-content.springer.com/image/art%3A10.1007%2Fs10707-011-0126-7/MediaObjects/10707_2011_126_Figf_HTML.gif
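The build steps described above can be sketched in Python as follows. This is a minimal illustration and not the authors' Algorithm 1: the function name `build_network`, the dictionary-based vertex table, and the simplifying assumption that every group of coincident vertices forms a junction (the connectivity-model analysis of [14] is not reproduced) are all assumptions made for the example.

```python
from itertools import count, groupby


def build_network(line_features):
    """line_features: dict mapping a feature id to its ordered list of (x, y) vertices."""
    # Step 1: extract every vertex, with its feature id, into a vertex table.
    vertex_table = [
        {"fid": fid, "seq": seq, "xy": xy, "jid": None}
        for fid, vertices in line_features.items()
        for seq, xy in enumerate(vertices)
    ]

    # Step 2: sort by coordinates so that coincident vertices become adjacent.
    vertex_table.sort(key=lambda row: row["xy"])

    # Step 3: create one junction per group of coincident vertices
    # (assumption: every group satisfies the connectivity model).
    junction_ids = count(1)
    junctions = {}
    for xy, rows in groupby(vertex_table, key=lambda row: row["xy"]):
        jid = next(junction_ids)
        junctions[jid] = xy
        for row in rows:
            row["jid"] = jid

    # Step 4: re-sort by feature id (and vertex order), then create one edge
    # for each pair of adjacent vertices on the same line feature.
    vertex_table.sort(key=lambda row: (row["fid"], row["seq"]))
    edges = []
    for prev, cur in zip(vertex_table, vertex_table[1:]):
        if prev["fid"] == cur["fid"]:
            edges.append((prev["fid"], prev["jid"], cur["jid"]))
    return junctions, edges


# Two hypothetical lines: l2 starts at an interior vertex of l1.
junctions, edges = build_network({
    "l1": [(0, 0), (1, 0), (2, 0)],
    "l2": [(1, 0), (1, 1)],
})
print(junctions)
print(edges)
```

In this toy input the vertex shared by the two lines yields a single junction, so the edges of both lines meet there.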

3.2 The incremental approach

A rebuild region is an area in the network where we re-establish connectivity. A gray region is the extent of the dirty area outside of the rebuild region. For simplicity, we first describe the rebuilding algorithm when there are no line features that intersect both the rebuild region and the gray region. The extension of the algorithm to handle line features that intersect both the rebuild region and the gray region is discussed later in this section.

As with the algorithm for initial connectivity establishment, the incremental rebuild constructs a vertex information table and uses it to create the junction and edge elements inside the network model. The difference, however, is that for the incremental rebuild the network model has already been established in a previous iteration using the initial build algorithm. We can view this network model as containing historical connectivity information for the point and line features intersecting the rebuild region, derived before the modifications in the feature space that created the dirty areas.

The goal of the rebuild algorithm is to replace that obsolete historical information inside the dirty areas with the current connectivity data. The pseudo code is shown in Algorithm 2. The input to the algorithm is the set of line features that intersect the rebuild region. This can be provided by a simple spatial query in the feature space.

To illustrate the algorithm, we use the example shown in Fig. 7. In this example a new line feature l6 has been added to the model. The underlying logical network, however, has not been updated yet (a fact shown by the presence of a dirty area around the newly introduced feature). In order to update the logical network and make it consistent with the feature space, we have to rebuild this dirty area. For simplicity, we assume that the rebuild region is identical to the dirty area shown in gray in Fig. 7.
https://static-content.springer.com/image/art%3A10.1007%2Fs10707-011-0126-7/MediaObjects/10707_2011_126_Fig7_HTML.gif
Fig. 7

Rebuilding network example. The line features are represented on the top half, and the connectivity network on the bottom half. Dirty areas are represented by shaded rectangles

The features that intersect with the rebuild region are line features l2, l3, l5, l6, and point feature p1. The first step of the algorithm (lines 2–8 in the incremental algorithm) computes the connectivity nodes inside the rebuild region. This is done in a way similar to the initial establishment of connectivity, but now the rebuild algorithm only looks at the line vertices that belong to the rebuild region and are stored in the vertex table (it is possible that only parts of the line feature are covered by the rebuild region). All other vertices are ignored. In our example, we have three connectivity nodes discovered by the algorithm (see Fig. 8a). The first connectivity node connects line features l2 and l6, the second node connects line features l2 and l3 and point feature p1 and the last connects l3, l5 and l6.
https://static-content.springer.com/image/art%3A10.1007%2Fs10707-011-0126-7/MediaObjects/10707_2011_126_Fig8_HTML.gif
Fig. 8

Rebuilding network example. The line features are represented on the top half, and the connectivity network on the bottom half. Dirty areas are represented by shaded rectangles

In the second step of the algorithm (lines 9–20), we analyze the network elements associated with the features intersecting the rebuild region. For each line feature in the rebuild region, the algorithm deletes the associated edge elements from the network model. The junction elements connected to the endpoints of those edge elements are analyzed to determine if they satisfy at least one condition from the set of saving conditions explained below. The saving conditions are used to determine which junction elements are saved for later reuse:
  • Junctions outside the dirty area: These junctions belong to edges that are partially covered by the rebuild region. They are saved and reused later as connection points through which the rebuilt portion of the network is stitched together with the rest of the model.

  • Junctions that have point features associated: Since every point feature has a junction associated in the network model, such junctions are saved for later reuse. It might happen, though, that we have to update the junction properties like the x and y coordinates.

If a junction is saved for later reuse, the connectivity information for it is added to the information about the connectivity nodes inside the rebuild region in the vertex table.
https://static-content.springer.com/image/art%3A10.1007%2Fs10707-011-0126-7/MediaObjects/10707_2011_126_Figg_HTML.gif

In our example, we have edges e2, e3, and e5 associated with line features l2, l3, and l5, which intersect the rebuild region. Those edge elements are removed from the network.

The junctions connected to those edges (j2, j3, j4, and j5) are analyzed to determine if they satisfy one of the saving conditions. Junctions j2 and j4 are outside the rebuild region and thus satisfy the first saving condition; they will be used later for restitching the rebuild region with the rest of the network. Junction j3 has a point feature associated with it and therefore satisfies the second saving condition. Junction j5 does not satisfy any of the saving conditions and is thus deleted.
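The two saving conditions can be expressed as a small predicate. The sketch below is illustrative only: the `Junction` record, the rectangular rebuild region, and the coordinates assigned to j2 through j5 are hypothetical values chosen to mirror the example, not data taken from the figures. It covers the basic scenario in which the rebuild region equals the dirty area.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Junction:
    jid: str
    x: float
    y: float
    has_point_feature: bool = False


def inside(region, x, y):
    """region is an axis-aligned rectangle (xmin, ymin, xmax, ymax)."""
    xmin, ymin, xmax, ymax = region
    return xmin <= x <= xmax and ymin <= y <= ymax


def should_save(junction, rebuild_region):
    # Condition 1: the junction lies outside the rebuild region.
    outside = not inside(rebuild_region, junction.x, junction.y)
    # Condition 2: the junction is associated with a point feature.
    return outside or junction.has_point_feature


def partition_junctions(junctions, rebuild_region):
    """Split junction ids into those saved for reuse and those deleted."""
    saved = [j.jid for j in junctions if should_save(j, rebuild_region)]
    deleted = [j.jid for j in junctions if not should_save(j, rebuild_region)]
    return saved, deleted


# Hypothetical coordinates that reproduce the outcome of the worked example.
region = (0, 0, 10, 10)
candidates = [
    Junction("j2", -2, 5),                         # outside the region
    Junction("j3", 5, 5, has_point_feature=True),  # inside, has point feature p1
    Junction("j4", 12, 5),                         # outside the region
    Junction("j5", 5, 2),                          # inside, no point feature
]
print(partition_junctions(candidates, region))
```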

The third step in the rebuild algorithm (lines 22–29) involves creation of the new junction elements inside the rebuild area. For each connectivity node identified in the first step, there should be a junction element in the network. Care should be taken as some of the processed point features participating in a connectivity node (more specifically those that were present in the previous iteration) may already have an associated junction element. Instead of creating new junctions, we reuse the saved junction elements. The connectivity information for the newly created junctions is added to the connectivity information for the saved junctions. In our example, connectivity nodes 1 and 3 do not include any point feature, so we have to create new junction elements j6 and j7 for them (Fig. 9). Connectivity node 2, however, includes a point feature p1 that has an associated junction element j3, and there is no need to create a new one.
https://static-content.springer.com/image/art%3A10.1007%2Fs10707-011-0126-7/MediaObjects/10707_2011_126_Fig9_HTML.gif
Fig. 9

Rebuilding network example. The line features are represented on the top half, and the connectivity network on the bottom half. Dirty areas are represented by shaded rectangles

The last step of the algorithm (lines 30–36) is the re-creation of the edge elements in the rebuild region. For this purpose, we use the information for the newly created junctions inside the rebuild region and the saved junctions from step 2 (those are the junctions used for restitching with the rest of the network) in the saved vertex table. The information in this table is sorted using the feature ID as a primary key so that junctions that belong to the same feature are grouped together. The sorted table is then scanned and for each pair of junctions that belong to the same feature, a new edge is created.

The description of the algorithm up to this point covers the basic scenario where there are no features that intersect both the rebuild region and the gray region (so-called “partial” line features). Note that a point feature cannot intersect both the rebuild and the gray regions since the regions are nonoverlapping. The problem with partial line features is that we cannot save the junction at the feature endpoint outside the rebuild region for restitching if it is in a gray region, since this is an indication that the endpoint has not been processed and a junction element may not exist for it.

The incremental rebuild algorithm is thus extended to handle the case where there are “partial” line features. For each line feature that intersects the rebuild region, information about its endpoints in the gray region is added to the set of connectivity nodes computed in the first step of the algorithm. During this process, we ignore all other feature geometries that may be present there. As a result, the connectivity node for this endpoint may be inaccurate. However, the node is in the part of the dirty area remaining after the current rebuild, and we will correct the inaccuracy with a subsequent rebuild there.

3.3 Hyperedge creation

The build and rebuild algorithms are now extended to generate hyperedge elements and the relationships between them and the lower-level edge elements. We illustrate this on the working example, where the primary, secondary and tertiary line features have hierarchy ranks 3, 2 and 1 respectively. We assume that the line features are digitized from left-to-right, and from top-to-bottom.

As discussed in the first phase of the build and rebuild algorithms, we extract the line vertices (and points) from the source feature classes into a vertex table, and sort the vertex table by location. In the second phase, connectivity analysis is performed for the group of geometries at each location, and junction elements are generated. We augment this phase by assigning a hierarchy rank to the junctions as follows: a junction is given rank r where r is the highest rank such that there is a point or line endpoint with rank r, or at least two interior line vertices with rank r. Essentially, the junction exists in the network where features with rank lower than r have been removed. This is illustrated in Fig. 10a for the locations corresponding to the vertices of line l2. The line endpoints are marked with an asterisk in the vertex column. The generated junction elements j2, j5, j9 and j10 have ranks 3, 2, 1 and 3 respectively.
https://static-content.springer.com/image/art%3A10.1007%2Fs10707-011-0126-7/MediaObjects/10707_2011_126_Fig10_HTML.gif
Fig. 10

Hyperedge creation
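The rank assignment rule can be written as a short function. The sketch below reads the rule literally, counting only geometries whose rank is exactly r; the function name `junction_rank`, the `(rank, kind)` encoding, and the sample inputs are assumptions made for illustration and are not taken from Fig. 10a.

```python
def junction_rank(members):
    """Return the hierarchy rank of a junction, or None if no rank qualifies.

    members: list of (rank, kind) pairs for the geometries at one location,
    where kind is "point", "endpoint" (line endpoint) or "interior"
    (interior line vertex).
    """
    # Try candidate ranks from highest to lowest and return the first match.
    for r in sorted({rank for rank, _ in members}, reverse=True):
        terminals = sum(
            1 for rank, kind in members
            if rank == r and kind in ("point", "endpoint")
        )
        interiors = sum(
            1 for rank, kind in members
            if rank == r and kind == "interior"
        )
        if terminals >= 1 or interiors >= 2:
            return r
    return None


# Hypothetical locations along a rank-3 line:
print(junction_rank([(3, "endpoint")]))                   # endpoint of the line itself
print(junction_rank([(3, "interior"), (2, "endpoint")]))  # a rank-2 line ends here
print(junction_rank([(3, "interior"), (1, "endpoint")]))  # a rank-1 line ends here
```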

https://static-content.springer.com/image/art%3A10.1007%2Fs10707-011-0126-7/MediaObjects/10707_2011_126_Figh_HTML.gif

In the third phase of the build algorithm, we re-sort the vertex table so that vertices for the same line feature are grouped back together. We then generate edge elements connecting adjacent vertices of the line, and if necessary, hyperedge elements spanning those edges. This is accomplished by keeping a stack of generated edges, and also a stack of related junction elements in rank order; the junctions in the junction stack bracket the edges by hierarchical level, and indicate the span of pending hyperedges.

Recall that the incremental network build algorithm processes each line feature that intersects the rebuilding region as a whole, i.e., all edge elements associated with the line are first deleted and then re-created. As such, we can apply the junction hierarchy rank assignment logic and the hyperedge generation logic during the incremental network rebuild.

The pseudo code is shown in Algorithm 3. The algorithm scans over the list of vertices for a given feature (line 4), creating chains of low-level edges and keeping track of them in the edge stack. In the junction stack it also keeps track, for each hierarchy level, of the last junction element that was analyzed. Whenever the algorithm reaches a junction with rank higher than the previously processed one (line 9), it creates a hyperedge and replaces all low-level edge elements in the stack with this hyperedge.

In Fig. 10b, we trace through the algorithm for the vertices on line l2, showing the creation of hyperedges. In the first two steps we examine the first two junctions j2 and j5 with ranks 3 and 2 respectively. Since j5 has lower rank than j2, we create only a regular edge e1 between them, but we remember j2 in the junction stack since it can participate in a hyperedge spanning e1. In the third step we examine the second pair of junctions j5 and j9 with ranks 2 and 1 respectively. Again j9 has lower rank than j5, so we create only a regular edge e2, remembering j5 in the junction stack. In the fourth step we examine the pair j9 and j10. This time j10 has a higher rank than j9, so we create the low-level edge e3 and start removing items from the junction stack until we find a junction with higher rank than j10 (lines 9–18 in the algorithm). In the junction stack we have junctions j2 and j5 with ranks lower than or equal to that of j10, so we pop them and create the hyperedges e10 and e11. After this step all junctions associated with the line feature have been visited and the algorithm terminates.
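The stack-based idea behind this trace can be sketched as follows. This is one possible reading and not a transcription of Algorithm 3: the function `build_edges`, the bookkeeping of a start index per stacked junction, and the identifier scheme (regular edges numbered from e1, hyperedges from e10 to mirror the trace) are assumptions. The input uses the ranks 3, 2, 1 and 3 given for j2, j5, j9 and j10 in the description of Fig. 10a.

```python
def build_edges(junctions, first_hyperedge_id=10):
    """junctions: ordered list of (junction_id, rank) along one line feature."""
    edge_stack = []      # ids of edges not yet covered by a hyperedge
    junction_stack = []  # (junction_id, rank, index in edge_stack where its span starts)
    edges, hyperedges = [], []
    next_hyper = first_hyperedge_id

    prev_id, first_rank = junctions[0]
    junction_stack.append((prev_id, first_rank, 0))
    for n, (jid, rank) in enumerate(junctions[1:], start=1):
        # Always create the low-level edge between adjacent junctions.
        edge_id = f"e{n}"
        edges.append((edge_id, prev_id, jid))
        edge_stack.append(edge_id)

        # Pop every stacked junction whose rank does not exceed the current one;
        # each pop that spans more than one edge closes a hyperedge.
        while junction_stack and junction_stack[-1][1] <= rank:
            start_id, _, start = junction_stack.pop()
            span = edge_stack[start:]
            if len(span) > 1:
                hyper_id = f"e{next_hyper}"
                next_hyper += 1
                hyperedges.append((hyper_id, start_id, jid, span))
                edge_stack[start:] = [hyper_id]  # replace covered edges

        junction_stack.append((jid, rank, len(edge_stack)))
        prev_id = jid
    return edges, hyperedges


edges, hyperedges = build_edges([("j2", 3), ("j5", 2), ("j9", 1), ("j10", 3)])
print(edges)
print(hyperedges)
```

Under these assumptions the sketch produces the three regular edges e1 to e3 and two hyperedges, one from j5 to j10 and one from j2 to j10, which agrees with the outcome of the trace.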

3.4 Connectivity maintenance discussion

It should be noted that both the initial establishment of connectivity and the incremental rebuild algorithms follow the same four steps:
  • Geometrical extraction. Extract the geometry information for all features in the area of interest (the whole area in the case of initial establishment or the dirty area in the case of subsequent rebuild) and analyze the vertices in those geometries. The extracted vertex coordinates and their corresponding feature identifiers are stored in a temporary table, called the “vertex table”.

  • Connectivity analysis. The content of the vertex table is sorted by coordinate values. As a result the coincident vertices from different features are grouped together. The algorithm scans the vertex table sequentially and picks groups of coincident vertices. Every single group is examined to determine if the vertices satisfy the connectivity model specified for the network.

  • Junction creation. For each group which satisfies the connectivity model a new junction element is created in the network model. The junction id of this newly created junction element is added to all the vertices participating in this connectivity group.

  • Edge creation. The content of the vertex table is then resorted using the feature identifier as the sorting key. As a result, the vertices for each line feature are again grouped together. The vertex table is scanned sequentially once more and for each pair of adjacent vertices which belong to the same line feature a new edge is created.

The difference between the incremental rebuild and the full (re)build algorithms is that the incremental rebuild process adds to the vertex table those vertices that are outside of the rebuild region but belong to features which intersect the rebuild region. These vertices are saved and later reused as connection points through which the rebuilt portion of the network is “stitched” together with the rest of the model.

The build algorithm was extended to generate hyperedge elements and the relationships between them and the lower-level edge elements. The step where the algorithm needs to be aware of the presence of hyperedges is the edge creation step. Recall that the incremental network build algorithm processes each line feature that intersects the rebuilding region as a whole, i.e., all edge elements associated with the line are first deleted and then re-created. As such, we can apply the junction hierarchy rank assignment logic and the hyperedge generation logic during the incremental network builds.

3.5 Rebuilding turn features

We now discuss how the turn features participating in the network are rebuilt. The problem comes from the fact that the turn features (or objects like relationships and so on) are defined as a relation between two or more line features and typically do not have geometric properties. In our implementation of the network model, the information about the turns is stored in a separate turn table with references to the line feature table. A record in the turn table consists of a turn identifier and a list of the identifiers of the line features that participate in the turn. There is also a geometry column, but it is optional and is not supported by all turn feature editors. It stores a graphical representation of the turn feature, which is not always accurate and is used only for visualizing the turn on map documents. Many data vendors also do not provide content for this column, so we cannot use it for spatial discovery.

The lack of spatial properties makes it difficult to determine if a turn feature is inside a rebuild region or not. It is possible to find the line features inside the rebuild region and check for each one of them if it participates in a turn feature. However, this will require (i) a query with spatial range predicate (to find the line features in the rebuild area) and (ii) a numerical join for each column in the turn table containing line feature identifiers (to find turns associated with a given line feature). This will be inefficient because of the multiple join operators involved and the fact that in the existing geodatabase engines the execution of a spatial predicate and a numerical join are not pipelined.

Instead we propose to extend the dirty area concept to cover network elements without geometric properties. To capture edits in aspatial features like turns we use dirty objects. We define a dirty object to be an object without geometric properties whose modifications have not been propagated to the network. When an object is modified, we store its identifier in a dirty object table. Turn features are marked as dirty objects when:
  • The turn feature is directly modified (Insert, Update, Delete), or

  • The associated line features are modified (Update, Delete), or

  • The associated network turn element is deleted (this may happen during the rebuild process).

During the rebuild process, we attempt to re-create in the network all objects from the dirty object table. When a dirty object is re-created successfully, it is removed from the table.
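The dirty object bookkeeping can be sketched as a small table of identifiers. The class `DirtyObjectTable`, the helper `on_line_edited`, the dictionary form of the turn table, and the `recreate` callback are hypothetical names introduced for illustration; the text does not specify the actual schema or interfaces.

```python
class DirtyObjectTable:
    """Holds identifiers of aspatial objects (e.g. turns) awaiting rebuild."""

    def __init__(self):
        self._ids = set()

    def mark(self, object_id):
        self._ids.add(object_id)

    def rebuild(self, recreate):
        """Try to re-create every dirty object; keep only those that fail.

        recreate: callable taking an object id and returning True on success.
        """
        for object_id in sorted(self._ids):  # iterate over a sorted copy
            if recreate(object_id):
                self._ids.discard(object_id)
        return set(self._ids)


def on_line_edited(table, turn_table, line_id):
    """Mark every turn that references the updated or deleted line feature."""
    for turn_id, line_ids in turn_table.items():
        if line_id in line_ids:
            table.mark(turn_id)


# Hypothetical turn table: turn id -> line features participating in the turn.
turn_table = {"t1": ["l2", "l3"], "t2": ["l5", "l6"]}
dirty = DirtyObjectTable()
on_line_edited(dirty, turn_table, "l3")  # an associated line was updated
dirty.mark("t2")                         # the turn itself was edited

# Suppose only t1 can be re-created during this rebuild.
remaining = dirty.rebuild(lambda turn_id: turn_id == "t1")
print(remaining)
```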

4 Versioned spatial database basics

Spatial databases have dramatically evolved in their capability to handle multiple simultaneous editors. Some solutions have required organizations to alter their workflow so as to ensure that no two editors are editing the same geographic region within the spatial dataset. Supporting such a constrained workflow can become problematic once the need for supporting long transactions (e.g., design alternatives) is considered. In order to address this problem, where design alternatives on the same geographic area are necessary and very long transactions spanning weeks or months are required, versioned geographic data management technologies were developed [6, 8, 9, 19, 30, 33]. Versioning does not prevent editing conflicts from occurring; rather, it provides an infrastructure for the detection and resolution of such conflicts.

Definition 4

A version is a logical entity that represents a unique, seamless view of the database that is distinguished from other versions by the particular set of edits made to the version since it was created.

Definition 5

A state represents a discrete snapshot of the database whenever a change is made. Every edit operation creates a new database state.

In versioned databases, there are two fundamental abstractions—versions and states. Versions are organized into a tree that is used to model the hierarchical relationships between versions (e.g., projects or design alternatives). A version is associated with a current state. A state is used to represent an instance of the database that is associated with a particular version. When a child state is created, it will initially have the same set of rows in each table as its parent state. However, as the state is edited, rows will either be added, deleted, or updated. Changes made in a child state are not visible in the parent state. Updated rows in the child will take precedence over the corresponding row in the parent when materializing the version associated with the child state.

Similar to versions, states are also organized into trees. A version will commonly be associated with numerous states over its lifetime; however, it will only be associated with a single state at any given moment in time. A given state may or may not be associated with one or more versions (as shown on the left side of Fig. 11).
https://static-content.springer.com/image/art%3A10.1007%2Fs10707-011-0126-7/MediaObjects/10707_2011_126_Fig11_HTML.gif
Fig. 11

Model depicting the relationship between versions and states is on the left, while a simple example version tree is shown on the right

In Fig. 12, we highlight a simple example where there are two versions, labeled parent and child, and an associated state tree. In the example, the parent version initially is associated with state 0. When a child version is created (as a child of the parent), it will also point to state 0. Following an edit to the child version, the child will then point to state 1. Assuming that the next edit is to the parent version, the parent will then point to state 2. The child is then edited one more time (causing the child version to point to state 3) prior to reconciling (making the changes made in the parent visible to the child—see Section 4.1 for additional details) with the parent version. The reconcile will cause the changes that have been made in the parent version (i.e., the differences between states 0 and 2) to be visible in the new state that the child will point to following the reconcile (i.e., state 4). This sequence of edits and a reconcile leaves the parent version pointing to state 2, while the child version points to state 4.
https://static-content.springer.com/image/art%3A10.1007%2Fs10707-011-0126-7/MediaObjects/10707_2011_126_Fig12_HTML.gif
Fig. 12

Example version tree and state tree
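The sequence of edits and the reconcile described for Fig. 12 can be replayed with a small model that tracks only which state each version points to. The class `VersionedDatabase` and its methods are hypothetical, and row contents are deliberately omitted. Attaching the reconciled state to the parent's current state is an assumption taken from the description of the later state-tree example.

```python
class VersionedDatabase:
    """Tracks the state tree and the state each version currently points to."""

    def __init__(self):
        self.parent_state = {0: None}  # state id -> parent state id
        self.version_state = {}        # version name -> current state id
        self._next_state = 1

    def create_version(self, name, from_version=None):
        # A new version initially points to the same state as its parent version.
        start = 0 if from_version is None else self.version_state[from_version]
        self.version_state[name] = start

    def _new_state(self, parent):
        state_id = self._next_state
        self._next_state += 1
        self.parent_state[state_id] = parent
        return state_id

    def edit(self, version):
        # Every edit operation creates a new state below the current one.
        self.version_state[version] = self._new_state(self.version_state[version])

    def reconcile(self, child, parent):
        # Assumption: the reconciled state is created off the parent's state.
        self.version_state[child] = self._new_state(self.version_state[parent])


db = VersionedDatabase()
db.create_version("parent")                       # parent -> state 0
db.create_version("child", from_version="parent") # child  -> state 0
db.edit("child")                                  # child  -> state 1
db.edit("parent")                                 # parent -> state 2
db.edit("child")                                  # child  -> state 3
db.reconcile("child", "parent")                   # child  -> state 4
print(db.version_state)
```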

Versioned databases are useful in supporting a number of database usage patterns and workflows [16]; this includes:
  • Direct multiuser editing of the main database,

  • Two-level project organizations—work-order processing systems,

  • Multi-level project organizations—hierarchical design parts and alternatives,

  • Cyclical workflows (multiple stages of approval), and

  • Historical states (temporal snapshots).

Some organizations will require the versioned database to support several of these workflows simultaneously; for example, a utility company may organize itself into a two-level project organization for maintaining its ‘as built’ status, while additionally requiring the maintenance of historical states (temporal snapshots). The key point is that a versioned database must be able to support each of these usage patterns (oftentimes simultaneously).

4.1 Operations on versioned databases

There are two fundamental operations that can be performed on versioned databases that are required in order to support versioning. These two operations are termed reconciling and posting (note: in the following discussion, we will employ the general terms ‘child’ version and ‘parent’ version; child version will refer to a version of interest, while parent version will generically refer to any ancestor version of the child within the version tree). Reconciling is logically the process of taking a child version and merging all the changes that have been made in its parent version (effectively making changes made to the parent version visible in the child). These changes may be inserted, updated, or deleted features. This results in the creation of a new state that is then associated with the child version (e.g., state 4 in Fig. 12). Note that conflicts may be detected during reconciliation if a given feature has been modified in both the child version and the parent version. Additionally, if a feature is updated in one version and deleted in another, this is also a conflict (an update-delete conflict). When conflicts occur, the changes that are made in the parent version will take precedence by default (note that it is equally reasonable to implement a reconcile process where the child version takes precedence by default). Thus, human intervention is oftentimes necessary in order to resolve the differences if any of the changes made in the child version (that are in conflict with the parent) are to take precedence. In sum, reconciling is the process of making all the changes that were made to a parent version visible in a child version.

Posting is conceptually the converse operation to a reconcile. Posting involves taking a child version that has been reconciled with its parent version, and making all the changes made in the child visible to the parent version. Conceptually, changes in the child are pushed up into the parent. Once two versions have been reconciled and posted (with one version assuming the role of descendent, and the other as the ancestor in both operations), the parent and child versions will represent the same instance of data within the versioned database (at least until another edit is made to either version).

Version reconciliation (and conflict detection) may be implemented using queries against the underlying relational database that allow all inserts, updates, and deletes that occur between two states in the state tree to be detected. We term these queries ‘difference queries’ (detect the differences between two states). Note that for a conflict to occur between a feature in a child and parent version, the difference queries between the two states associated with the child and parent version relative to their common ancestor state (e.g., state 0 in Fig. 12) must show that the feature was either updated in both, or updated in one and deleted in the other state.
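Given the results of two difference queries, the conflict rule can be checked with a few lines of code. In this sketch each difference is represented as a dictionary from feature identifier to the kind of change relative to the common ancestor state; that representation and the function name `find_conflicts` are assumptions, and the difference queries themselves are not shown.

```python
def find_conflicts(child_diff, parent_diff):
    """Classify conflicts between two sets of changes.

    Each diff maps a feature id to "insert", "update" or "delete",
    relative to the common ancestor state.
    """
    conflicts = {}
    # Only features changed in both versions can be in conflict.
    for fid in child_diff.keys() & parent_diff.keys():
        pair = (child_diff[fid], parent_diff[fid])
        if pair == ("update", "update"):
            conflicts[fid] = "update-update"
        elif set(pair) == {"update", "delete"}:
            conflicts[fid] = "update-delete"
    return conflicts


child_diff = {"f1": "update", "f2": "update", "f3": "insert", "f4": "delete"}
parent_diff = {"f1": "update", "f2": "delete", "f4": "delete", "f5": "update"}
print(find_conflicts(child_diff, parent_diff))
```

Features changed in only one version (f3 and f5 here) never conflict, and the sketch follows the text in not treating a feature deleted in both versions as a conflict.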

Fig. 13 depicts a simple example highlighting the interaction between states, versions, and a reconcile. In the example assume that the parent version corresponds to state 2 as indicated by the dashed arrow labeled “1” between the parent version and the circle labeled “2” (note that states correspond to labeled circles in the diagram). If a child version is now created, it will also reference state 2 (also depicted by a dashed arrow labeled “1” between the child version and state 2). State 2 also becomes what is termed the common ancestor state between the parent and child version. Assume that the child version is then edited three times. Each edit operation (an atomic set of edits) results in a new state; in this instance, states 3 through 5. At the end of the three edit operations, the child version will be referencing state 5. Following the edits to the child, assume that the parent version also has three edits made to it. This results in the creation of states 6, 7, and 8, with the parent version referencing state 8 following the edits. Now assume that the child version is reconciled with the parent version. The reconcile will require that the edits made in the parent version (essentially, the edits represented by states 6–8 in what is termed the parent branch) are made visible to the child version. This is accomplished by creating a new state (state 9) off of state 8, and pushing all the changes that have occurred in the child branch (states 3–5) into state 9, and making the child version reference state 9. Finally, the application of the Post operation following the reconcile results in the parent version also referencing state 9, making the changes made to the child visible in the parent.
https://static-content.springer.com/image/art%3A10.1007%2Fs10707-011-0126-7/MediaObjects/10707_2011_126_Fig13_HTML.gif
Fig. 13

Example state tree showing the interaction between child and parent versions

4.2 Implementation details

Versions are associated with a state identifier that corresponds to each update that occurs in the view. The state identifiers are unique and map to a set of updates corresponding to a single logical edit. For each state, the database keeps information about the modification type (either an insert, update, or delete). The ADDs table contains information related to inserts and updates, while the DELETEs table maintains the deletes (Fig. 14). These two tables are collectively referred to as delta tables. One set of delta tables is associated with each base table in the versioned database. Thus, if a data model contained two tables, one representing parcels, and the second representing owners, there would be four additional tables necessary to represent the two sets of delta tables. A versioned dataset, therefore, consists of the original table (referred to as the base table, which corresponds to State 0), plus the two delta tables. The versioned database keeps track of which version the user is connected to. In addition, when modifications are made to the data, the versioning system populates the delta tables as appropriate. When a user queries a dataset in a versioned environment, the system assembles the relevant rows from the base table and the delta tables to present the correct view of the data for that particular version.
https://static-content.springer.com/image/art%3A10.1007%2Fs10707-011-0126-7/MediaObjects/10707_2011_126_Fig14_HTML.gif
Fig. 14

Simple edit scenario highlighting the ADDs, DELETEs, and the base table
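How a view might be assembled from the base table and the delta tables can be sketched as follows. The tuple layouts of the ADDs and DELETEs entries, the notion of a `lineage` (the list of state identifiers from the base state to the version's current state), and the sample rows are hypothetical; the actual delta-table schema is not given in the text.

```python
def materialize(base, adds, deletes, lineage):
    """Assemble the rows visible in a version.

    base:    dict row id -> row (the base table, State 0)
    adds:    list of (state id, row id, row) for inserts and updates
    deletes: list of (state id, row id)
    lineage: ordered state ids leading from State 0 to the version's state
    """
    view = dict(base)
    for state in lineage:
        # Apply the deletes recorded for this state, then its inserts/updates.
        for state_id, row_id in deletes:
            if state_id == state:
                view.pop(row_id, None)
        for state_id, row_id, row in adds:
            if state_id == state:
                view[row_id] = row
    return view


base = {1: "parcel A", 2: "parcel B"}
adds = [
    (1, 3, "parcel C"),               # state 1 inserts row 3
    (2, 1, "parcel A (resurveyed)"),  # state 2 updates row 1
    (5, 4, "parcel D"),               # state 5 belongs to another branch
]
deletes = [(3, 2)]                    # state 3 deletes row 2

print(materialize(base, adds, deletes, lineage=[1, 2, 3]))
print(materialize(base, adds, deletes, lineage=[1]))
```

Edits recorded for states outside the lineage, such as state 5 in the example, are invisible to the version, which is what lets branches evolve independently.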

5 Versioned network models

Network models, with their associated network connectivity indexes, dirty areas, and dirty objects, introduce complexities into the standard reconcile and post processes within a versioned database (as described in Section 4). The primary cause of this complexity is the fact that inconsistent network indexes may occur when an edited child and parent version are reconciled. This is irrespective of whether or not each version has its full extent rebuilt (i.e., no dirty areas or objects).

Consider the situation shown in Fig. 15 (an annotated state tree is depicted; the common ancestor state refers to the state that the parent version was pointing to when the child version was originally created). In this example, assume that the network is clean; no dirty areas or objects exist, and the features and the network index are in a consistent state. Edits are then made to both the parent and child versions. In the child version, the network is augmented in the southeast direction, while in the parent version the network is augmented toward the southwest. Assume that the network has been incrementally rebuilt following all edits in each version (i.e., no dirty areas or objects exist). In this figure, connectivity between line features is represented by the small black circles. As can be observed, both the parent and child versions have planar connectivity.
https://static-content.springer.com/image/art%3A10.1007%2Fs10707-011-0126-7/MediaObjects/10707_2011_126_Fig15_HTML.gif
Fig. 15

Example highlighting a reconcile that results in an inconsistent network index (the inconsistent index is depicted by the shaded region at the bottom of the figure)

If the child version is then reconciled against the parent version, new edits made in the parent version are made visible in the child version. This is depicted in the southeast corner of Fig. 15. Making these new features visible in the child version results in an inconsistency between the features and the network connectivity index, as depicted by the shaded region. Thus, we observe a simple situation where two versions that are completely rebuilt can have a network connectivity index inconsistency following reconciliation. For this reason, the version reconcile process must be augmented to handle networks correctly.

5.1 Dirty area and object management during reconciliation

As has been discussed, versioning of network models requires additional functionality on top of the versioning scheme for simple feature classes. This is due to the fact that the model includes both (i) a feature space with features modeling real world objects, and (ii) a logical network where connectivity information about these features is stored. The connectivity information has to be kept consistent with the state of the feature space during the process of reconciliation when new features have been introduced or existing ones have been updated or deleted in the child version as a result of the reconciliation. All these modifications introduce changes in the connectivity inside the feature space of the network model, which have to be reflected in the logical network.

There are two general approaches to solve this problem. The first one employs the concept of reactive behavior, which is applied to the network and has been used in the ArcGIS geometric network model [2]. Reactive behavior refers to the logical connectivity network reacting automatically to changes in the feature space. Thus, the process of reconciliation will require the maintenance of the connectivity information. This entails both logical networks (in the child and parent versions) being analyzed concurrently during the reconciliation process and merged together in the resultant child version. The main disadvantage of this approach is the complexity of the problem (analyzing and merging graphs), which can itself degrade the performance of reconcile.

To avoid this disadvantage when reconciling a network model, we choose to employ another strategy, which we call the lazy approach (it is termed lazy because we defer the actual rebuilding of the network connectivity to a later, more convenient time). Instead of analyzing and restitching the connectivity information during reconcile, we utilize the incremental network rebuild algorithm discussed in Section 3. We relax the requirement that the connectivity network must always reflect the state of the feature space. From a connectivity perspective, the logical network is allowed to be in an incorrect state; the regions of inconsistency are tracked by marking them with dirty areas (or dirty objects in the case of turn features).
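The lazy approach can be illustrated with a minimal sketch. The class and method names below (`Envelope`, `LazyNetwork`, `edit_feature`, `rebuild`) are hypothetical and are not part of the ArcGIS implementation; the connectivity index is reduced to a set of feature ids, and dirty areas are approximated by bounding boxes. The point is only that an edit records a dirty area instead of updating connectivity, and that consistency is restored later by a rebuild restricted to the dirty regions.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Envelope:
    """Axis-aligned bounding box standing in for a dirty area (assumption)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def intersects(self, other: "Envelope") -> bool:
        return not (self.xmax < other.xmin or other.xmax < self.xmin
                    or self.ymax < other.ymin or other.ymax < self.ymin)


@dataclass
class LazyNetwork:
    features: dict = field(default_factory=dict)     # feature id -> Envelope
    indexed: set = field(default_factory=set)        # ids reflected in the index
    dirty_areas: list = field(default_factory=list)  # regions awaiting rebuild

    def edit_feature(self, fid: str, envelope: Envelope) -> None:
        """Create or update a feature; connectivity is NOT updated here."""
        old = self.features.get(fid)
        if old is not None:
            self.dirty_areas.append(old)   # the old location is stale too
        self.features[fid] = envelope
        self.indexed.discard(fid)          # any index entry is now out of date
        self.dirty_areas.append(envelope)

    def rebuild(self) -> None:
        """Deferred step: re-index every feature that touches a dirty area."""
        for area in self.dirty_areas:
            for fid, env in self.features.items():
                if env.intersects(area):
                    self.indexed.add(fid)
        self.dirty_areas.clear()

    def is_consistent(self) -> bool:
        return not self.dirty_areas and self.indexed == set(self.features)


net = LazyNetwork()
net.edit_feature("l1", Envelope(0, 0, 1, 1))
stale = net.is_consistent()    # False: the edit is tracked but not indexed
net.rebuild()
fresh = net.is_consistent()    # True: the dirty area has been cleaned
```

In this sketch the cost of an edit is constant, and the cost of restoring connectivity is paid only when `rebuild` is called, which mirrors the deferral described above.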

Dirty area (and object) management becomes a key concept in the versioned network model. In order to ensure that the incremental rebuilding of the network index is properly handled, we rely upon a strategy where dirty areas or objects are generated for the areas where spatial features or turn features are modified (created, updated, or deleted). The user may then rebuild, at a time of their choosing, the portions of the network where these dirty areas were introduced as a byproduct of reconcile. More specifically, we summarize the set of rules related to the handling of dirty areas and objects as follows:
  • Rule 1: All dirty areas and objects that are present in the child or parent but do not exist in the common ancestor state (i.e., before the child and parent were edited) remain in the result state of the reconcile (corresponding to the child version after reconcile). This is depicted on the left side of Fig. 16.
    https://static-content.springer.com/image/art%3A10.1007%2Fs10707-011-0126-7/MediaObjects/10707_2011_126_Fig16_HTML.gif
    Fig. 16

    Example of dirty area management Rules 1 and 2. Here (and in Figs. 17 and 18), dirty areas are represented by shaded areas; dirty objects are not depicted as they have no spatial representation. The arrows represent the sequence of events; dirty areas are labeled on the left side

    https://static-content.springer.com/image/art%3A10.1007%2Fs10707-011-0126-7/MediaObjects/10707_2011_126_Fig17_HTML.gif
    Fig. 17

    Example of dirty area management Rule 3

  • Rule 2: All dirty areas and objects that exist in the common ancestor state but do not exist in the child (i.e., they were cleaned by an incremental network rebuild in the child) will still exist in the child following the reconcile (depicted on the right side of Fig. 16).

  • Rule 3: All dirty areas and objects that exist in the common ancestor state but do not exist in the parent version (i.e., they were cleaned by a rebuild in the parent) will not exist in the child version following the reconcile.

  • Rule 4: All dirty areas and objects created in the child version, irrespective of whether or not they exist at the time of reconciliation, will exist following the reconcile.

  • Rule 5: All dirty areas and objects created in the parent version will only exist in the child version following the reconciliation if they exist at the time of reconcile. This situation is shown in Fig. 18.
    https://static-content.springer.com/image/art%3A10.1007%2Fs10707-011-0126-7/MediaObjects/10707_2011_126_Fig18_HTML.gif
    Fig. 18

    Example of dirty area management Rules 4 and 5. The left side depicts how a dirty area that no longer exists in the parent at the time of reconcile will not exist in the child following the reconcile. The right side depicts the opposite situation, where the dirty area in the parent version exists at the time of reconcile
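The five rules can be read as set operations over dirty-area identifiers. The following is a sketch under stated assumptions, not the ArcGIS implementation: the function name `reconcile_dirty` and its parameters are hypothetical, dirty areas and objects are represented only by opaque ids, and the child is assumed to keep a record of every dirty area it created since branching (which Rule 4 requires). The rules do not spell out what happens when an ancestor dirty area was cleaned in both versions; the sketch assumes Rule 3 decides in that case, since the logical network is taken from the parent.

```python
def reconcile_dirty(ancestor, parent, child, child_created):
    """Return the set of dirty ids in the child after reconcile.

    ancestor      -- dirty ids in the common ancestor state
    parent, child -- dirty ids in each version at reconcile time
    child_created -- every dirty id created in the child since branching,
                     including ones already cleaned by a rebuild
    """
    result = set()
    # Rule 1: dirty areas new to either version (absent from the ancestor) remain.
    result |= (child | parent) - ancestor
    # Rule 2: ancestor areas cleaned in the child become dirty again.
    # Assumption: this applies only while the parent still holds them dirty;
    # otherwise Rule 3 decides.
    result |= (ancestor - child) & parent
    # Rule 3: ancestor areas cleaned in the parent stay clean, so
    # (ancestor - parent) is never added.
    # Rule 4: everything dirtied in the child returns, cleaned or not.
    result |= set(child_created)
    # Rule 5: parent-created areas survive only if still dirty in the parent.
    result |= parent - ancestor
    # Ancestor areas that neither version cleaned simply stay dirty.
    result |= ancestor & parent & child
    return result


dirty_after = reconcile_dirty(
    ancestor={"a", "b", "c"},
    parent={"a", "p"},          # b and c cleaned in the parent, p created
    child={"b", "n"},           # a and c cleaned in the child, n created
    child_created={"n", "m"},   # m was created and later cleaned in the child
)
print(sorted(dirty_after))      # ['a', 'm', 'n', 'p']
```

Under these assumptions the result collapses to the parent's current dirty set plus everything the child ever dirtied, which is the behavior summarized in the text that follows.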

In summary, the reconcile rules state that all changes made in the child version since its creation should be brought back to a dirty state, regardless of whether they have been cleaned (rules 1, 2 and 4). The changes in the parent version that have been cleaned after the creation of the child version should remain clean (rule 3). The dirty areas in the parent version that are present at the time of reconcile should remain dirty (rules 1 and 5).

This behavior has been put in place because the logical network, which is used to store the connectivity information for the network model, is not reconciled but copied from the parent side to the reconciled version. As a result of this copy process, the logical network in the reconciled version is already consistent with the changes that occurred in the parent version since the creation of the child. This means that the clean areas in the parent should remain clean (rule 3) and the dirty areas should remain dirty (rules 1 and 5). Because the child logical network is discarded after the reconcile process, all the connectivity information persisted there after the creation of the child version is lost, and the changes from the child version have to be reapplied irrespective of whether they have been cleaned in the child version. Hence we bring them back to a dirty state (rules 1, 2 and 4).

We choose the parent side because in most workflows this is the version that is more heavily modified. For those workflows it is easier to reapply all the changes coming from the child version than those coming from the parent.

5.2 Detailed example

We illustrate the behavior of versioning networks using the example shown in Figs. 19 and 20. In the common ancestor state (see Fig. 19a), two new line features l2 and l6 have been added to the previously built feature space. Within the common ancestor state, the reactive behavior of the network creates two new dirty areas around the new line features in order to keep track of the modifications in the feature space. Since these areas have not been rebuilt with the incremental rebuild algorithm, the two new line features l2 and l6 have not been reflected in the network connectivity index.
https://static-content.springer.com/image/art%3A10.1007%2Fs10707-011-0126-7/MediaObjects/10707_2011_126_Fig19_HTML.gif
Fig. 19

Version example—common ancestor and parent versions. The line features are represented on the top half, and the corresponding connectivity network on the bottom half. Dirty areas are represented by shaded rectangles

https://static-content.springer.com/image/art%3A10.1007%2Fs10707-011-0126-7/MediaObjects/10707_2011_126_Fig20_HTML.gif
Fig. 20

Version example—the child and reconcile versions

In the next step, a child version is created. In the parent version, additional edits are made. Within the parent, the incremental build is run over the area encompassing the dirty area surrounding feature l2. This results in a new edge e2 and junction j2 being created in the connectivity index. Finally, an additional line feature (l5) is created in the parent version. The result of all edits to the parent version is shown in Fig. 19b.

In the child version, a different set of edits is made. In the child, two new line features are created, l7 and l8 (this results in new dirty areas being created). Finally, an incremental build is run over the areas encompassing the dirty areas surrounding line features l6 and l8. This results in an update of the connectivity index where edges e6 and e8 and junctions j6 and j8 are created. Fig. 20a represents the results of these modifications.

The next operation performed on the child version (Fig. 20a) is to reconcile it with the parent version (Fig. 19b). Recall that the reconcile makes the edits made in the parent version visible in the child version. Applying the rules for reconciling networks described in Section 5.1, the child version is modified, with the result shown in Fig. 20b. The following modifications are of note. First, the application of Rule 1 results in line l7 being associated with a dirty area (i.e., new dirty areas present in the child or parent remain after reconcile). The application of Rule 2 causes line l6 to also be associated with a dirty area (a dirty area in the common ancestor state and the parent version, but not in the child version prior to reconcile). Rule 3 results in line l2 being clean and reflected in the connectivity network following reconcile (a dirty area in the common ancestor state and child version, but clean in the parent version). Rule 4 results in line l8 becoming dirty following the reconcile. Finally, the application of Rule 5 causes line l5, which was created but not rebuilt in the parent, to be marked as dirty following the reconcile. Following the reconcile, if the incremental rebuild is applied to all dirty areas in the child version, the result will be a clean and up-to-date connectivity index, as depicted in Fig. 20c.
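The outcome of this example can be checked by encoding each state as a set and applying the rules as set operations. This is only an illustrative sketch: keying each dirty area by the line feature it surrounds is an assumption made for brevity, the variable names are hypothetical, and Rule 2 is applied only to areas that are still dirty in the parent (an assumption about a case the rules leave open).

```python
# Dirty areas are keyed by the line feature they surround (an assumption).
ancestor = {"l2", "l6"}        # Fig. 19a: l2 and l6 added, not yet rebuilt
parent = {"l6", "l5"}          # Fig. 19b: l2 rebuilt, l5 added and left dirty
child = {"l2", "l7"}           # Fig. 20a: l7 and l8 added; l6 and l8 rebuilt
child_created = {"l7", "l8"}   # everything dirtied in the child since branching

after = set()
after |= (child | parent) - ancestor   # Rule 1 contributes l7 (and l5)
after |= (ancestor - child) & parent   # Rule 2 contributes l6
# Rule 3: l2 lies in (ancestor - parent), so it is not added and stays clean.
after |= child_created                 # Rule 4 contributes l8 (and l7)
after |= parent - ancestor             # Rule 5 contributes l5

clean = (ancestor | parent | child | child_created) - after

print(sorted(after))   # ['l5', 'l6', 'l7', 'l8']
print(sorted(clean))   # ['l2']
```

The four dirty lines and the single clean line agree with the state described for Fig. 20b.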

6 Implementation experiences

The proposed versioning scheme for network models has been implemented in the ESRI ArcGIS system and will be made available to customers in the next release. It has been tested in a multiuser editing environment for large continent-wide network models, including a model derived from the set of features representing the full street network within the entire continental United States (35.9 million line features). Similarly sized networks were constructed for all of Europe. Table 2 provides examples of the reconcile times (wall clock) for different real-world transportation network datasets and differing numbers of edits to the child version. The edits are applied to the geometry of the features (all the edited features were offset by a certain distance). The reconcile process is performed with a cold cache. The experiments were carried out on a Windows workstation with 2 GB of main memory and a 3.5 GHz Core 2 Duo processor.
Table 2

Reconcile time for different network datasets and number of child version edits

| Dataset | Size (millions of features) | Edited features | Avg. reconcile time |
|---|---|---|---|
| Southern California | 1.3 | 100 | ≈ 2 s |
| Southern California | 1.3 | 500 | ≈ 3 s |
| Southern California | 1.3 | 1,000 | ≈ 5 s |
| SW United States | 10.6 | 1,000 | ≈ 6 s |
| SW United States | 10.6 | 5,000 | ≈ 7 s |
| SW United States | 10.6 | 10,000 | ≈ 10 s |

We also examine the dependency between the reconcile time and the complexity of the dirty area that is being rebuilt. Two factors affect the complexity of a dirty area: the number of features inside it and the complexity of their geometry. To provide a single value for comparison, in this experiment we measure dirty area complexity as the number of feature vertices inside it. We use the Southern California dataset, which contains slightly more than 1.3 million features, with the same system configuration. The results appear in Table 3.
Table 3

Reconcile time for different complexities of the dirty area

| Dataset | Size (millions of features) | Number of vertices in the dirty area | Avg. reconcile time |
|---|---|---|---|
| Southern California | 1.3 | 3,334 | 0.5 s |
| Southern California | 1.3 | 7,754 | 1.1 s |
| Southern California | 1.3 | 15,344 | 2.1 s |
| Southern California | 1.3 | 32,675 | 3.9 s |
| Southern California | 1.3 | 59,887 | 7.8 s |

As can be seen from the table, the reconcile time depends approximately linearly on the number of vertices. This is because all four steps of the incremental rebuild algorithm (geometry extraction, connectivity analysis, edge creation, and junction creation) are driven by feature vertices. This linear dependency allows our framework to scale gracefully even for large datasets.
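The linearity claim can be checked directly against the five measurements in Table 3 with an ordinary least-squares fit. The sketch below uses only the tabulated values; the variable names are illustrative, and the fit is a reader-side check rather than part of the original experiments.

```python
# Table 3: vertices in the dirty area vs. average reconcile time (seconds).
vertices = [3334, 7754, 15344, 32675, 59887]
seconds = [0.5, 1.1, 2.1, 3.9, 7.8]

n = len(vertices)
mean_x = sum(vertices) / n
mean_y = sum(seconds) / n

# Sums of squares and cross-products around the means.
sxx = sum((x - mean_x) ** 2 for x in vertices)
syy = sum((y - mean_y) ** 2 for y in seconds)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(vertices, seconds))

slope = sxy / sxx                      # seconds per vertex
intercept = mean_y - slope * mean_x    # fixed overhead in seconds
r_squared = sxy ** 2 / (sxx * syy)     # share of variance explained by the line

print(f"slope     = {slope * 1000:.3f} ms per vertex")
print(f"intercept = {intercept:.3f} s")
print(f"R^2       = {r_squared:.4f}")
```

The fitted slope is on the order of a tenth of a millisecond per vertex, and the coefficient of determination is close to 1, which is consistent with the approximately linear dependency described above. With only five data points, the fit should be read as indicative rather than conclusive.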

7 Conclusion

While network models have become the standard way to maintain topological connectivity information in large GIS systems, existing solutions fall short in efficiently supporting dynamic modifications and concurrent multiuser environments. In this paper, we presented a unified solution for implementing both dynamic editing and versioning within high performance network models. Dynamic modifications are supported through the use of “dirty” areas/objects and an incremental rebuild algorithm that “cleans” them and brings the system back to a consistent state. The incremental algorithm can also support the use of hyperedges (a feature built into our network model for high query performance). A novel versioning scheme that builds on the notions of dirty areas and dirty objects is also presented. Our versioning scheme provides flexible reconciling rules that allow the definition of a resolving mechanism between conflicting edits according to user needs. Moreover, the utilization of dirty areas/objects minimizes the overhead of tracking editing history.

We have implemented both the editing and versioning algorithms presented in this paper within the well-established ArcGIS development framework. The proposed ideas have proven to be efficient methods for handling concurrency control of large network datasets.

Copyright information

© Springer Science+Business Media, LLC 2011