Editing and versioning for high performance network models in a multiuser environment
- Bakalov, P., Hoel, E., Heng, W. et al. Geoinformatica (2011) 15: 769. doi:10.1007/s10707-011-0126-7
Network data models are frequently used as a mechanism to describe the connectivity between spatial features in GIS applications. Real-life network models are dynamic in nature, since spatial features can be periodically modified to reflect changes in the real-world objects that they model. Such updates may change the connectivity relations with other features in the model, and in order to perform analysis the connectivity must be reestablished. Existing editing frameworks are not suitable for a dynamic environment, since they require network connectivity to be reconstructed from scratch. Another requirement for GIS network models is to provide support for a multiuser environment, where users simultaneously create and update large amounts of geographic information. The system must support edit sessions that typically span a number of days or weeks, the facility to undo or redo changes made to the data, and the ability to develop models and alternative application designs without affecting the published database. The row-locking mechanisms adopted by many DBMSs are prohibitively restrictive for many common workflows. To deal with long-lasting transactions, a solution based on versioning is thus preferable. In this paper we provide a unified solution to the problems of dynamic editing and versioning of network models. We first propose an efficient algorithm that incrementally maintains connectivity within a dynamic network. Our solution is based on the notion of dirty areas and dirty objects (i.e., regions or elements containing edits that have not been reflected in the network connectivity index). The dirty areas and objects are identified and marked during the editing of the network feature data; they are subsequently cleaned and connectivity is rebuilt. Furthermore, to improve performance, we propose a ‘hyperedge’ extension to the basic network model.
A hyperedge drastically decreases the number of edge elements accessed during solve time on large networks; this in turn leads to faster solve operations. We show how our connectivity maintenance algorithms can support the hyperedge enhanced model. We then propose a new network model versioning scheme that utilizes the dirty areas/objects of the connectivity rebuild algorithm. Our scheme uses flexible reconciling rules that allow the definition of a resolving mechanism between conflicting edits according to user needs. Moreover, the utilization of dirty areas/objects minimizes the overhead of tracking the editing history. The unified editing and versioning solution has been implemented and tested within ESRI’s ArcGIS system.
Keywords: Versioning, Network models, Transportation networks
Network data models have a long history as an efficient way to describe the topological connectivity information among spatial features (objects with geographic representation) in geographic information systems [11, 14, 17, 18, 25]. At an abstract level, the network model can be viewed as a graph whose elements explicitly represent the connectivity information about the features in the database. An edge in the graph indicates that the two features represented by its junctions are connected, and vice versa. Variations of network models have been implemented in existing operational systems such as ARC/INFO and TransCAD. Because of the large volume of data frequently found in these networks, the model is typically persisted inside a centralized database server. Using the connectivity information stored on the server side, those systems can then be utilized to solve a wide range of problems typical for the transportation or utility network domains (e.g., finding the shortest path between points of interest, finding optimal resource allocation, determining the maximal flow of a resource, and other graph-theoretic operations).
A common weakness in previously proposed network model designs is that they do not consider dynamic modifications to the network. Such modifications occur often in many real-life scenarios where spatial features are frequently modified (e.g., features are updated, deleted, or inserted). Even a single feature update can change a significant number of connectivity relations among other features, and connectivity then needs to be reestablished. To address this problem, all existing approaches reestablish the connectivity of the whole network from scratch. While the correct result is eventually produced, this process is very time consuming (networks are typically very large, e.g., 50 million linear features in a continent-wide transportation network) and can be prohibitive for many applications (for example, networks supporting on-line navigation queries or location-based services). A new and effective mechanism to maintain the correctness of the network model in a dynamic environment is thus needed. We term this the dynamic editing problem.
Another requirement for an efficient GIS network model is to efficiently support a concurrent multiuser environment that creates and updates large amounts of geographic data. In scenarios where these users are required to edit the same data at the same time, the system must provide an editing environment that supports concurrent multiuser modifications without creating multiple instances of the data. In contrast to traditional DBMSs, this editing environment must also support edit sessions that typically span a number of days or weeks (e.g., large engineering projects requiring significant interactive editing and revision), the facility to undo or redo changes made to the data, and the ability to develop models and alternative application designs without affecting the published database.
- Modeling “what if” scenarios. The versioning mechanism allows end users to explore different alternatives (versions) during a design phase.
- Workflow management. Typically the design process goes through multiple steps organized in a workflow process, where the output of one step is an input for another. The versioning scheme allows users to save intermediate results during the design process.
- Historical queries. The versioning scheme allows the preservation of different states of the data, which can later be re-visited and re-examined if necessary.
However, existing database versioning approaches cannot easily manage the specifics of geographical data, such as topological network relations, the presence of connectivity among the stored elements, and traversability. We thus seek an efficient versioning scheme for network models.
In this paper we propose a unified solution to both the dynamic editing and versioning problems. We start by introducing an incremental algorithm for maintaining the correctness of the connectivity information in the presence of modifications. In this algorithm, users are allowed to rebuild the portions of the network model affected by the dynamic modifications, using the notions of dirty areas and dirty objects (a similar mechanism has also been applied to our topological data model). A dirty area is a region inside the network spatial extent where the network features have been modified but the correctness of their connectivity information has not been verified. The network data model is assumed correct only when it is free of dirty areas. A dirty area is incrementally reduced by a process called rebuilding. Rebuilding may happen over the entire dirty area in the network, or it may affect only portions of it. The end user specifies which portions of the dirty area should be cleaned, and the rebuild process analyzes and re-establishes the connectivity information there. Allowing users to rebuild only portions of the dirty area is a practical requirement in scenarios involving very large seamless networks. The user may clean only those portions of the network extent which are of interest to a given application or query, thus avoiding a costly total rebuild.
Effectively, dirty areas can be viewed as a mechanism to support transaction functionality over complex network data (graphs, etc.). The database starts in a consistent state (i.e., without any dirty areas), the updates are applied, and the database returns to a consistent state once all dirty areas are cleaned (the “end” of the transaction). The connectivity rebuilding algorithm has been implemented in ArcGIS version 10 and provides an effective solution for maintaining dynamic network models in an incremental manner.
Furthermore, we introduce an additional feature in the network model that greatly improves performance, namely the notion of hyperedges. Consider a solve operation that computes a nationwide, coast-to-coast shortest route. The running time of the operation is drastically affected by the large number of edges (streets, roads, etc.) the shortest path algorithm has to examine. One way to reduce this number is to exploit the natural hierarchical nature of the transportation system (freeways, highways, etc.). With hyperedges, users can merge multiple compatible features together and greatly improve performance. The incremental rebuild algorithm, however, needs to be modified so that it can also support hyperedges efficiently.
We then propose a novel versioning scheme for network models that utilizes the concept of dirty areas/objects and the connectivity rebuild algorithm to allow multiple simultaneous edits. Versioning of network models is different from version control over simple spatial data (“simple” meaning data that is geometrically unrelated to other data, i.e., with no topological structuring). While the same basic principles are still in operation, resolving conflicts between features that are related to other features, as in network models, is different. This is because of the specific internal behavior of the network and the requirement that the connectivity information (also known as the connectivity index) in the model be kept consistent at all times.
Our dynamic editing approach was first presented in , and our work on versioning in . This paper combines them in a common framework that effectively provides concurrency control in a multiuser GIS environment. The unified solution has been implemented and successfully tested in ESRI’s ArcGIS. Moreover, the inclusion of hyperedges and their effects on the editing system are novel.
The rest of the paper is organized as follows: Section 2 provides a brief description of the network model including its logical structure and physical design; the notion of hyperedges is also introduced. Section 3 presents the algorithm used for connectivity establishment and an in-depth description of our rebuilding algorithm, including the effect of hyperedge support. Section 4 discusses versioning of spatial databases in general. Section 5 addresses our proposed extensions of these general techniques to the support of versioned network models. Section 6 discusses our implementation experiences, and Section 7 concludes the paper.
2 The network model
Most of the systems that utilize network models have client-server architectures. Because of their very large data size (e.g., many tens of millions of features for some nationwide or continent-wide transportation networks), the network models are usually located in a centralized server, persisted either in RDBMS tables or in a file system. Typically the process of analysis is done within a GIS server (that acts as a client to the database) or within a thick client [4, 23, 27].
While the connectivity elements (edges and junctions) allow the user to express connections, they are not sufficient for expressing specific restrictions from the real world (for example, no left turn, or no u-turn allowed at an intersection) [3, 29, 31]. Turn restrictions are used for this purpose. Turn restrictions present a problem to most network models, since the presence of turns can greatly impact the movement (or traversability) through a network. A common way to model turns within a network is with a turn table. A turn table represents each explicitly specified turn restriction (or penalty) as a row with references to the two associated edges. Turn tables may be augmented with an impedance attribute if the turns also represent delays or impedances. When traversing the network, the turn table is queried as necessary. An alternative approach is to employ a transition matrix that represents possible transitions at an intersection.
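As a concrete illustration, a turn table lookup during traversal can be sketched as follows. This is a minimal sketch: the in-memory layout, names, and the convention that a missing impedance marks a prohibited turn are all assumptions for illustration, not the model's actual schema.

```python
# Sketch of a turn table: each explicitly specified turn restriction
# (or penalty) is a row referencing the two associated edges, optionally
# augmented with an impedance. Here None marks a prohibited turn.
turn_table = {
    # (firstEdgeId, lastEdgeId) -> impedance in seconds (None = restricted)
    (1, 2): None,   # turn from edge 1 onto edge 2 is prohibited
    (2, 4): 15.0,   # turn from edge 2 onto edge 4 incurs a 15 s delay
}

def turn_cost(first_edge, last_edge):
    """Extra cost of the edge-to-edge transition, or None if prohibited.

    Transitions absent from the table are unrestricted and free."""
    if (first_edge, last_edge) not in turn_table:
        return 0.0
    return turn_table[(first_edge, last_edge)]
```

A shortest-path solver would call `turn_cost` for each candidate transition and skip those that return None.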
To restrict the u-turn from edge e1 to edge e3, we need a maneuver composed of the edges e1, e2 and e3 in sequence. The maneuver cannot be synthesized from the two overlapping turns e1-e2 and e2-e3, since restricting the e1-e2 turn also incorrectly restricts the left turn specified by the sequence e1-e2-e4.
To introduce turn restrictions, in addition to the edge and junction elements a network model can also have special network elements called turns (see Fig. 1). Similar to edges, which are defined as a relation between junctions, turns are defined as a relation between edges. A turn element is anchored to a specific junction (the junction where the turn starts) and controls the movement between a sequence of edges expressed as pairs (firstEdgeId, lastEdgeId).
One essential requirement for the network models is the ability to represent multimodal networks. An example of a multimodal network appears in transportation systems where various transportation modes (roads, bus lines, etc.) are linked. To satisfy this requirement, we use connectivity groups , where each connectivity group represents the set of features associated with a given mode of transportation. For example, in a transportation system with two modes (e.g., road network and railway system), there will be two connectivity groups of features.
When considering connectivity, there are two general connectivity policies that we support. The first policy, termed end point connectivity, is generally based upon spatial coincidence of the endpoints of line features and other point features. This leads to a 1:1 mapping between the features participating in the network and the network elements used to represent the network connectivity. This approach works reasonably well for simpler planar network datasets (e.g., TIGER/Line).
However, with non-planar datasets (commonly available from commercial data vendors like Tele Atlas or NAVTEQ), which use long linear features to model elements of transportation systems such as highways, it is useful to allow network connectivity pathways along the line features. This leads to the mid-span connectivity policy. The familiar one-to-one mapping between linear features and edge elements must then be generalized to a one-to-many mapping.
Consider the problem of finding a shortest route from Los Angeles to New York City on a nationwide USA street network. The problem is computationally prohibitive due to the size of the network (about 37 million edges for the TeleAtlas 7.2 data). One possible way to reduce the search space is by exploiting the natural hierarchical classification of the streets, e.g., interstate freeways, state highways, major roads and local streets. Even then, with an algorithm that favors traversing the interstates, the resultant route may have several thousand edges. The edge count of the route affects the time taken to find the route.
The crux of the computational inefficiency is the granularity of the interstate edge elements. Since interstates connect to ramps at relatively short intervals, there are many edge elements for each interstate. In addition, there is often a one-to-one association between features and edge elements, especially with planar data that originates from TIGER. As such, a single interstate freeway may be represented by several thousand features in the street data.
In order to solve this problem, we propose a hyperedge extension to our standard network model, which relies on a re-interpretation of the mid-span connectivity policy for networks with hierarchies. In particular, features model the highest level of the network that they belong to, and are appropriately merged. Lower-level network connectivity is derived from the mid-span intersection of these features.
A hyperedge is an edge element that spans regular edge elements and has a hierarchical rank higher than or equal to the hierarchical rank of the feature it models.
This pre-computation approach is different from the use of oracles that yield an estimate of the network distance.
In this example, feature l2 is represented in the hyperedge-enabled network model by three regular edges, e2, e7 and e9, at the lowest level of the hierarchy, and by two hyperedges, e10 and e11. Hyperedge e10 is placed on the second level of the hierarchy and covers the regular edge elements e7 and e9. Hyperedge e11 is placed on the highest level of the hierarchy and covers the regular edge element e1 and hyperedge e10. As can be seen from this example, hyperedges can be nested: hyperedges on a higher hierarchical level cover hyperedges on lower levels.
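The nested coverage described above can be represented as a simple mapping from each hyperedge to the elements it spans. The sketch below mirrors the example's coverage (e10 spans e7 and e9; e11 spans e1 and e10); the dictionary layout and the `expand` helper are illustrative assumptions, not the model's actual structures.

```python
# A hyperedge maps to the elements it covers, which may themselves be
# hyperedges. Expanding a hyperedge recursively yields the regular
# edges at the lowest level of the hierarchy.
covers = {
    "e10": ["e7", "e9"],
    "e11": ["e1", "e10"],
}

def expand(element):
    """Flatten a (possibly nested) hyperedge to its regular edges."""
    if element not in covers:          # a regular edge covers only itself
        return [element]
    edges = []
    for child in covers[element]:
        edges.extend(expand(child))
    return edges
```

A solver descending from a high hierarchy level would call `expand` only when it needs the fine-grained edges under a hyperedge.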
Solver time for different network datasets:

Dataset               Size (M of features)   Avg. solve time (with hyperedges)   Avg. solve time (no hyperedges)
ArcGIS Online Data                           ≈ 0.05 s                            ≈ 0.07 s
                                             ≈ 0.12 s                            ≈ 0.69 s
                                             ≈ 1 s                               ≈ 10.3 s
Hyperedges can be seen as edge elements that model the connectivity of a network from which all features with a hierarchical rank smaller than the rank of the hyperedge have been deleted. A network without the hyperedge elements is the same as that for any-vertex connectivity, but with a different assignment of hierarchy ranks. In our example, hyperedge e11, which has the highest rank, models what the connectivity for feature l2 would be if all features with a rank lower than that of e11 (features l3, l4 and l5) were removed from the model, e.g., if we model only the connectivity for the primary roads.
The generation of hyperedge elements and the relationships between them and lower-level elements is accomplished by extending the junction and edge generation phases of the network rebuild algorithm. The details are discussed in section 3.3 of this paper.
2.4 Physical implementation
In a similar way, in the traversal process, it is required that at each junction we know all the turns anchored at this junction. This has influenced the way we implement the turn storage scheme. Information about turns is stored in the “turn table”, in the form of turn triplets < turnId, firstEdgeId, lastEdgeId>. If there are any turns anchored at a junction ji, the turn table will have a record with primary key ji which also contains all the turns anchored on ji. This storage scheme can be easily optimized for the most commonly used client access patterns .
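The storage scheme above (triplets grouped by their anchor junction, so that all turns anchored at a junction can be fetched in one lookup) can be sketched as follows; the container and function names are illustrative, not ArcGIS's actual schema.

```python
from collections import defaultdict

# anchor junction id -> list of <turnId, firstEdgeId, lastEdgeId> triplets
turn_rows = defaultdict(list)

def add_turn(anchor_junction, turn_id, first_edge, last_edge):
    """Record a turn under the junction where the turn starts."""
    turn_rows[anchor_junction].append((turn_id, first_edge, last_edge))

def turns_at(junction):
    """All turns anchored at the given junction, queried during traversal."""
    return turn_rows.get(junction, [])
```

Keying by the anchor junction matches the traversal access pattern: when a solver arrives at a junction, a single lookup yields every turn it must consider.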
Note that the physical representation of hyperedges is similar to that of regular edges, and they are stored in the connectivity tables in the same way. For example, the edge table will also have an (id, from-jn) entry for any hyperedge included in the model.
3 Maintaining connectivity in the network model
- Initial establishment of connectivity when the network model is first defined, with the connectivity information being derived from the features participating in the network.
- Incremental rebuilding of the connectivity index on a periodic basis after edits occur on the spatial features in the network.
Having an incremental solution is of significant practical value—the amortized cost of maintaining an incrementally rebuildable network is far less than an ordinary network that must be periodically rebuilt in its entirety (e.g., editing a subdivision and only rebuilding that portion of the nationwide network versus rebuilding the whole nationwide network).
The applications that benefit most from this incremental connectivity maintenance functionality are those in which the features involved in the network model are modified frequently. As discussed above, many of those modifications can affect the connectivity information in the model. A hard requirement on the network model, however, is to maintain correct feature connectivity despite these frequent modifications.
The second group of applications that require incremental connectivity maintenance involves long-term editing transactions inside the network model. Long transactions usually occur during the design process of transportation networks and can cause serious inconvenience through the table locks that they hold. The solution to this problem is to encapsulate these long transactions into versions in a versioned environment. Similar functionality can be found in the ArcGIS topology model. However, in order to support versions inside the network model, it is necessary to support incremental rebuild.
In order to keep track of the modifications to the features that occur since the last full or partial rebuilding of the connectivity index, the network model employs the concept of dirty areas. Similarly, to track changes to elements without geometrical properties (e.g., turns), we use the concept of dirty objects.
A dirty area corresponds to the non-overlapping regions within the feature space where features participating in the network have been modified (added, deleted, or updated) but their connectivity has not been re-established.
To simplify its computation and storage, a dirty area in our implementation is defined as a union of envelopes (e.g., bounding boxes) around the features that have been modified. It is possible however to use other shapes—the convex hull of the feature for example.
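A dirty area maintained as a union of envelopes can be sketched as below. This is a simplified sketch: the class and method names are illustrative, and a partial rebuild here removes only envelopes fully contained in the rebuild region rather than clipping them by its extent as the actual implementation does.

```python
def envelope(points):
    """Bounding box (xmin, ymin, xmax, ymax) of a feature's vertices."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))

class DirtyArea:
    def __init__(self):
        self.envelopes = []          # the union is kept as a list of boxes

    def mark(self, feature_points):  # called on insert, update, or delete
        self.envelopes.append(envelope(feature_points))

    def is_clean(self):
        return not self.envelopes    # the network is correct only if empty

    def rebuild(self, region):
        """Clean every envelope fully contained in the rebuild region."""
        rx1, ry1, rx2, ry2 = region
        self.envelopes = [
            (x1, y1, x2, y2) for (x1, y1, x2, y2) in self.envelopes
            if not (rx1 <= x1 and ry1 <= y1 and x2 <= rx2 and y2 <= ry2)
        ]
```

Every edit grows the dirty area via `mark`, and each rebuild shrinks it, until `is_clean` reports that the connectivity index is trustworthy again.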
The size of the dirty area inside the network model can be reduced through a process of connectivity rebuilding. In order to ensure that the network is correct, the portion of the network encompassed in the dirty areas will need to be rebuilt. It is not necessary to build the entire space spanned by the dirty area at one time; instead, a subset of the dirty area can be built. If the dirty area is partially built, the original dirty area will be clipped by the extent of the region that is built.
In the initial state, the network model has no features, the underlying network has no elements, and the dirty area is empty. When edits are made to the features, or new features are loaded, the dirty area is modified (or a new dirty area is created in the case of a new feature) to encompass the extent of the feature envelope.
The dirty area mechanism however cannot be applied directly to turn features in our model. The complexity comes from the fact that the turn features are defined as a relation between two or more line features and typically do not have geometrical properties. As depicted in Figs. 1 and 6, a record in the turn table consists of a turn identifier and a list of the line feature identifiers that participate in the turn. In order to cover network elements without geometrical properties, we extend our dirty area concept with the notion of dirty objects.
A dirty object is an object without geometrical properties (like turn features or traffic data records) whose modifications have not yet resulted in the incremental rebuilding of the network connectivity index.
Using the dirty areas and dirty objects, we can capture the dynamic behavior of network maintenance. It is this dynamic behavior that complicates and thus requires extra attention during the versioning process.
We will proceed with a description of the algorithm for initial establishment of connectivity followed by the incremental rebuild algorithm.
3.1 Initial establishment of connectivity
The algorithm for initial establishment of connectivity takes as input all the features (i.e., all representations of real-world objects) in the model. It assumes that the current network model is empty. The pseudo code is shown in algorithm 1.
The first step in the algorithm is to extract information about the vertices of all features participating in the network (lines 1–7). The extracted vertex coordinates and the feature identifier are stored in a temporary table called a Vertex Table.
3.2 The incremental approach
A rebuild region is an area in the network where we re-establish connectivity. A gray region is the extent of the dirty area outside of the rebuild region. For simplicity, we first describe the rebuilding algorithm when there are no line features that intersect both the rebuild region and the gray region. The extension of the algorithm to handle line features that intersect both regions is discussed later in this section.
As with the algorithm for initial connectivity establishment, the incremental rebuild constructs a vertex information table and uses it to create the junction and edge elements inside the network model. The difference, however, is that for the incremental rebuild the network model has already been established in a previous iteration using the initial build algorithm. We can view this network model as containing historical connectivity information for the point and line features intersecting the rebuild region, derived before the modifications in the feature space that created the dirty areas.
The goal of the rebuild algorithm is to replace that obsolete historical information inside the dirty areas with the current connectivity data. The pseudo code is shown in algorithm 2. The input to the algorithm is the set of line features that intersect the rebuilding region. This can be provided by a simple spatial query in the feature space.
Junctions outside the dirty area: These junctions belong to edges that are partially covered by the rebuild region. They are saved and reused later as connection points through which the rebuilt portion of the network is stitched together with the rest of the model.
Junctions that have point features associated: Since every point feature has an associated junction in the network model, such junctions are saved for later reuse. It might happen, though, that we have to update junction properties such as the x and y coordinates.
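The two saving conditions above reduce to a simple predicate, sketched below; the function names and the bounding-box representation of the rebuild region are assumptions for illustration.

```python
def inside(region, point):
    """Whether a point lies within an axis-aligned rebuild region."""
    (x1, y1, x2, y2), (px, py) = region, point
    return x1 <= px <= x2 and y1 <= py <= y2

def keep_junction(junction_xy, rebuild_region, has_point_feature):
    """Decide whether a junction survives the rebuild of its edges."""
    if not inside(rebuild_region, junction_xy):
        return True   # condition 1: outside the rebuild region (stitching)
    if has_point_feature:
        return True   # condition 2: an associated point feature exists
    return False      # otherwise the junction is deleted
```

In the worked example, j2 and j4 pass the first test, j3 passes the second, and j5 fails both and is deleted.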
In our example, we have edges e2, e3, and e5 associated with line features l2, l3, and l5, which intersect the rebuild region. Those edge elements are removed from the network.
The junctions connected to those edges (j2, j3, j4, and j5) are analyzed to determine if they satisfy one of the saving conditions. Junctions j2 and j4 are outside the rebuild region; they thus satisfy the first saving condition and will be used later for restitching the rebuild region with the rest of the network. Junction j3 has a point feature associated with it and therefore satisfies the second saving condition. Junction j5 does not satisfy any of the saving conditions and is thus deleted.
The last step of the algorithm (lines 30–36) is the re-creation of the edge elements in the rebuild region. For this purpose, we use the information for the newly created junctions inside the rebuild region and the junctions saved in step 2 (those used for restitching with the rest of the network) in the saved vertex table. The information in this table is sorted using the feature ID as a primary key, so that junctions that belong to the same feature are grouped together. The sorted table is then scanned, and for each pair of junctions that belong to the same feature a new edge is created.
The description of the algorithm up to this point covers the basic scenario in which no line features intersect both the rebuild region and the gray region (so-called “partial” line features); note that a point feature can never intersect both regions, since they are non-overlapping. The problem with partial line features is that we cannot save a junction at a feature endpoint for restitching if that endpoint lies in a gray region, since this indicates that the endpoint has not been processed and a junction element may not exist for it.
The incremental rebuild algorithm is thus extended to handle the case where there are “partial” line features. For each line feature that intersects the rebuild region, information about its endpoints in the gray region is added to the set of connectivity nodes computed in the first step of the algorithm. During this process, we ignore all other feature geometries that may be present there. As a result, the connectivity node for this endpoint may be inaccurate. However, the node is in the part of the dirty area remaining after the current rebuild, and we will correct the inaccuracy with a subsequent rebuild there.
3.3 Hyperedge creation
The build and rebuild algorithms are now extended to generate hyperedge elements and the relationships between them and the lower-level edge elements. We illustrate this on the working example, where the primary, secondary and tertiary line features have hierarchy ranks 3, 2 and 1, respectively. We assume that the line features are digitized from left to right and from top to bottom.
In the third phase of the build algorithm, we re-sort the vertex table so that vertices for the same line feature are grouped back together. We then generate edge elements connecting adjacent vertices of the line, and if necessary, hyperedge elements spanning those edges. This is accomplished by keeping a stack of generated edges, and also a stack of related junction elements in rank order; the junctions in the junction stack bracket the edges by hierarchical level, and indicate the span of pending hyperedges.
Recall that the incremental network build algorithm processes each line feature that intersects the rebuilding region as a whole, i.e., all edge elements associated with the line are first deleted and then re-created. As such, we can apply the junction hierarchy rank assignment logic and the hyperedge generation logic during the incremental network rebuild.
The pseudo code is shown in Algorithm 3. The algorithm scans over the list of vertices for a given feature (line 4), creating chains of low-level edges and keeping track of them in the edge stack. It also keeps track, for each hierarchy level, of the last junction element analyzed, in the junction stack. Whenever the algorithm reaches a junction with a rank higher than the previously processed one (line 9), it creates a hyperedge and replaces all low-level edge elements in the stack with this hyperedge.
In Fig. 10b, we trace through the algorithm for the vertices on line l2, showing the creation of hyperedges. In the first two steps we examine the first two junctions, j2 and j5, with ranks 1 and 2, respectively. Since j2 has a lower rank than j5, we create only a regular edge e1 between them, but we remember j2 in the junction stack since it can participate in a hyperedge spanning e1. In the third step we examine the second pair of junctions, j5 and j9. Since j9 has a lower rank than j5, we again create only a regular edge e2, remembering j5 in the junction stack. In the fourth step we examine the pair j9 and j10. This time j10 has a rank higher than that of j9, so we create the low-level edge e3 and start removing items from the junction stack until we find a junction with a higher rank than j10 (lines 9–18 in the algorithm). The junction stack contains junctions j2 and j5, both with ranks lower than or equal to that of j10, so we pop them and create the hyperedges e10 and e11. After this step all junctions associated with the line feature have been visited, and the algorithm terminates.
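The trace above can be reproduced with the following sketch of the stack-based generation. This is a plausible reconstruction under stated assumptions, not the paper's exact Algorithm 3: here a junction opens a bracket when it starts the line or when the rank drops after it, brackets are closed (creating hyperedges) on reaching a higher-rank junction, and a bracket spanning a single element is discarded as redundant. All identifiers are illustrative.

```python
def build_with_hyperedges(junctions):
    """junctions: ordered (id, rank) pairs along one line feature.

    Returns (edges, hyperedges); a hyperedge records the elements it
    spans, which may include lower-level hyperedges (nesting).
    """
    edges, hyperedges = [], []
    estack = []   # ids of pending elements (edges or hyperedges)
    jstack = []   # (junction_id, rank, index of first bracketed element)
    eid = hid = 0

    for i in range(len(junctions) - 1):
        (a, ra), (b, rb) = junctions[i], junctions[i + 1]
        eid += 1
        edges.append((f"e{eid}", a, b))
        # a opens a bracket if it starts the line or the rank drops after it
        if i == 0 or ra > rb:
            jstack.append((a, ra, len(estack)))
        estack.append(f"e{eid}")
        if rb > ra:
            # close every bracket with rank <= rb that spans at least two
            # elements; the spanned elements collapse into one hyperedge
            while (jstack and jstack[-1][1] <= rb
                   and len(estack) - jstack[-1][2] >= 2):
                jid, _, start = jstack.pop()
                hid += 1
                hyperedges.append((f"h{hid}", jid, b, estack[start:]))
                del estack[start:]
                estack.append(f"h{hid}")
    return edges, hyperedges
```

Running this on the traced sequence j2, j5, j9, j10 (ranks 1, 2, 1, 3) yields three regular edges and two nested hyperedges, matching the figure: one spanning e2 and e3, and a top-level one spanning e1 and the first hyperedge.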
3.4 Connectivity maintenance discussion
- Geometrical extraction. Extract the geometry information for all features in the area of interest (the whole area in the case of initial establishment, or the dirty area in the case of a subsequent rebuild) and analyze the vertices in those geometries. The extracted vertex coordinates and their corresponding feature identifiers are stored in a temporary table, called the “vertex table”.
- Connectivity analysis. The content of the vertex table is sorted by coordinate values. As a result, coincident vertices from different features are grouped together. The algorithm scans the vertex table sequentially and picks groups of coincident vertices. Every group is examined to determine if the vertices satisfy the connectivity model specified for the network.
- Junction creation. For each group which satisfies the connectivity model, a new junction element is created in the network model. The junction id of this newly created junction element is added to all the vertices participating in this connectivity group.
- Edge creation. The content of the vertex table is then resorted using the feature identifier as the sorting key. As a result, the vertices for each line feature are again grouped together. The vertex table is scanned sequentially once more, and for each pair of adjacent vertices which belong to the same line feature a new edge is created.
The difference between the incremental rebuild and the full (re)build algorithms is that the incremental rebuild process adds to the vertex table those vertices that are outside of the rebuild region but belong to features which intersect the rebuild region. These vertices are saved and later reused as connection points through which the rebuilt portion of the network is “stitched” together with the rest of the model.
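The four-step build can be sketched as follows. As a simplifying assumption, a junction is created at every distinct coordinate (a stand-in for the real connectivity model); `build_connectivity` and its inputs are hypothetical names.

```python
from itertools import groupby

def build_connectivity(features):
    """features: dict mapping a feature id to its ordered (x, y) vertices.
    Returns (junction_of, edges): junctions keyed by coordinate, and
    edges as (feature id, from-junction, to-junction) triples."""
    # 1. Geometrical extraction: flatten geometries into a vertex table
    #    of (coordinate, feature id, vertex position) rows.
    vertex_table = [(xy, fid, i)
                    for fid, verts in features.items()
                    for i, xy in enumerate(verts)]

    # 2. Connectivity analysis: sort by coordinates so coincident
    #    vertices from different features group together.
    vertex_table.sort(key=lambda row: row[0])

    # 3. Junction creation: one junction per coordinate group that
    #    satisfies the connectivity model (here: every group).
    junction_of = {}  # coordinate -> junction id
    for xy, _rows in groupby(vertex_table, key=lambda r: r[0]):
        junction_of[xy] = len(junction_of)

    # 4. Edge creation: re-sort by feature id (and vertex position);
    #    each adjacent vertex pair within a feature yields an edge.
    vertex_table.sort(key=lambda row: (row[1], row[2]))
    edges = []
    for fid, rows in groupby(vertex_table, key=lambda r: r[1]):
        rows = list(rows)
        for a, b in zip(rows, rows[1:]):
            edges.append((fid, junction_of[a[0]], junction_of[b[0]]))
    return junction_of, edges
```

The incremental variant would additionally seed `vertex_table` with the out-of-region vertices of features that intersect the rebuild region, so the rebuilt portion can be stitched back to the rest of the network.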
The build algorithm was extended to generate hyperedge elements and the relationships between them and the lower-level edge elements. The step where the algorithm needs to be aware of the presence of hyperedges is the edge creation step. Recall that the incremental network build algorithm processes each line feature that intersects the rebuilding region as a whole, i.e., all edge elements associated with the line are first deleted and then re-created. As such, we can apply the junction hierarchy rank assignment logic and the hyperedge generation logic during the incremental network builds.
3.5 Rebuilding turn features
We now discuss how the turn features participating in the network are rebuilt. The difficulty stems from the fact that turn features (and similar objects, such as relationships) are defined as a relation between two or more line features and typically have no geometric properties. In our implementation of the network model, the information about turns is stored in a separate turn table with references to the line feature table. A record in the turn table consists of a turn identifier and the list of line feature identifiers that participate in the turn. There is also an optional geometry column, which is not supported by all turn feature editors. It stores a graphical representation of the turn feature, which is not always accurate and is used only for visualizing the turn on map documents. Many data vendors also do not provide content for this column, so we cannot use it for spatial discovery.
The lack of spatial properties makes it difficult to determine whether a turn feature is inside a rebuild region. It is possible to find the line features inside the rebuild region and check, for each of them, whether it participates in a turn feature. However, this requires (i) a query with a spatial range predicate (to find the line features in the rebuild area) and (ii) a numerical join for each column in the turn table containing line feature identifiers (to find the turns associated with a given line feature). This is inefficient because of the multiple join operators involved and because, in existing geodatabase engines, the execution of a spatial predicate and a numerical join is not pipelined. We therefore track affected turns directly in a dirty object table. A turn feature is added to this table when:
- The turn feature is directly modified (Insert, Update, Delete), or
- The associated line features are modified (Update, Delete), or
- The associated network turn element is deleted (this may happen during the rebuild process).
During the rebuild process, we attempt to re-create in the network all objects from the dirty object table. When a dirty object is re-created successfully, it is removed from the table.
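The dirty-object bookkeeping for turns can be sketched as follows. The class and method names are illustrative rather than the product API, and `recreate` stands in for the network's actual turn re-creation logic.

```python
class DirtyTurnTracker:
    """Track turn features (which have no geometry) as dirty objects.

    turn_lines: dict mapping a turn id to the set of line feature ids
    referenced by that turn's record in the turn table."""

    def __init__(self, turn_lines):
        self.turn_lines = turn_lines
        self.dirty = set()  # the dirty object table

    # Trigger 1: the turn feature itself is inserted/updated/deleted
    def on_turn_edit(self, turn_id):
        self.dirty.add(turn_id)

    # Trigger 2: an associated line feature is updated or deleted
    def on_line_edit(self, line_id):
        for turn_id, lines in self.turn_lines.items():
            if line_id in lines:
                self.dirty.add(turn_id)

    # Trigger 3: the turn's network element is deleted during rebuild
    def on_turn_element_delete(self, turn_id):
        self.dirty.add(turn_id)

    def rebuild(self, recreate):
        """Attempt to re-create each dirty turn in the network;
        turns for which recreate() succeeds leave the dirty table."""
        self.dirty = {t for t in self.dirty if not recreate(t)}
```

Note that no spatial query is needed: membership in the dirty object table is maintained purely from edit events, which avoids the join-heavy discovery described above.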
4 Versioned spatial database basics
Spatial databases have dramatically evolved in their capability to handle multiple simultaneous editors. Some solutions have required organizations to alter their workflow so as to ensure that no two editors are editing the same geographic region within the spatial dataset. Supporting such a constrained workflow can become problematic once the need for supporting long transactions (e.g., design alternatives) is considered. To address this problem, where design alternatives on the same geographic area are necessary and very long transactions spanning weeks or months must be supported, versioned geographic data management technologies were developed [6, 8, 9, 19, 30, 33]. Versioning does not prevent editing conflicts from occurring; rather, it provides an infrastructure for the detection and resolution of such conflicts.
A version is a logical entity that represents a unique, seamless view of the database that is distinguished from other versions by the particular set of edits made to the version since it was created.
A state represents a discrete snapshot of the database whenever a change is made. Every edit operation creates a new database state.
In versioned databases, there are two fundamental abstractions—versions and states. Versions are organized into a tree that is used to model the hierarchical relationships between versions (e.g., projects or design alternatives). A version is associated with a current state. A state is used to represent an instance of the database that is associated with a particular version. When a child state is created, it will initially have the same set of rows in each table as its parent state. However, as the state is edited, rows will either be added, deleted, or updated. Changes made in a child state are not visible in the parent state. Updated rows in the child will take precedence over the corresponding row in the parent when materializing the version associated with the child state.
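A minimal sketch of this state model, assuming rows can be represented as key/value pairs with `None` marking a delete (the class and method names are illustrative):

```python
class StateTree:
    """Each state stores only its local row changes; a version's view
    is materialized by walking from the version's current state up to
    the root, with changes in deeper states taking precedence."""

    def __init__(self):
        self.parent = {0: None}  # state id -> parent state id
        self.rows = {0: {}}      # state id -> {row key: value, None = deleted}

    def create_state(self, parent_id):
        sid = max(self.parent) + 1
        self.parent[sid] = parent_id
        self.rows[sid] = {}      # starts with the same view as its parent
        return sid

    def edit(self, sid, key, value):
        self.rows[sid][key] = value  # value=None records a delete

    def materialize(self, sid):
        """Resolve the table as seen from state sid: rows changed in a
        child state override the corresponding rows of its ancestors."""
        chain = []
        while sid is not None:
            chain.append(sid)
            sid = self.parent[sid]
        table = {}
        for s in reversed(chain):    # root first, so the child wins
            table.update(self.rows[s])
        return {k: v for k, v in table.items() if v is not None}
```

Because a child state records only deltas, its edits are invisible when the parent state is materialized, matching the visibility rule above.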
Versioned databases are typically required to support several distinct usage patterns:
- Direct multiuser editing of the main database,
- Two-level project organizations—work-order processing systems,
- Multi-level project organizations—hierarchical design parts and alternatives,
- Cyclical workflows (multiple stages of approval), and
- Historical states (temporal snapshots).
Some organizations will require the versioned database to support several of these workflows simultaneously; for example, a utility company may organize itself into a two-level project organization for maintaining its ‘as built’ status, while additionally requiring the maintenance of historical states (temporal snapshots). The key point is that a versioned database must be able to support each of these usage patterns (oftentimes simultaneously).
4.1 Operations on versioned databases
There are two fundamental operations on versioned databases that are required in order to support versioning. These two operations are termed reconciling and posting (note—in the following discussion, we will employ the general terms ‘child’ version and ‘parent’ version; child version will refer to a version of interest, while parent version will generically refer to any ancestor version of the child within the version tree). Reconciling is logically the process of taking a child version and merging in all the changes that have been made in its parent version (effectively making changes made to the parent version visible in the child). These changes may be inserted, updated, or deleted features. This results in the creation of a new state that is then associated with the child version (e.g., state 4 in Fig. 12). Note that conflicts may be detected during reconciliation if a given feature has been modified in both the child version and the parent version. Additionally, if a feature is updated in one version and deleted in the other, this is also a conflict (an update-delete conflict). When conflicts occur, the changes made in the parent version take precedence by default (note that it is equally reasonable to implement a reconcile process where the child version takes precedence by default). Thus, human intervention is oftentimes necessary to resolve the differences if any of the changes made in the child version (that are in conflict with the parent) are to take precedence. In sum, reconciling is the process of making all the changes that were made to a parent version visible in a child version.
Posting is conceptually the converse operation to a reconcile. Posting involves taking a child version that has been reconciled with its parent version, and making all the changes made in the child visible to the parent version. Conceptually, changes in the child are pushed up into the parent. Once two versions have been reconciled and posted (with one version assuming the role of descendant, and the other as the ancestor in both operations), the parent and child versions will represent the same instance of data within the versioned database (at least until another edit is made to either version).
Version reconciliation (and conflict detection) may be implemented using queries against the underlying relational database that allow all inserts, updates, and deletes that occur between two states in the state tree to be detected. We term these queries ‘difference queries’ (they detect the differences between two states). Note that for a conflict to occur between a feature in a child and parent version, the difference queries between the two states associated with the child and parent version relative to their common ancestor state (e.g., state 0 in Fig. 12) must show that the feature was either updated in both, or updated in one and deleted in the other state.
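Assuming each state's changes are available as a dict of row key to new value (with `None` marking a delete), difference queries and conflict detection can be sketched as follows; the function names are illustrative.

```python
def diff(changes):
    """Collapse an ordered list of per-state change dicts (from the
    common ancestor down to a version's current state) into the net
    inserts/updates/deletes visible to that version."""
    net = {}
    for per_state in changes:
        net.update(per_state)  # later states override earlier ones
    return net

def detect_conflicts(child_changes, parent_changes):
    """Conflict detection via difference queries relative to the
    common ancestor: a row conflicts if it was changed in both
    branches (update-update, or update in one and delete in the
    other)."""
    child, parent = diff(child_changes), diff(parent_changes)
    conflicts = {}
    for key in child.keys() & parent.keys():
        if child[key] is None and parent[key] is None:
            continue  # deleted in both branches: nothing to resolve
        conflicts[key] = ('update-delete'
                          if child[key] is None or parent[key] is None
                          else 'update-update')
    return conflicts
```

Rows changed in only one branch never appear in the conflict set, which is why a reconcile can usually merge the bulk of both edit sessions automatically.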
4.2 Implementation details
5 Versioned network models
Network models, with their associated network connectivity indexes, dirty areas, and dirty objects, introduce complexities into the standard reconcile and post processes within a versioned database (as described in Section 3). The primary cause of this complexity is the fact that inconsistent network indexes may occur when an edited child and parent version are reconciled. This can occur even when each version has had its full extent rebuilt (i.e., contains no dirty areas or objects).
If the child version is then reconciled against the parent version, new edits made in the parent version are made visible in the child version. This is depicted in the southeast corner of Fig. 15. Making these new features visible in the child version results in an inconsistency between the features and the network connectivity index, as depicted by the gray region. Thus, we observe a simple situation where two versions that are completely rebuilt can have a network connectivity index inconsistency following reconciliation. For this reason, the version reconcile process must be augmented to handle networks correctly.
5.1 Dirty area and object management during reconciliation
As has been discussed, versioning of network models requires additional functionality on top of the versioning scheme for simple feature classes. This is due to the fact that the model includes both: (i) a feature space with features modeling real world objects, and (ii) a logical network where connectivity information about these features is stored. The connectivity information has to be kept consistent with the state of the feature space during the process of reconciliation when new features have been introduced or existing ones have been updated or deleted in the child version as a result of the reconciliation. All these modifications introduce changes in the connectivity inside the feature space of the network model, which have to be reflected in the logical network.
There are two general approaches to solving this problem. The first employs the concept of reactive behavior applied to the network, as used in the ArcGIS geometric network model. Reactive behavior refers to the logical connectivity network reacting automatically to changes in the feature space. Thus, the process of reconciliation would itself maintain the connectivity information. This entails both logical networks (in the child and parent versions) being analyzed concurrently during the reconciliation process and merged together in the resultant child version. The main disadvantage of this approach is the complexity of the problem (analyzing and merging graphs), which can degrade the performance of the reconcile.
To avoid this disadvantage when reconciling a network model, we choose to employ another strategy which we call the lazy approach (it is termed lazy because we defer the actual rebuilding of the network connectivity to a later, more convenient time). Instead of analyzing and restitching the connectivity information during reconcile, we utilize the incremental network rebuild algorithm discussed in Section 3. We relax the requirement that the connectivity network must always reflect the state of the feature space. From a connectivity perspective, the logical network is allowed to be in an incorrect state; the regions of inconsistency are tracked by marking them with dirty areas (or dirty objects in the case of turn features). Dirty areas and objects are carried through the reconcile according to the following rules:
- Rule 1: All dirty areas and objects that are present in the child or parent that do not exist in the common ancestor state (i.e., before the child and parent were edited) remain in the result state of the reconcile (corresponding to the child version after reconcile). This is depicted in the left side of Fig. 16.
- Rule 2: All dirty areas and objects that exist in the common ancestor state but do not exist in the child (i.e., an incremental network rebuild in the child) will still exist in the child following the reconcile (depicted in the right side of Fig. 16).
- Rule 3: All dirty areas and objects that exist in the common ancestor state but do not exist in the parent version (they were validated) will not exist in the child version following the reconcile.
- Rule 4: All dirty areas and objects created in the child version, irrespective of whether or not they exist at the time of reconciliation, will exist following the reconcile.
- Rule 5: All dirty areas and objects created in the parent version will only exist in the child version following the reconciliation if they exist at the time of reconcile. This situation is shown in Fig. 18.
In summary, the reconcile rules state that all changes made in the child version since its creation should be brought back to the dirty state, whether or not they have been cleaned (rules 1, 2 and 4). Changes in the parent version that have been cleaned after the creation of the child version should remain clean (rule 3). Dirty areas present in the parent version at the time of reconcile should remain dirty (rules 1 and 5).
This behavior has been put in place because the logical network, which stores the connectivity information for the network model, is not reconciled but rather copied from the parent side into the reconciled version. As a result of this copy process, the logical network in the reconciled version is already consistent with the changes that occurred in the parent version since the creation of the child. This means that the clean areas in the parent should remain clean (rule 3) and the dirty areas should remain dirty (rules 1 and 5). Because the child logical network is discarded after the reconcile process, all the connectivity information persisted there after the creation of the child version is lost, and the changes from the child version have to be reapplied, irrespective of whether they have been cleaned in the child version or not. Hence we bring them back to the dirty state (rules 1, 2 and 4).
We choose the parent side because, in most workflows, the parent is the more heavily modified version. For those workflows it is cheaper to reapply the changes coming from the child version than those coming from the parent.
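Assuming dirty areas and objects can be identified by stable ids, the five rules can be sketched as set algebra. The function name and the `child_created` input (areas created in the child since branching, whether or not subsequently cleaned) are illustrative; where rules 2 and 3 overlap, the child re-dirtying is given precedence, matching the copy-from-parent reasoning above.

```python
def reconcile_dirty_areas(ancestor, child, parent, child_created):
    """Dirty-area ids after reconcile, per rules 1-5.

    ancestor      : dirty areas in the common ancestor state
    child, parent : dirty areas in each version at reconcile time
    child_created : areas created in the child since branching
                    (dirty now or since cleaned)
    """
    dirty = set(parent)          # rules 1 & 5: dirty in the parent stays dirty
    dirty |= child - ancestor    # rule 1: dirty areas new in the child
    dirty |= ancestor - child    # rule 2: rebuilt in child; that rebuild is lost
    dirty |= child_created       # rule 4: child edits must be reapplied
    # rule 3: ancestor areas validated in the parent stay clean,
    # unless the child itself created work there
    dirty -= (ancestor - parent) & (child - child_created)
    return dirty
```

Feeding in the dirty sets from the detailed example of Section 5.2 (l2 and l6 dirty in the ancestor, l7 and l8 created in the child, l5 created but not rebuilt in the parent) yields exactly the post-reconcile dirty set {l5, l6, l7, l8} described there, with l2 clean.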
5.2 Detailed example
In the next step, a child version is created. In the parent version, additional edits are made. Within the parent, the incremental build is run over the area encompassing the dirty area surrounding feature l2. This results in a new edge e2 and junction j2 being created in the connectivity index. Finally, an additional line feature is created in the parent version. The result of all edits to the parent version is shown in Fig. 19b.
In the child version, a different set of edits is made. In the child, two new line features are created, l7 and l8 (this results in new dirty areas being created). Finally, an incremental build is run over the areas encompassing the dirty areas surrounding line features l6 and l8. This results in an update of the connectivity index, where edges e6 and e8 and junctions j6 and j8 are created. Figure 20a shows the results of these modifications.
The next operation performed on the child version (Fig. 20a) is to reconcile it with the parent version (Fig. 19b). Recall that the reconcile makes the edits made in the parent version visible in the child version. Applying the rules for reconciling networks as described in Section 5.1, the child version is modified, with the result shown in Fig. 20b. The following modifications are of note. First, the application of Rule 1 results in line l7 being associated with a dirty area (i.e., new dirty areas present in the child or parent remain after reconcile). The application of Rule 2 causes line l6 to also be associated with a dirty area (a dirty area in the common ancestor state and the parent version, but not in the child version prior to reconcile). Rule 3 results in line l2 being clean and reflected in the connectivity network following reconcile (a dirty area in the common ancestor state and child version, but clean in the parent version). Rule 4 results in line l8 becoming dirty following the reconcile. Finally, the application of Rule 5 causes line l5, which was created but not rebuilt in the parent, to be marked as dirty following the reconcile. Following the reconcile, if the incremental rebuild is applied to all dirty areas in the child version, the result is a clean and up-to-date connectivity index, as depicted in Fig. 20c.
6 Implementation experiences
Table: Reconcile time for different network datasets and numbers of child version edits. Columns: dataset (e.g., SW United States), size (millions of features), number of child version edits, and average reconcile time; the reported averages range from ≈2 s to ≈10 s.
Table: Reconcile time for different complexities of the dirty area. Columns: size (millions of features), number of vertices in the dirty area, and average reconcile time.
As can be seen from the table, there is a linear dependency between the number of vertices in the dirty area and the reconcile time. This is because all four steps of the incremental rebuild algorithm (geometry extraction, connectivity analysis, junction creation, and edge creation) are driven by the feature vertices. This linear dependency allows our framework to scale gracefully even for large datasets.
While network models have become the standard way to maintain topological connectivity information in large GIS systems, existing solutions fall short in efficiently supporting dynamic modifications and concurrent multiuser environments. In this paper, we presented a unified solution for implementing both dynamic editing and versioning within high performance network models. Dynamic modifications are supported through the use of “dirty” areas/objects and an incremental rebuild algorithm that “cleans” them and brings the system back to a consistent state. The incremental algorithm also supports hyperedges (a feature built into our network model for high query performance). A novel versioning scheme that accommodates the notions of dirty areas and dirty objects is also presented. Our versioning scheme provides flexible reconcile rules that allow a resolution mechanism between conflicting edits to be defined according to user needs. Moreover, the use of dirty areas/objects minimizes the overhead of tracking editing history.
We have implemented both the editing and versioning algorithms presented in this paper within the well-established ArcGIS development framework. The proposed ideas have proven to be efficient in handling concurrency control for large network datasets.