A cross-technology benchmark for incremental graph queries

To cope with the increased complexity of systems, models are used to capture what is considered the essence of a system. Such models are typically represented as a graph, which is queried to gain insight into the modelled system. Often, the results of these queries need to be adjusted according to updated requirements and are therefore subject to maintenance activities. It is thus necessary to support writing model queries with adequate languages. However, in order to stay meaningful, the analysis results need to be refreshed as soon as the underlying models change. Therefore, good execution speed is mandatory to cope with frequent model changes. In this paper, we propose a benchmark to assess model query technologies in the presence of model change sequences in the domain of social media. We present solutions to this benchmark in 11 different tools and compare them with respect to the explicitness of incrementalization, asymptotic complexity and performance.


Introduction
Models are a highly valuable asset in any engineering process, as they capture the knowledge of a system in a formal abstraction. This abstraction allows reasoning about properties in order to obtain insights into the underlying physical system through analysis.
These insights need to be refreshed as soon as the models of the system change in order to stay meaningful. However, for large systems it is often not viable to recalculate the entire model analysis for every change. Rather, it is desirable to propagate these changes to the analysis results incrementally, i.e. only recalculate those parts of the analysis results that are affected by a given change.
Requirements regarding these insights often change over time. This makes it very important to express such analyses in a maintainable and understandable form. Recently, multiple model query technologies [8,9,18,46,65] have been proposed to address this problem by deriving an incremental change propagation from a declarative specification. Alternatively, one could also use existing and established non-incremental query technology, particularly if some parts of the analysis are more complex and may not be supported by tools that target incremental change propagation. Finally, it is also an option to translate the model into dedicated analysis representations in order to reuse query technology not based on models. Given this plethora of options, it is difficult to estimate the differences and find the trade-offs between these approaches in terms of understandability, conciseness, efficiency and others. Does the tool fit into my technology space? Is it useful to rely on the incrementalization of a tool, or is it better, at least performance-wise, to implement change propagation explicitly? How long does it take to recover from an application crash? How much development effort will be necessary to implement change propagation? How does it scale? Can I speed it up by adding more CPU cores? Is the tool extensible, or can it happen that my analysis is not supported at some point? (These questions have been selected by the authors; we do not claim that this list is complete, and a proper analysis of the needs of developers is out of scope for this paper.) To aid this comparison and assess how current modelling technologies are capable of offering a concise and understandable language for model analysis, yet still offer good performance in the presence of frequent model changes, we propose the "Social Media" benchmark. In this benchmark, two queries are to be formulated that analyse a model of a social media network. In social networks, new Posts, Comments and likes arrive at a very high frequency and thus quickly invalidate analyses of the entire network.
In our benchmark, the analyses shall find the most influential posts and comments, according to selected criteria. While the first query is rather simple in the sense that it can easily be solved with standard query operators, the second query is more complex as graph algorithms are used that are not directly supported by query technologies.
Graph queries are difficult from an incrementalization point of view [31], as they capture a rich family of algorithms and are often described in imperative pseudo-code that usually utilizes state. Using a generic incrementalization system, this often spans a state space that is too large for the incrementalization to be efficient. To remedy this situation, dynamic algorithms are known for a number of graph problems that offer strategies to propagate graph changes efficiently, often with a radically different approach than how the computation was performed in the batch (or initial) scenario. However, there is no comparison yet of how, and how efficiently, such dynamic algorithms can be integrated into incremental query technologies.
We collected 11 implementations of this benchmark that cover most of the possible approaches mentioned above and have mostly been created by authors or active developers of the respective tools. Therefore, we claim that we can compare the tools through the solutions for this benchmark. The tools cover a wide area of model query technology and database management systems; thus, the analysis of the solutions using those tools allows us to reason on the previously mentioned questions.
The benchmark was originally proposed at the Transformation Tool Contest (TTC) 2018 as a live contest [51]. At the TTC, several solutions were submitted [12,17,34,49,78]. After the TTC, we invited further researchers working on incremental query technology to provide solutions in their favourite tools. In this paper, we present all of these solutions and compare them with respect to their features, the complexity of change propagation and overall performance.
The remainder of this paper is structured as follows: Sect. 2 presents the benchmark and the queries that it consists of. Section 3 explains the solutions. Section 4 compares the solutions with respect to the declarativeness of the query language, the used data model, the explicitness of the incrementalization, the durability, support for parallelism and the asymptotic complexity of change propagations. Section 5 presents experimental performance results and their analysis. Section 6 discusses related work, and Sect. 7 concludes the paper.

The social media benchmark
In this section, we present the benchmark which we use to compare existing incrementalization approaches. First, Sect. 2.1 explains the metamodel that we use to model a social network graph, and then, Sect. 2.2 presents the two queries used in the benchmark. We describe the change sequences in Sect. 2.3 and the phases of the benchmark in Sect. 2.4.

Metamodel and change sequences
In this benchmark, we use the data from the 2016 DEBS Grand Challenge (https://web.archive.org/web/20190131182518/https://debs.org/debs-2016-grand-challenge-social-networks/), adapted from the LDBC Social Network Benchmark [5,30] and the SIGMOD 2014 Programming Contest [27]. For this version of the benchmark, we created an Ecore metamodel to describe the social network, translated the series of events in the original data sources to models and change sequences and manually tuned the obtained change sequences, introducing changes that actually affect the results of the queries defined in Sect. 2.2. The metamodel of the social network is depicted in Fig. 1. The social network consists of Users, Posts and Comments. Users may have friends, which is an undirected edge (i.e. it behaves as a symmetric relationship). Submissions form a tree with a Post at its root and Comments as the rest of its nodes. Users may like Comments (likes edge).
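For illustration, the metamodel can be approximated with plain data classes. This is a hedged sketch: the Python class and attribute names mirror Fig. 1 but are assumptions, not the actual Ecore metamodel, and opposite references such as likes/likedBy must be kept in sync manually here.

```python
from dataclasses import dataclass, field

@dataclass
class User:
    id: int
    friends: list = field(default_factory=list)   # symmetric friendship
    likes: list = field(default_factory=list)     # liked Comments

@dataclass
class Submission:
    id: int
    timestamp: int
    comments: list = field(default_factory=list)  # direct replies

@dataclass
class Post(Submission):
    pass                                          # root of a Submission tree

@dataclass
class Comment(Submission):
    liked_by: list = field(default_factory=list)  # opposite of User.likes
```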

Queries
In the scope of the proposed benchmark, we focus on two model queries. The first query (Q1) is rather easy and is expected to be directly supported by the tools. It shall return the most controversial Posts in the social media network. The second query (Q2) is more sophisticated; we expected it not to be directly supported by tools, forcing the solutions into case-specific extensions. It shall return the most influential Comments. The results of both queries are concatenated into strings in order to use them for automated correctness checks.

Query 1: Most controversial posts
We consider a Post as controversial if it starts a debate through its Comments. For this, we assign a score of 10 for each Comment that belongs to a Post. Here, we consider a Comment as belonging to a Post if it is a reply to either (1) the Post itself or (2) another Comment that already belongs to the Post. In addition, we also value if Users liked Comments, so we additionally assign a score of 1 for each User that has liked a Comment (Users are counted per Comment liked, i.e. a User can contribute multiple times to the overall score). In short, the score of a Post is calculated by taking all Comments which are in the Submission tree rooted in the Post, scoring each of them as 10 plus the number of Users that liked the Comment, and then summing up these scores.
The goal of the query is to find the three Posts with the highest score. Ties are broken by timestamps, i.e. more recent Posts take precedence over older Posts. The result string of this query is a concatenation of the Posts' IDs, separated by the character | (Fig. 2).
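The Q1 scoring rule above can be sketched in a few lines. This is an illustrative batch computation with hypothetical attribute names (comments for direct replies, liked_by for likers, timestamp and id as in the metamodel), not any solution's actual code.

```python
def all_comments(post):
    """All Comments in the Submission tree rooted at the Post."""
    stack = list(post.comments)          # direct replies to the Post
    while stack:
        comment = stack.pop()
        yield comment
        stack.extend(comment.comments)   # replies to the Comment

def post_score(post):
    """Q1 score: 10 per Comment in the tree, plus 1 per like on each Comment."""
    return sum(10 + len(c.liked_by) for c in all_comments(post))

def top3_posts(posts):
    """The three highest-scoring Posts; ties broken by recency (newer first)."""
    ranked = sorted(posts, key=lambda p: (post_score(p), p.timestamp), reverse=True)
    return [p.id for p in ranked[:3]]
```

The tuple key (score, timestamp) implements the tie-breaking rule directly: between equal scores, the larger timestamp wins.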

Query 2: Most influential comments
In this query, we aim to find Comments that are liked by groups of Users. We identify groups through the friendship relation. Here, the Users that liked a specific Comment form an induced subgraph where two Users are connected if they are friends (but still, only Users who have liked the Comment are considered). The goal of the second query is to find the connected components in that graph. We assign a score to each Comment, which is the sum of its squared component sizes.
Similarly to the previous query, we aim to find the three Comments with the highest score. Ties are broken by timestamps, i.e. more recent Comments should take precedence over older Comments. The result string is again a concatenation of the Comment IDs, separated by the character | (Fig. 3).

Change sequences
To measure the incremental performance of solutions, the benchmark uses generated change sequences. The changes are available in the form of models. An excerpt of the change metamodel is depicted in Fig. 4. There is a class for each elementary change operation, distinguished by the type of the changed feature: attribute, association or composition changes. Subclasses further specify the kind of operation, i.e. whether elements are added, removed or changed. These concrete classes include the added, deleted or assigned items. The change metamodel also supports change transactions where a source change implies some other changes, for example, setting opposite references.
The elementary changes can be categorized into the following five change types:
- A new User is added
- A new Post is added
- A new Comment is added as a reply to an existing Submission
- A Comment is liked
- Two existing Users become friends
The changes are always additive, i.e. nodes or edges are never deleted. During the contest, these change sequences were made available in NMF and EMF. However, solutions are also allowed to transform the change models into their own, internal change representation. In such a case, the transformation of the change representation is excluded from the time measurements.
An example of an update consisting of five elementary changes is depicted in the accompanying figure.

Benchmark phases
The benchmark consists of the following execution phases:
1. Initialization: Setup of the modelling framework.
2. Loading: Load the initial models.
3. Initial evaluation: Compute the selected query for the initial model.
4. Updates: Change sequences are applied to the model.
Each change sequence consists of several elementary change operations. After each change sequence, the query result must be consistent with the changed source models, either by entirely creating the view from scratch (batch evaluation) or by propagating changes to the view result (incremental evaluation).
Solutions are mainly competing on their update performance, but are also expected to keep the times of the loading and initial evaluation at a reasonable level.
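The phases and their timing can be sketched as a small harness. This is purely illustrative; the function parameters are placeholders, not the benchmark's actual API.

```python
import time

def run_benchmark(load_model, evaluate, apply_changes, change_sequences):
    """Runs the benchmark phases and reports per-phase wall-clock times."""
    timings = {}

    t0 = time.perf_counter()          # Loading phase
    model = load_model()
    timings['load'] = time.perf_counter() - t0

    t0 = time.perf_counter()          # Initial evaluation phase
    result = evaluate(model)
    timings['initial'] = time.perf_counter() - t0

    timings['updates'] = []
    for seq in change_sequences:      # Update phases
        t0 = time.perf_counter()
        apply_changes(model, seq)
        result = evaluate(model)      # batch re-evaluation here; incremental
                                      # tools would propagate changes instead
        timings['updates'].append(time.perf_counter() - t0)

    return result, timings
```

Only the per-update times are the main competition criterion; the harness records loading and initial evaluation as well, matching the expectation that these stay reasonable.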

Solutions
In this section, we present the solutions developed for the Social Media benchmark from Sect. 2. We present the following solutions:
- Modelling tools and graph databases running in a managed runtime (JVM or CLR):
  - A solution using the plain .NET query API and NMF to represent the models in memory (Sect. 3.2), and a solution using the incrementalization system of NMF
  - (1) the AOF solution with the Xtend surface language, and (2) the ATL Incremental solution with the ATL surface language
  - Batch and incremental solutions using the Neo4j graph DBMS (Sects. 3.10-3.11)
- Tools using relational or matrix-based representations, running on native runtimes:
  - Batch and incremental solutions using the PostgreSQL relational DBMS (Sects. 3.12-3.13).
  - Batch and incremental solutions formulated in linear algebra using the GraphBLAS API and the SuiteSparse:GraphBLAS library, implemented in C++ (Sects. 3.14-3.15).
  - A solution using the Differential Dataflow programming model, implemented in Rust (Sect. 3.16).
In broad terms, there have been recurrent approaches to solving the two queries. To help reduce the length of the solution descriptions, a first subsection provides a broad categorization of the approaches followed by the various solutions for each query. It is followed by a subsection for each solution, which starts with a description of the tool used, followed by a detailed explanation of how the queries were implemented. Given that most solutions have been developed by the tool authors or expert developers, we assume that the solutions represent the best, or close to the best, solution possible with each tool.
Solutions created during the contest were described in the proceedings of the event [35]. Preliminary versions of the Neo4j and GraphBLAS solutions were discussed in [29] and [28], respectively.

Common solution approaches
Examination of the various solutions showed that several common patterns recur across the approaches chosen to solve the two queries. In this section, we provide a broad classification that is used in the solution descriptions, allowing them to focus on their technology-specific aspects.

Query 1: Most controversial posts
The solutions for this query can largely be characterized along two dimensions: whether they were incremental in their maintenance of the scores, and how they maintained the set of the top three elements. Table 1 summarizes this classification. Regarding the first dimension, we acknowledge, similarly to Giese and Wagner [36], that there can be different degrees of incrementality in a transformation. Specifically, we consider these three variants:
- Non-incremental or batch solutions, which re-compute the scores from scratch after each change.
- Partially incremental solutions, which re-compute the scores of Posts impacted by changes from scratch.
- Fully incremental solutions, which update the scores of the Posts impacted by changes as needed, without recomputing them from scratch (for instance, a new like would simply increase the previous score by 1).
In regard to the second dimension, four broad variants were found (summarized in Table 1). Note that the online top-x variant is only feasible because the possible changes mentioned in Sect. 2.3 never remove likes or delete comments: this means that scores are monotonically increasing. This optimization is typically one that a generic "top-x" incremental operator could not make, as in the general case the sorting key can go down as well as up in value.
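Because scores only grow, an online top-3 can be maintained by re-ranking just the updated element against the current best three, as sketched below (illustrative only). Monotonicity is what makes keeping only three entries safe: an element outside the top three can only re-enter when its own score changes, which is exactly when this function is called for it.

```python
def update_top3(top3, elem_id, score, timestamp):
    """Maintains the best three (score, timestamp, id) entries under
    monotonically increasing scores. Re-ranks only the updated element;
    ties are broken by timestamp, newer entries winning."""
    entries = [e for e in top3 if e[2] != elem_id]   # drop the stale entry
    entries.append((score, timestamp, elem_id))
    entries.sort(reverse=True)                       # at most 4 items: cheap
    return entries[:3]
```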

Query 2: Most influential comments
The solutions can be characterized across three dimensions: whether they maintained the set of connected components in the graph incrementally, the approach used to compute the set of connected components, and how they sorted the scored results. The general classification is summarized in Table 2.
The first dimension, incrementality, is simpler than for the previous query: it is simply whether, after each change, the solutions update the set of connected components or re-compute it. For the second dimension, five broad approaches were followed:
- The most popular approach was to reuse Tarjan's algorithm for computing the set of connected components in a graph [84], which has a worst-case performance of O(|V| + |E|).
- The next most popular approach was a naive re-computation of the component that each node belonged to after each change, using breadth-first or depth-first traversals from that node.
- Some solutions took advantage of the fact that the changes never remove links, using a simpler union-find data structure and applying the union operation whenever a new edge was added. In a way, this is similar to how some solutions of query 1 used this "monotonically growing" graph to simplify the maintenance of the top-x elements.
- A few individual solutions used specialized algorithms, such as FastSV [88] or label propagation.
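The union-find approach for the additive setting can be sketched as follows (an illustrative structure, not any particular solution's code). Tracking component sizes also allows keeping the sum of squared sizes, i.e. a Comment's Q2 score, up to date in constant time per union.

```python
class UnionFind:
    """Disjoint sets with path halving and union by size; also tracks the
    sum of squared component sizes, as needed for the Q2 score."""
    def __init__(self):
        self.parent, self.size = {}, {}
        self.sq_sum = 0                       # sum of squared component sizes

    def add(self, x):
        if x not in self.parent:
            self.parent[x], self.size[x] = x, 1
            self.sq_sum += 1

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return                            # already in the same component
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        # (s + t)^2 - s^2 - t^2 = 2st, so the score delta of a merge is 2st.
        self.sq_sum += 2 * self.size[ra] * self.size[rb]
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]
```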
The third dimension is the approach followed to compute the top scoring results, which uses the same four broad variants as in Sect. 3.1.1.

Reference solution: C# query syntax
As a reference, we use a solution based on NMF [45,50] for the model representation and the standard .NET collection operators through the C# query syntax. During the TTC and throughout the remainder of this paper, this solution serves as a reference in terms of performance and understandability, as it represents what one obtains by using only the means of a mainstream programming language (in this case C#), without a dedicated incremental model query technology.

Tool description
The query syntax and the underlying collection operators are features built into the C# programming language and actively used by millions of developers. The reference solution for Q2 is depicted in Listing 2: a full-sorting batch implementation using Tarjan's algorithm. It is very similar to Listing 1 in its overall structure, except for the slightly more complex score calculation due to the required computation of connected components. For this, we use the class Layering, an implementation of Tarjan's algorithm [84]. This implementation requires specifying the underlying graph through its nodes (for each Comment, this is the set of Users that have liked the Comment) and a function obtaining the incident nodes of a given node (in this case, the friends of a given User that have also liked the Comment).
The result of the scoring computation is then again ordered and the IDs of the first three elements are returned.

Tool description
NMF Expressions [46,52] is an incrementalization system that uses the feature of the C# language to compile lambda expressions into models of the code, instead of machine code. These models are then used to derive a dynamic dependency graph (DDG) from a given expression and observe changes. These changes originate from elementary update notifications and are propagated through the dependency graph.
Based on the underlying formalization as a categorical functor, NMF Expressions is able to include custom incrementalizations of functions and ships with a library of such manually incrementalized operators, including most of the Standard Query Operators (SQO, http://msdn.microsoft.com/en-us/library/bb394939.aspx).

Query 1
While NMF Expressions includes an incrementalization of most SQOs, the Take method used in the reference solution for Q1 is not supported, nor are any other SQO query operators that deal with indices. The reason is that it is costly to find out the index of an element in, for example, a filtered, unsorted list given the index in the source collection. This always requires a linear scan, whereas the SQO implementations generally try to propagate changes in constant or near-constant time (logarithmic effort for updating sorted lists). Therefore, Take is not supported simply because it cannot be implemented efficiently, at least not in the general case.
However, we identified top-x queries, i.e. analyses that sort elements by a given criterion and then report a small number of elements with the best scores, as a rather common pattern. Therefore, we added dedicated support for this kind of analysis operator to NMF. This operator is called TopX. It is essentially a combination of an incremental sort (using balanced binary search trees) and a simple poll for the first x elements upon any change of the balanced binary search tree, assuming that x is small (in comparison with the size of the search tree). The computation of the scores, however, uses the incrementalization system and is fully incremental.
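The general idea behind such a top-x operator, without the monotonicity assumption, can be sketched with a sorted list of (key, element) pairs. This is illustrative only: NMF's TopX uses balanced binary search trees, so its updates avoid the O(n) list shifting of this sketch, and real keys would be (score, timestamp) tuples for tie-breaking.

```python
import bisect

class TopX:
    """Keeps (key, id) pairs sorted ascending; the top x are read off the
    back. An update removes the element's old entry and inserts the new one
    (O(log n) search, O(n) list shifting). Keys may grow or shrink."""
    def __init__(self, x):
        self.x = x
        self.entries = []      # sorted list of (key, elem_id)
        self.current = {}      # elem_id -> current key

    def update(self, elem_id, key):
        if elem_id in self.current:
            i = bisect.bisect_left(self.entries,
                                   (self.current[elem_id], elem_id))
            del self.entries[i]                    # remove the stale entry
        bisect.insort(self.entries, (key, elem_id))
        self.current[elem_id] = key

    def top(self):
        """The x elements with the highest keys, best first."""
        return [elem_id for _, elem_id in reversed(self.entries[-self.x:])]
```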
Even though the term SQO may not be familiar to many developers, the syntax is very commonly used in the .NET community, in particular through the LINQ technology that translates queries to SQL or other declarative languages. The solution to Q1 is shown in Listing 3. We simply calculate the top-3 Posts, where the score of a Post is calculated as a tuple of the score and the timestamp in order to break ties. The result of the lambda expression in Line 2 is an array of tuples of the Posts and their scores (actual score and timestamp). Lines 3 to 6 determine how this tuple is created: we iterate over all Comments in a Post and, for each, sum up 10 plus the number of Users that have liked the Comment.
Lines 1 and 7 surround this lambda expression with a call to NMF Expressions to obtain an incrementalization of this analysis. With that call, we tell NMF Expressions to create a DDG for us. The return value of this function is the root node of this DDG. This node implements a generic interface INotifyValue that provides the current analysis result as well as an event to notify clients when the analysis result changes. In the benchmark solution, we do not make use of this event but repeatedly query the current value of the DDG node. This call is fast, because each node in the DDG always references its current value.

Query 2
The solution for Q2 is very similar to the solution of Q1, except for the fact that for each Comment, it involves finding connected components in a graph spanned by Users that liked the Comment and their friendship relations. There is no incremental implementation for finding connected components available that is built into NMF.
From an algorithmic point of view, the incrementalization of the connected components is rather simple: we cache the current set of connected components and re-compute whenever an edge is added to the graph between two nodes of different components. This incremental algorithm can be isolated into a class ConnectedComponents that is theoretically reusable in different contexts.
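A sketch of such a cached connected-components structure for an additive graph is shown below (hypothetical names, not the actual ConnectedComponents class). Instead of a full re-computation, the sketch directly merges the two cached components, which yields the same result for additive changes.

```python
class ConnectedComponents:
    """Caches the connected components of a growing (additive) graph.
    An edge inside one component leaves the cache valid; an edge between
    two components merges their cached node sets."""
    def __init__(self, nodes):
        self.comp_of = {n: {n} for n in nodes}    # node -> its component set

    def add_node(self, node):
        self.comp_of[node] = {node}

    def add_edge(self, a, b):
        comp_a, comp_b = self.comp_of[a], self.comp_of[b]
        if comp_a is comp_b:
            return                                # cache still valid
        comp_a |= comp_b                          # merge b's component into a's
        for node in comp_b:
            self.comp_of[node] = comp_a

    def score(self):
        """Sum of squared component sizes, as used by Q2."""
        unique = {id(c): c for c in self.comp_of.values()}
        return sum(len(c) ** 2 for c in unique.values())
```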
With this algorithm class, we can solve Q2 as in Listing 4. Similar to Q1, the solution is incremental, with an incrementally maintained full sort and an incremental version of Tarjan's algorithm. In Lines 1 and 2, we create a function that, given a Comment and a User, creates the collection of Users who (1) are friends with the given User and (2) like the Comment. Because we do not care about changes of the incident function, the function to get the connected Users is used as compiled code, using the Func class. This is slightly more efficient than the format of lambda expressions that NMF Expressions can use for the incrementalization. However, NMF Expressions currently has some problems integrating compiled lambda expressions, so we need to specify this function separately, outside the scope of NMF Expressions.
In Lines 4 and 12, we frame the actual analysis with NMF Expressions, allowing it to create the DDG for the inner analysis that is in Lines 5 to 11. In particular, we first iterate all elements in the social network model and filter for Comments in Line 5. From these, we pick the topmost elements according to the tuple of scores and timestamps, similar to the solution of Q1. To calculate the score of a Post, we simply run the analysis of connected components where the incident nodes of a given User are the subset of his friends that also liked that Comment (Lines 7 to 9). Given these connected components, we calculate the sum of the squared sizes (Line 10) and break ties using the timestamps (Line 11).

Transactions and parallelism
NMF Expressions has some support for transactions and parallelism. The support for transactions means that a DDG node is only processed if all of its dependencies have been processed. Because each transaction may invalidate a different set of DDG nodes, this implies that changes in the DDG need to be processed in a two-pass fashion: in the first pass, the set of potentially affected DDG nodes is calculated, while in the second pass, the changes are actually propagated. Because the transactional behaviour guarantees that each DDG node is updated at most once per transaction, the overhead of the two passes may pay off. However, whether or not this is the case largely depends on the analysis and the change sequence.
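The two-pass scheme can be sketched as follows. This is an illustrative approximation of the idea, not NMF's implementation; the dependency graph is assumed acyclic, and all names are placeholders.

```python
def propagate_transaction(changed, dependents, recompute):
    """Two-pass change propagation over a dependency graph.
    Pass 1 marks every node reachable from the changed nodes; pass 2
    re-evaluates each marked node exactly once, a node only after all of
    its marked predecessors. `dependents[n]` lists nodes depending on n."""
    # Pass 1: compute the set of potentially affected nodes.
    affected, stack = set(), list(changed)
    while stack:
        n = stack.pop()
        if n not in affected:
            affected.add(n)
            stack.extend(dependents.get(n, []))

    # Pass 2: propagate in topological order within the affected subgraph.
    indegree = {n: 0 for n in affected}
    for n in affected:
        for d in dependents.get(n, []):
            if d in affected:
                indegree[d] += 1
    ready = [n for n in affected if indegree[n] == 0]
    while ready:
        n = ready.pop()
        recompute(n)                      # each affected node runs once
        for d in dependents.get(n, []):
            if d in affected:
                indegree[d] -= 1
                if indegree[d] == 0:
                    ready.append(d)
```

The single-update guarantee comes from pass 2: a node with several invalidated dependencies waits until all of them have been processed, instead of being re-evaluated once per incoming change.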
Q1 cannot profit from transactions at all, because whether the changes to any DDG node come at once or one after another does not matter. For Q2, this is slightly different, because we recalculate the connected components in multiple cases. If a change sequence contained multiple events that would cause the connected components for a Comment to be re-computed, this calculation is only performed once if the changes are propagated in a transaction.
The transactional support in NMF Expressions is very easy for a developer to use: all that needs to be done is to put the changes inside a transaction. Because, in the scope of the benchmark, these changes come in a dedicated change sequence object, we just need to wrap the application of such a change sequence in a transaction, as done in Listing 5. Lastly, NMF Expressions also allows propagating the changes within such a transaction in parallel. This is done by changing the execution engine implementation, as depicted in Listing 6. The parallel change propagation then allows changes within a transaction to be propagated in parallel on different threads, synchronizing at each DDG node.

Tool description
Eclipse Hawk is a tool to manage models that have been fragmented (e.g. for versioning purposes) by incrementally indexing the various connected fragments into a common graph database [8]. Hawk can watch over a local folder or a version control system and update the graph whenever the model files change. For this case study, Hawk was configured with the ability to index EMF models, maintain a graph database using one of three backends (Neo4j, SQLite, or Greycat [42]) and run queries in a dialect of the Epsilon Object Language (EOL) [61]. Greycat implements a graph-oriented data model on top of existing key-value stores, such as LevelDB or RocksDB. For the present work, LevelDB was chosen as the underlying key-value store.
Further, Hawk provides the concept of "derived attributes", which extend a type with pre-computed expressions which are updated incrementally as the graph changes. These attributes are also indexed for fast lookup.

Query 1
Several versions of the first query were implemented. For the sake of clarity, we will call these "batch", "incremental update" (IU) and "incremental update and query" (IUQ). As mentioned in Table 1, all Hawk variants have partial incrementality. Batch and IU use full result sorting, and IUQ uses an "online top-x" approach.
The batch mode is the most direct use of Hawk. In this version, the tool is told to watch a folder that contains the initial version of the model. An EOL script replays the change sequences on the model, and Hawk updates the graph based on the new versions of the model. The Post type is extended with the score derived attribute, which is updated incrementally by Hawk. The definition of score is given in Listing 7, using self as the Post being extended. With this derived attribute, it is simple to implement the main query itself. However, in order to sort the Posts with the Java Collections sort method, a native Java class implementing the Comparator interface is needed:

  var scored = Post.all.collect(p | Sequence {
      p.id, p.score, p.timestamp}).asSequence;
  Native('java.util.Collections').sort(scored,
      new Native('org.hawk.ttc2018.queries.ResultComparator'));
  return scored.subList(0, scored.size.min(3));

Listing 8 Hawk Q1: Use of the derived attribute score in the batch and incremental update solutions.
The incremental update mode uses an alternative graph update class, ChangeSequenceAwareUpdater. This speeds up the process by directly applying the change sequence on the graph, without touching the original model file. The query itself remains the same as the one in the batch mode, using the same derived attribute.
Finally, the incremental update and query mode reuses the same custom updater, while changing the way the query is run. A graph change listener is attached to Hawk: on each new model version, only the updated Posts are re-scored before selecting the new top-3 elements. The re-scoring is done by invoking the expression in Listing 7 on each Post that has been updated.

Query 2
The second query goes through the same three versions: batch, incremental update and incremental update and query.
The actual query is noticeably more complex, since it essentially requires implementing Tarjan's strongly connected components algorithm [84] in about 37 lines of EOL. As indicated in Table 2, Tarjan is re-run after each change, and IUQ uses an "online top-x" sorting approach instead of doing a full sort.
In this case, the Comment class is extended with a score derived attribute. A high-level view of the query shows how it loops over the Users that liked the Comment, detects connected components among them and computes the score of the Comment as the sum of the squares of the sizes of each component. The queries for the batch and incremental update mode are the same, collecting all the IDs, scores and timestamps and sorting to get the top-3 elements. The incremental update and query mode works the same as in the previous query: a graph change listener detects the Comments that should be re-scored, and the new scores are merged with the old ones to keep the top-3 elements up to date.

Tool description
Attribute Grammars [60] can be used to describe the structure of context-free data along with their static semantics. Reference Attribute Grammars (RAGs) [43] extend this paradigm such that attributes are allowed to return other nodes of the abstract syntax tree (AST) as a result of their computation.
The tool JastAdd [44] is based on Reference Attribute Grammars and offers non-terminal attributes [86] and circular attributes [62], used to represent a part of the model and to compute parts of the second query, respectively. In JastAdd, a grammar specified in BNF with inheritance and attributes written in Java are woven together to generate plain Java code. The generated code contains a Java class for every non-terminal specified in the grammar, uses the code of the attributes inside its methods and has additional boilerplate code, e.g. to handle caching.
JastAdd does not work with EMF models directly, but requires users to transform the input metamodel into an AST representation using a dedicated syntax [67]. Listing 11 shows the grammar representing the metamodel of Fig. 1. All identifiable nodes inherit from ModelElement, which gives them a unique number. The root node SocialNetwork is also identifiable, in order to later be able to insert new Users and Posts. The two non-containment references friends and submissions are replaced by explicit unidirectional relations, whereas the likes and likedBy edges are modelled using an explicit bidirectional relation (Line 9 in Listing 11).

Query 1
To solve the first query, all Comments referring to a Post are computed; afterwards, the score of this Post can be calculated with the following attribute. After gathering all scores, a simple iteration over all Posts that keeps the top-3 yields the final result.

Query 2
For the second query, two variants are evaluated, showing two different approaches within the JastAdd solution. The variants differ in how the Users liking the same Comment are computed. The first variant uses a circular attribute User.getCommentLikerFriends (shown below). This attribute starts with the set containing only the User it computes the friends for. Then, if another friend likes the Comment, it adds this friend and calls itself recursively. The recursion always terminates because the number of Users is finite and the circular attribute is only invoked again if the returned set has changed.
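The fixed-point behaviour of such a circular attribute can be sketched in plain Python (JastAdd itself generates Java; the data structures and names below are hypothetical stand-ins for the generated attribute):

```python
def comment_liker_friends(user, friends, likers):
    """Fixed point in the spirit of a circular attribute: start from {user}
    and keep adding friends of members who also liked the Comment until the
    returned set no longer changes (guaranteeing termination)."""
    component = {user}
    changed = True
    while changed:
        changed = False
        for member in list(component):
            for friend in friends.get(member, ()):
                if friend in likers and friend not in component:
                    component.add(friend)
                    changed = True
    return component

# Toy network: a-b-c form a friendship chain; a, b, c liked the Comment, d did not.
friends = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
likers = {"a", "b", "c"}
assert comment_liker_friends("a", friends, likers) == {"a", "b", "c"}
```

The `while` loop plays the role of JastAdd's repeated attribute re-evaluation: each pass corresponds to one re-invocation of the circular attribute after its value set grew.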
The second variant follows the approach described in [66] highlighting reuse of application-independent analysis. In fact, the algorithm for computing an SCC presented in that work was reused without modification for solving the subproblem of query 2 to compute Users liking the same Comment. To integrate it, a mapping from User to Component was established, while the friend relationship served as a basis to connect those components.
To compute the score of a Comment using the second variant, the squared sizes of all components are summed. To get the final result, the same iteration as for the first query is used (not shown here).

Tool description
YAMTL [16] is a model transformation language for EMF models, with support for incremental execution [18], designed as an internal DSL of Xtend. The solution to the Social Media benchmark uses query rules that consist only of input patterns, whose filters define the queries, and uses the YAMTL pattern matcher for evaluating them. Queries are defined using Xtend and the Java Collections Framework. The YAMTL solutions implement the online top-x approach, keeping the best three candidates at all times, with the operation bestThreeCandidates.
Three variants of the solutions have been implemented: batch (YAMTL-B), implicitly incremental (YAMTL-II) and explicitly incremental (YAMTL-EI). YAMTL-B disables dependency tracking, enabling a faster initial transformation, but subsequent updates compute the queries from scratch. YAMTL-II detects which matches need to be re-computed according to updates to the input model, without requiring additional logic in the queries. For each impacted match, the filter expression is re-evaluated from scratch, and the solution is therefore partially incremental. YAMTL-EI exposes model updates affecting an impacted match so that these can be processed explicitly in the filter expression, and the solution is therefore fully incremental. The additional logic that handles incremental updates explicitly in YAMTL-EI has been highlighted in grey; the YAMTL-B and YAMTL-II solutions can be obtained by deleting this code. Enabling incremental evaluation and explicit handling of updates is done via configuration parameters.

Query 1
Q1 is implemented as a query rule, whose input pattern is formed by an in element that will match Posts that contain Comments as indicated in the filter. In particular, the EMF method post.eAllContents() fetches all contained Comments within the matched Post. The implementation of the query is shown in Listing 12.  The expression this.fetch('dirtyObjects') returns the objects that are added under the Post object being matched. For each such added object that is a Comment, the score is computed. The solution to Q2 shows how this case could be handled.

Query 2
The computation of connected components has been implemented using Sedgewick and Wayne's weighted quick union-find with path compression algorithm [80]. The query, including the logic handling updates explicitly, is shown in Listing 13. The instantiation of the class FriendComponentUtil_UF computes the connected components of the graph whose nodes are the set of Users who liked the Comment, i.e. comment.likedBy. Comment scores are stored so that they can be subject to updates.
The expression this.fetch('dirtyFeatures') returns the collection of features for the Comment being matched, which have been updated so that they can be handled explicitly. The original union find algorithm has been extended to enable incremental updates of the computed components when a Comment is liked by a User (addLikedBy()) and when a new friendship is declared (addFriendship()). When there are less than three Comments with a score different from zero, Comments from candidatesWithNilScore are used to complete the list.
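The core data structure can be sketched as follows; this is a Python rendering of weighted quick union with path compression, with hypothetical wrappers mirroring the role of the solution's addLikedBy() and addFriendship() extensions (the actual YAMTL solution is written in Xtend/Java):

```python
class UnionFind:
    """Weighted quick union with path compression (after Sedgewick & Wayne)."""
    def __init__(self):
        self.parent, self.size = {}, {}

    def add(self, x):
        # Role of addLikedBy(): a new liker joins as its own singleton component.
        if x not in self.parent:
            self.parent[x], self.size[x] = x, 1

    def find(self, x):
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[x] != root:      # path compression
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, x, y):
        # Role of addFriendship(): merge the components of the two Users.
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return
        if self.size[rx] < self.size[ry]:  # weighted: attach smaller under larger
            rx, ry = ry, rx
        self.parent[ry] = rx
        self.size[rx] += self.size[ry]

    def score(self):
        # Q2 score: sum of squared component sizes (roots only).
        return sum(s * s for r, s in self.size.items() if self.parent[r] == r)

uf = UnionFind()
for u in "abcd":
    uf.add(u)
uf.union("a", "b"); uf.union("b", "c")     # components {a, b, c} and {d}
assert uf.score() == 3 * 3 + 1 * 1
```

Because union and find are nearly constant time (inverse Ackermann), processing a new like or friendship touches only the affected component, which is what makes the explicitly incremental variant cheap.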

Tool description
ATL [55] is one of the most common model transformation languages. The solution to the Social Media benchmark only uses ATL queries, i.e. the ATL constructs that allow users to define expressions in the ATL flavour of OCL. ATL queries can call helper OCL functions and libraries and are evaluated over the source model(s) by the ATL virtual machine, which is optimized for model operations.
The vanilla ATL that was used to implement the benchmark does not support incremental execution. Classical engines that execute ATL incrementally [56,63] do not support the incremental evaluation of OCL expressions: changes to the source model trigger the recomputation of only the impacted OCL expressions, but those expressions are re-computed in full. In Sect. 3.9, we describe a solution that achieves incremental OCL expression evaluation for ATL code by compiling it to AOF.
The solution is a pure ATL query and is executed on the most recent ATL virtual machine (EMFTVM). Since ATL queries are OCL expressions, this solution constitutes a complete encoding of the case study as declarative and functional OCL code.

Query 1
The full code for Q1 is presented in Listing 14. The recursive allComments helper gathers the set of Comments for a given Post, and a score for the Post is computed by the given formula (Line 11), considering the number of Comments and the likes on these Comments. The main query topPosts sorts the set of Posts by score (and timestamp) and picks the top-3 Posts.

Query 2
The code for Q2 is shown in Listing 15. In particular, the allComponents helper implements a one-pass algorithm for the detection of all connected components. The algorithm iterates over the liking Users: if a liker has not been visited yet, a new component is computed by the allFriends helper. The allFriends helper (whose implementation is not shown in the listing) is a standard depth-first traversal, limited to the subgraph s. Finally, a score is computed for each Comment (Line 5), and the top-3 Comments are identified similarly to Q1 (Lines 1-2).
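The one-pass scheme can be sketched in Python (a stand-in for the OCL helpers; function and variable names are hypothetical): iterate over the likers, and for each unvisited one run a depth-first traversal restricted to the liker subgraph.

```python
def component_scores(likers, friends):
    """One-pass component detection in the spirit of the allComponents helper:
    each unvisited liker seeds a DFS (the allFriends role) limited to the
    subgraph of likers; the score is the sum of squared component sizes."""
    visited, score = set(), 0
    for user in likers:
        if user in visited:
            continue
        stack, size = [user], 0
        visited.add(user)
        while stack:
            u = stack.pop()
            size += 1
            for v in friends.get(u, ()):
                if v in likers and v not in visited:  # stay inside the subgraph
                    visited.add(v)
                    stack.append(v)
        score += size * size
    return score

# Likers a and b are friends; c liked the Comment but knows nobody who did.
friends = {"a": ["b"], "b": ["a"], "c": []}
assert component_scores({"a", "b", "c"}, friends) == 2 * 2 + 1 * 1
```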

Tool description
Xtend [13] (https://www.eclipse.org/xtend/) is a modern Java dialect suited for rapid prototyping thanks to its flexibility and expressiveness. Like ATL, vanilla Xtend does not support incremental execution.

Query 1
We have written a first batch implementation of Q1 and Q2 in pure Xtend, using the Eclipse Modeling Framework (EMF) plugin to load and navigate the models. In a second step, we optimized this solution using Java 8 Streams to parallelize some operations on collections. The Xtend code for Q1 (Listing 16) uses this mechanism twice: (1) to process all Posts in parallel and (2) to compute the sum of all likes received by Comments of a Post in the computeScore method. For better performance, we also implemented a dedicated stream operation, called Greatest3, which avoids sorting the whole list of Posts since only the top-3 Posts need to be considered.
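The idea behind such a bounded top-3 operation can be sketched in Python (the actual Greatest3 is a custom Java stream collector; this is an illustrative stand-in): a min-heap of capacity three replaces a full sort.

```python
import heapq

def greatest3(items):
    """Online top-3: a bounded min-heap keeps the three best candidates seen
    so far, so the full collection is never sorted (O(n log 3) vs O(n log n))."""
    heap = []
    for item in items:
        if len(heap) < 3:
            heapq.heappush(heap, item)
        elif item > heap[0]:               # better than the current 3rd best
            heapq.heapreplace(heap, item)
    return sorted(heap, reverse=True)

assert greatest3([5, 1, 9, 3, 7, 2]) == [9, 7, 5]
```

In the benchmark, the items would be (score, timestamp, id) tuples, so that the required tie-breaking by timestamp falls out of the tuple comparison.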

Query 2
The code of Q2 (Listing 17) is similar to the implementation of Q1 except for the computeScore method. Indeed, the second query requires finding connected groups of Users through the friend relationship. For this purpose, the computeScore method uses a connected components algorithm based on Tarjan's algorithm [84].

Tool description
Active operations [9] are OCL-like operations equipped with incremental propagation algorithms. They may thus be used to incrementally evaluate OCL expressions [19, Section 5] such as the ones found in ATL-like model transformations. It is therefore possible to use active operations to write incremental queries and transformations. The AOF implementation [53] of active operations supports EMF models and is based on the Observer design pattern, although alternative execution strategies [22] have been explored. It is implemented in Java and can be used from Java, Xtend, or ATL code. Each mutable value is wrapped in an observable box, which is either a collection or a singleton value.
Though AOF provides enough basic active operations to implement the case study, creating specific operations sometimes helps [54] to achieve better performance. For this case study, we developed four new operations:
1. sortedBy returns a sorted copy of its source collection, using one or more criteria, based on balanced binary trees.
2. take returns the first n elements of a collection.
3. allContents retrieves all model elements contained in a given source element, filtering them by type.
4. layering implements an incremental connected component algorithm.
Of these, layering is rather specific to some graph-related transformations, while the others are relatively generic. We present two variants of this solution:
1. The AOF solution is written in Xtend, as shown in Listings 18 and 19.
2. The ATL Incremental solution is written in ATL and leverages the ATOL [23] compiler that translates it to Java code that makes use of AOF. It is basically a transliteration of the Xtend code from Listings 18 and 19 into ATL syntax. The main advantage of this variant over the AOF variant is that it makes it possible to use the declarative ATL syntax.
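The observable-box principle behind AOF can be illustrated with a minimal Python sketch (hypothetical class and method names; AOF itself is a Java library with a much richer operation set): a box pushes changes to observers, so derived values update without full re-evaluation.

```python
class Box:
    """Sketch of an observable box (Observer pattern): a mutable singleton
    value whose changes are pushed to registered observers."""
    def __init__(self, value):
        self._value, self._observers = value, []

    def get(self):
        return self._value

    def set(self, value):
        old, self._value = self._value, value
        for observer in self._observers:       # push the change downstream
            observer(old, value)

    def collect(self, fn):
        """Derived box kept consistent by observation (cf. OCL's collect);
        only the changed value is re-mapped, not the whole expression."""
        out = Box(fn(self._value))
        self._observers.append(lambda old, new: out.set(fn(new)))
        return out

source = Box(3)
squared = source.collect(lambda x: x * x)
source.set(5)              # the change propagates incrementally
assert squared.get() == 25
```

Collection-valued boxes work analogously, except that observers receive fine-grained element additions and removals instead of whole-value replacements.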

Query 1
The implementation of Q1 in AOF is depicted in Listing 18. The actual computation is stored in a hash table such that it does not have to be computed repeatedly. Within the score calculation, we use a method that iterates through the containment hierarchy in conjunction with a type filter, similar to the NMF solution. To calculate the score based on likes, we use the lifting mechanism of OCL that implicitly lifts the property likedBy to collections. The ATL Incremental implementation of Q1 is shown in [23,Listing 6]. It is very similar to the code from Listing 14, with the most notable difference being that the ATOL compiler does not support the query keyword.

Query 2
The solution for Q2 is depicted in Listing 19. Again, the actual score calculation is moved to a helper method. The score calculation itself then makes use of the layering operation.

Tool description
Neo4j is a graph database management system using the property graph data model. Such graphs consist of labelled entities, i.e. nodes and edges, which can be described with properties encoded as key-value pairs. Neo4j uses the Cypher query language [33] which offers both read and update constructs [37]. While the main focus of Neo4j is to run graph queries in an online transaction processing (OLTP) setup, it also supports graph analytical algorithms with the Graph Data Science library 11 [71].

Query 1
Q1 is implemented by the Cypher query in Listing 20. The Cypher language uses node labels (e.g. Post, Comment, User) and edge types (e.g. COMMENTED, LIKES) to express graph patterns. The query matches every node with label Post, then all its Comments via a series of COMMENTED edges, then the Users via direct LIKES edges. The OPTIONAL MATCH clause denotes an optional pattern, where variables are set to NULL values if there is no match. The RETURN clause is used to group and aggregate. The results are grouped by the id and timestamp properties of the Posts, aggregated, and then the top-3 scores are returned. The aggregation counts the likes using the number of Users (a User can like multiple Comments) and counts the number of Comments (DISTINCT is used to remove duplicate Comments).

Query 2
Listing 21 shows the batch solution for Q2 using the variant of the union-find algorithm [68] implemented in the Neo4j Graph Data Science library. The procedure gds.wcc.stream is used to find connected components of the subgraph given by Cypher queries matching the nodes and the edges. For each Comment with likes, the first Cypher query in Lines 3-7 selects the Users who like the Comment, and the second query selects all FRIEND edges as pairs of Users. The library loads each subgraph into an in-memory projected subgraph before running the computations. The procedure returns the ID of the component containing each User node. Lines 10-13 calculate the sum of the squared component sizes and select the top-3 scores. In Lines 15-18, the query enumerates the top-3 Comments without likes, and the UNION of the two sets is returned. For a detailed comparison of strategies to compute Q2, we refer the reader to [29].

Tool description
The incremental solution for Neo4j uses node properties and new nodes to materialize the result of previous iterations. For every batch of updates these elements are refreshed, then the top-3 scores are collected. While Q1 can be computed efficiently with only Cypher constructs, the solution for Q2 uses the fixed-point calculation, dynamic node manipulation and reachability procedures of Neo4j's APOC stored procedure library. 12

Query 1
To incrementally evaluate Q1, we initially compute the score for each Post as in the Neo4j Batch solution but, instead of returning it, we store it in the score property as shown in Listing 22. Based on this property, the current top-3 scores can be computed using Listing 25. The score property is indexed to improve lookup times. After new elements of an update are inserted, the score property of new Posts is initialized to zero, Listings 23-24 maintain the property for new Comment nodes and LIKES edges, and then, Listing 25 is used to get the top-3 elements.
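The maintenance strategy can be sketched in Python with in-memory dicts standing in for the materialized score property (hypothetical function names; the real solution uses Cypher updates), using the benchmark's weighting of 10 points per Comment and 1 point per like:

```python
# Materialized score per Post, maintained on each change instead of
# recomputing Q1 from scratch.
scores = {}

def add_post(post):
    scores[post] = 0            # new Posts start with score zero

def add_comment(root_post):
    scores[root_post] += 10     # each new Comment is worth 10 points

def add_like(root_post):
    scores[root_post] += 1      # each new like is worth 1 point

def top3():
    # Retrieval step: cheap because scores are already materialized.
    return sorted(scores, key=scores.get, reverse=True)[:3]

add_post("p1"); add_post("p2")
add_comment("p1"); add_like("p1"); add_comment("p2")
assert scores["p1"] == 11 and scores["p2"] == 10
assert top3() == ["p1", "p2"]
```

The index on the score property plays the role of keeping the retrieval step (the sort in top3) fast in the database setting.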

Query 2
The incremental Neo4j solution for Q2 materializes the components of the subgraph of each Comment comm by introducing a Component node and inserting edges, such that the Component node connects all Users who know each other directly or via friends who also liked the Comment comm. The conversion is executed for each component, one by one, using the fixed-point query execution mechanism of APOC. To achieve this, the solution first marks the nodes of each subgraph with dynamically named labels (Listing 41) and finds reachable nodes using the APOC library (Listing 42). The incremental evaluation is performed by merging the components and then maintaining their sizes and the resulting scores (Listings 43-44).

Tool description
To study the usability and performance of relational database management systems (RDBMSs), we implemented a batch solution in PostgreSQL. Figure 6 shows the database schema capturing the social network model. Instances of each node type (e.g. Comment) and the edge type with many-to-many cardinality (friends) are stored in relations (tables) with the following schemas:

comments(id, ts, content, submitterid, parentid)
posts(id, ts, content, submitterid)
likes(userid, commentid)
users(id, name)
friends(user1id, user2id)

Each relation representing a node type has a primary key. Many-to-many edges are represented in association tables with two foreign keys. Many-to-one edges are stored as a foreign key in the table representing the node at the endpoint of the edge with a cardinality of "one". Additionally, indexes were defined on the foreign keys. This supports the SQL optimizer in choosing arbitrary join orders.
Evaluating both Q1 and Q2 requires checking transitive reachability between nodes, a common recursive query which cannot be expressed in first-order logic or relational algebra [4]. However, it is possible to express such queries in a relational database by either
1. defining additional data structures and running a sequence of SQL queries in a loop until reaching a fixed point [26], or
2. using SQL:1999's WITH RECURSIVE construct, which allows the formulation of recursive queries.
In this solution, we use WITH RECURSIVE as it is widely available in modern SQL implementations [89].

Query 1
To evaluate Q1, for each Comment, we need to first find the root Post of the Comment-chain, which is computed as the transitive closure of the Comment-parentid-Comment/Post edge type.
Having the transitive closure enables us to match the corresponding Post, Comment, User triples, where the Comment is a response rooted in the Post and the User is someone who liked the Comment. We use two left outer joins to ensure that Posts without Comments and Comments without likes are kept with NULLs. This is followed by an aggregation computing the score and finalized with a top-3 selection as shown in Listing 26.
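The root-Post lookup via WITH RECURSIVE can be illustrated with the following self-contained sketch, run here on SQLite from Python as a stand-in for PostgreSQL, with a simplified schema and made-up data (not the benchmark's actual tables or query):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE posts(id INTEGER PRIMARY KEY);
    CREATE TABLE comments(id INTEGER PRIMARY KEY, parentid INTEGER);
    INSERT INTO posts VALUES (1);
    INSERT INTO comments VALUES (10, 1);   -- direct reply to post 1
    INSERT INTO comments VALUES (11, 10);  -- reply to comment 10
""")
# Transitive closure over the parentid edge, then keep only Post ancestors:
rows = db.execute("""
    WITH RECURSIVE root(commentid, ancestor) AS (
        SELECT id, parentid FROM comments
        UNION ALL
        SELECT r.commentid, c.parentid
        FROM root r JOIN comments c ON r.ancestor = c.id
    )
    SELECT commentid, ancestor FROM root
    WHERE ancestor IN (SELECT id FROM posts)
    ORDER BY commentid
""").fetchall()
assert rows == [(10, 1), (11, 1)]   # both comments are rooted in post 1
```

From such a (commentid, rootpost) relation, the actual solution joins likes and aggregates per Post before the top-3 selection.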

Query 2
To evaluate Q2, we need to determine the connected components of the induced subgraphs on the User-friends-User edge type, which is computed using the transitive closure on the graph.

PostgreSQL incremental
To improve performance for repeated query executions, we have extended the PostgreSQL Batch solution (Sect. 3.12) with support for incremental updates. This section discusses the changes introduced in the schema and presents the queries that maintain the results upon changes.

Tool description
For the incremental solution, we have extended the database schema as shown in Fig. 7 by applying a partitioning strategy [1] on each table. The partitioning uses an attribute status consisting of a single character "B" or "D". During a particular update phase, rows that were already in the database are stored in the "B" (before) partition, while rows that have just been inserted are temporarily stored in the "D" (diff) partition. At the end of each update phase, rows from the diff partition migrate to the before partition. For both queries, we implemented algebraic incremental view maintenance with delta queries derived using the rules given in [38] and [81, Appendix E].

Query 1
The incremental solution for Q1 consists of three steps, each affecting the q1_scoring auxiliary relation: an initial step, then, for each update of the graph, a sequence of interim result maintenance and final result retrieval queries. The maintenance step computes, among others, the score contribution of new likes on existing Comments; this is expressed as the inner join of the before partition of Posts and Comments and the diff partition of likes. In each subquery, one operand of the inner join is the diff partition of either the posts, comments or likes relations. Their usually small record count can be exploited by the SQL query optimizer to speed up the joins. Using these calculations, q1_scoring is updated by increasing the scores of old Posts and inserting new Posts along with their scores. The relational algebraic formula for this interim result maintenance query is given in Fig. 13 and proved in Fig. 14.
3. The retrieval step is a simple top-3 query on q1_scoring (Listing 35).

Query 2
Similarly to Q1, the incremental solution for Q2 again consists of three steps, i.e. an initial step, then, for each update of the graph, an alternating sequence of interim result maintenance and final result retrieval queries. Both the initial and the maintenance steps are further divided into two queries affecting the cf and cfc interim relations.

Tool description
GraphBLAS is a recently proposed standard built on the theoretical framework of matrix operations on semirings [57], which allows concise and portable formulation of graph algorithms [58]. The goal of GraphBLAS is to create a layer of abstraction between the graph algorithms and the graph analytics framework, separating the concerns of the algorithm developers from those of the framework developers and hardware designers. The GraphBLAS standard defines a C API [20] that can be implemented on a variety of hardware components including GPUs.
Data format. An untyped graph can be represented as an adjacency matrix A ∈ N^{|V|×|V|}, where rows and columns both represent nodes of the graph and element A_ij represents the number of edges from node i to node j. If the number of edges is not important, the adjacency matrix is defined over Boolean values, i.e. A ∈ B^{|V|×|V|}, with A_ij = 1 if there is an edge from node i to node j and A_ij = 0 otherwise. Bipartite graphs can be represented with a non-square adjacency matrix A ∈ N^{|V1|×|V2|}, where V1 and V2 are the sets of vertices in the two partitions. Typed graphs such as the ones used in this paper can be represented using a bipartite adjacency matrix for each edge type, where V1 represents the source nodes and V2 represents the target nodes, e.g. Likes ∈ B^{|users|×|comments|}. The graphs used in practical applications such as social networks are sparse, and so are their adjacency matrices, i.e. most of their elements are zero. Sparse matrices can be represented efficiently using matrix compression techniques such as CSR (Compressed Sparse Row). A graph can also be stored as an incidence matrix B ∈ B^{|V|×|E|}, where rows represent nodes and columns represent edges. For undirected graphs, each column contains two 1 values in the positions of the source and the target nodes of the edge; all other elements are 0. Incidence matrices are sparse for all graphs with more than a few nodes.
Notation. We follow the notation conventions of GraphBLAS as presented in [25]. Table 3 contains the list of GraphBLAS operations and methods used in this paper. Matrices are typeset in bold and start with an uppercase letter, e.g. Friends. Vectors are typeset in bold and start with a lowercase letter, e.g. scores. Additionally, sets are typeset in italic and start with a lowercase letter, e.g. posts.
Step 3 finds connected components in the induced subgraph using the FastSV algorithm [88] of the LAGraph library (Line 12). This produces a vector containing the component ID for each User.
Step 4 yields the squared sum of component sizes, i.e. the score for each Comment (Line 14).
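The matrix view underlying these steps can be illustrated with a tiny pure-Python sketch, where a sparse Boolean adjacency matrix is a dict from node to successor set and a semiring product over (∨, ∧) drives reachability. This is only a conceptual stand-in for real GraphBLAS kernels, which operate on compressed sparse matrices:

```python
def step(A, frontier):
    """One matrix-vector product over the Boolean (OR.AND) semiring:
    the set of nodes reachable from the frontier via exactly one edge."""
    out = set()
    for i in frontier:
        out |= A.get(i, set())
    return out

def reachable(A, source):
    """Fixed point of repeated semiring products: all nodes reachable
    from the source (a matrix-based BFS)."""
    seen, frontier = {source}, {source}
    while frontier:
        frontier = step(A, frontier) - seen
        seen |= frontier
    return seen

# Sparse Boolean adjacency matrix: row i -> set of columns j with A_ij = 1.
A = {0: {1}, 1: {2}, 3: {0}}
assert reachable(A, 0) == {0, 1, 2}
```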

Tool description
To incrementalize the GraphBLAS Batch solution (Sect. 3.14), we have reworked the solution to compute changes to the results upon updates instead of running a full re-evaluation.

Approach
The incremental version performs a full batch evaluation for the first run and computes the scores. Then, for each update only those parts of the model are re-evaluated which might be affected by the update. Finally, the previous top-3 scores and the new ones are compared to maintain the result. Incremental computation of Q1 uses fine-grained maintenance as it stores the scores of each Post and updates them when new Comment nodes and likes edges appear. The granularity of the incremental version of Q2 is coarser, as it collects the Comments whose scores can change and re-evaluates them.
Notation. The updated variables are denoted with a prime, e.g. the updated version of vector scores is scores′, which contains the scores for the new nodes and the updated scores for the existing ones. The changes can be stored as increment matrices/vectors, denoted with a superscript plus symbol (⁺), and are applied with the ⊕ operation to the original values, e.g. scores′ = scores ⊕ scores⁺. Another option is to store the changed values, denoted with Δ, and apply them by overwriting the existing values: the new vector is initialized with the original one, scores′ = scores, then the new values overwrite the existing ones via a mask, which protects the unaffected values from modification: scores′⟨Δscores⟩ = Δscores.
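The two update styles (⊕-applied increments vs. mask-overwritten changed values) can be sketched on Python dicts standing in for sparse GraphBLAS vectors; here we assume ⊕ is numeric addition and a mask simply selects the keys present in the change:

```python
def apply_increment(scores, inc):
    """scores' = scores (+) scores+ : increments are added element-wise."""
    out = dict(scores)
    for k, v in inc.items():
        out[k] = out.get(k, 0) + v
    return out

def apply_delta(scores, delta):
    """scores'<delta> = delta : changed values overwrite through a mask,
    leaving all other entries untouched."""
    out = dict(scores)
    out.update(delta)
    return out

scores = {"p1": 11, "p2": 10}
assert apply_increment(scores, {"p1": 1, "p3": 10}) == {"p1": 12, "p2": 10, "p3": 10}
assert apply_delta(scores, {"p2": 25}) == {"p1": 11, "p2": 25}
```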

Query 1
To incrementally evaluate Q1, Algorithm 3 updates the scores as follows. Lines 9 and 10 compute the increment of the score induced by new Comments. In Line 11, the number of likes the Comments newly received is summed up per Post.
The two types of increments are summed up in Line 12.
For subsequent evaluations, the scores are updated using the increment vector (Line 13). To find the top-3 scores, only the previous maximum values and the Posts with updated scores are considered. Line 14 yields the updated score values by assigning the scores′ vector through the scores⁺ increment vector used as a mask, which allows changes in the result only where the mask has an element at the corresponding position. Figure 8a shows an example calculation. Using the output of this algorithm, merging the previous top-3 scores and the new ones yields the new result.
Excerpt of Algorithm 3 (inputs and update steps):
5: RootPostᵀ ∈ B^{|posts|×|comments|}   (all root Posts)
6: Output
7: Δscores ∈ N^{|posts|}   (only changed scores)
8: scores′ ∈ N^{|posts|}   (all updated scores)
9: sum = ⊕_j RootPost⁺ᵀ(:, j)   (number of new Comments)
10: repliesScores⁺ = 10 × sum
11: likesScores⁺ = RootPostᵀ ⊕.⊗ likesCount⁺
12: scores⁺ = repliesScores⁺ ⊕ likesScores⁺   (increment vector)
13: scores′ = scores ⊕ scores⁺   (update scores)
14: Δscores⟨scores⁺⟩ = scores′   (collect updated scores)
15: return scores′, Δscores

Query 2
The incremental evaluation of Q2 is depicted in the lower part of Fig. 8b. Algorithm 4 returns the Comments with new scores (Δscores) by re-evaluating the Comments that the updates might impact. Merging the previous top-3 scores and the new ones yields the new result (new scores overwrite existing ones). The first phase of the algorithm (Steps 1-5, Lines 14-20) collects the Comments which might be affected by the updates (the affected comments, stored in the acSet set), and the second phase (Steps 6-9, Line 21) computes the new scores of these Comments using the batch algorithm described in Algorithm 2.
A Comment might be affected by an update if (1) it is a new Comment, or (2) the Comment receives a new incoming likes edge from a User, resulting in a new component or the expansion of an existing one, or (3) two Users who like the Comment become friends, which merges the components the Users belong to (if they previously belonged to different components). Case (3) is covered by Lines 14-18, where Steps 1-4 compute the Comments which might be affected by new friends edges. The NewFriends incidence matrix represents each new friendship by a column with two 1-valued elements for the two Users. For each new friendship (i.e. pair of Users), Step 1 computes how many Users of the pair like each Comment (0, 1, or 2). During the matrix-matrix multiplication, each column of new friendships selects two columns of the Likesᵀ matrix and sums them up into the AC matrix (Line 15). In Line 16, Step 2 keeps only the 2-valued elements, i.e. those where both Users of a friendship liked the Comment, so both of them are present in the subgraph and the new friendship might merge components. Then, Step 3 produces a row-wise sum using the logical or operator ∨ (Line 17).
Step 4 extracts (c, 1) tuples from the result vector and collects the Comment IDs from these tuples; then, Step 5 collects all the Comments which might be affected by the update (Line 19). Cases (1) and (2) are covered by Line 20, which produces the union of all three cases. The next steps re-evaluate the scores of these Comments (Line 21).
Excerpt of Algorithm 4 (input declarations):
2: comments′ = comments ∪ comments⁺
3: likes ⊆ comments × users
4: friends ⊆ users × users   (friend pairs in both orders)
5: Likesᵀ ∈ B^{|comments|×|users|}
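Case (3), the core of Steps 1-5, boils down to a set intersection: a new friendship (u, v) can only merge components of a Comment if both endpoints liked it (the 2-valued entries of the AC matrix). A set-based Python sketch of this check (hypothetical names, dicts standing in for the sparse matrices):

```python
def affected_by_friendships(new_friendships, likes):
    """Comments whose components a new friendship might merge: those liked
    by BOTH endpoints of the friendship (cf. keeping only the 2-valued
    elements of the AC matrix)."""
    affected = set()
    for u, v in new_friendships:
        affected |= likes.get(u, set()) & likes.get(v, set())
    return affected

# u1 and u2 both liked c1, so befriending them may merge components of c1;
# u1 and u3 share no liked Comment, so that friendship affects nothing.
likes = {"u1": {"c1", "c2"}, "u2": {"c1"}, "u3": {"c3"}}
assert affected_by_friendships([("u1", "u2"), ("u1", "u3")], likes) == {"c1"}
```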

Tool description
Dataflow-based computational models were proposed to perform complex analytics on high-volume data sets: the timely dataflow [69,70] model targets batch processing, while its extension, differential dataflow [65], targets incremental processing.
We created two differential dataflow-based solutions, one based on .NET and one based on Rust.
.NET-based implementation Naiad 14 is a data processing prototype system developed at Microsoft Research 15 between 2011 and 2014. Naiad supports both the timely and differential dataflow computational models. The Naiad implementation was written in C#.

Rust-based implementation
The computational models of Naiad have been implemented in the Rust programming language 16 [59] as two separate projects: Timely Dataflow 17 and Differential Dataflow. 18 For the sake of brevity, we only discuss the Rust-based implementation here; the Naiad-based one is available online but has been omitted from this paper.

Query 1
The implementation of Q1 (shown in Listing 28) uses the comms, posts, and likes collections. In Line 8, we extract the Comments and their direct parent objects, which can either be a Post or another Comment. In Lines 9-13, we determine the direct replies for each Post. Next, in Lines 14-25, we collect all transitive replies for each Post using the iterate operator, which executes the specified function until a stable result set is reached. In Lines 26-31, we collect the number of likes for each Comment, each having a cardinality of 1 (corresponding to their contribution to the overall score). Then, in Line 32, we collect each Comment that belongs to the Post with a cardinality of 10 (as they are worth 10 points). Next, in Lines 33-38, we concatenate the collections containing the likes and the reply Comments and count their total cardinality. Finally, in Lines 39-41, we add the timestamps of each Post and get the top-3 values.
14 https://github.com/TimelyDataflow/Naiad
15 https://www.microsoft.com/en-us/research/project/naiad/
16 https://www.rust-lang.org/
17 https://github.com/TimelyDataflow/timely-dataflow
18 https://github.com/TimelyDataflow/differential-dataflow

Query 2
The implementation of Q2 (shown in Listing 29) uses the comms, knows and likes collections. First, in Lines 8-25, we use iterative label propagation to find connected components. Initially, we add the labels based on the likes edges and then propagate them across the knows edges. When the propagation reaches its fixed point, we will have a list of (user, label, comment) tuples, where each tuple shows that the given User is part of the connected component of the given

Classification
In this section, we compare the solutions according to our predefined classification criteria, evaluating general tool properties as well as case-specific applications of them. Aligned with the research questions, we used the following classification criteria:
Declarative query language. To aid the development of the solutions using the respective tools, it is helpful for the tool to adhere to a standard declarative query language.
Data model. For a usage integrated into a modelling tool set, ideally model query tools should be able to use models created and maintained in a modelling environment as they are, without adapters.
Explicitness of incrementalization. Ideally, tools are able to incrementalize the query in an implicit manner, i.e. the developer only has to specify the query, but not how changes are propagated or how the query is broken down into chunks. This criterion does not apply to batch solutions, i.e. ones that recalculate the query from scratch after each model change.
Persistence. To enable a failsafe operation, it is a desirable property for an analysis tool to continue the analysis even if the process is unexpectedly shut down. For this, data structures to save intermediate results have to be persisted, which is why we call this category query persistence. Further, in order to process models beyond the size of the main memory, it is desirable to persist the models in a database such that solutions do not have to have all model elements in memory. We refer to this as model persistence.
Parallelism. As modern CPUs have multiple cores, it is desirable for a query technology to be able to make use of these resources by running parts of the query or its incremental change propagation in parallel.
Asymptotic complexity to propagate changes. We present an estimated asymptotic complexity of the steps required to propagate changes. For this, we consider the necessary steps to propagate changes to both queries for all of the five possible changes (Sect. 2.3).
The derived complexity values are specific to the particular scenario of the Social Media benchmark and thus should not be interpreted as a generic characterization of the tools.
In the following sections, we discuss the criteria for each of the solutions individually and summarize the results afterwards in Table 4, except for the asymptotic complexities that are depicted in Tables 5 and 6.

Declarative query language
NMF makes use of the SQO (Standard Query Operators), which is a common standard on the .NET platform. Similarly, the Xtend solution uses the standard collection operators of Java. ATL uses (a slightly adapted version of) OCL. The SQL and Neo4j solutions use the standard query languages of these databases: SQL:1999 and Cypher.
EOL, as used by Hawk, is standard in Epsilon, but the community of Epsilon is much smaller than the communities of relational databases, .NET or Neo4j.
AOF and YAMTL have their tool-specific query languages which are declarative (both solutions can be executed incrementally from the query specification) but not yet standards.
JastAdd uses its own language to specify synthesized and inherited attributes, with tool-specific extensions such as circular rules.

Table 4 legend: Data model: the data model exposed to the user; Engine: programming language used to implement the engine (model transformation engine, database query engine, etc.); Solution: programming language and query language (if applicable) used to implement the solution; Decl.: the solution specifies the queries using a declarative query language; Batch: only batch mode is supported; Implicit: implicit incrementalization is supported; Explicit: a solution with explicit incrementalization was implemented; DB: database-backed, i.e. the tool persists the model on disk after each transaction; MV: materialized views; Parallel: parallelization is supported. Notation: ⊗ yes; ⊘ to some extent; ◯ no; ⍟ yes, using Java 8 streams. The solutions are categorized as batch, implicitly incrementalized and explicitly incrementalized.
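To illustrate the collection-operator style shared by SQO and the Java stream API, the following sketch shows what a top-3 query over Posts might look like. Python stands in for the actual query languages here; the Post structure and the scoring formula are placeholders, not the benchmark's actual definitions.

```python
# Hedged sketch: a "top-3 most active Posts" query in the collection-operator
# style of SQO / Java streams. Post and score() are hypothetical stand-ins.
from dataclasses import dataclass

@dataclass
class Post:
    pid: int
    comments: int = 0
    likes: int = 0

def score(p: Post) -> int:
    # placeholder scoring function, not the benchmark's actual formula
    return 10 * p.comments + p.likes

def top3(posts):
    # sort descending by (score, id) and take the first three,
    # mirroring an orderByDescending(...).take(3) pipeline in SQO
    return sorted(posts, key=lambda p: (score(p), p.pid), reverse=True)[:3]

posts = [Post(1, 2, 5), Post(2, 1, 30), Post(3, 4, 0)]
print([p.pid for p in top3(posts)])
```

The declarative formulation leaves the evaluation strategy open, which is exactly the property the implicit incremental tools exploit.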

Data model
Given that the benchmark was originally presented at the TTC, it is not surprising that most solutions can operate directly on EMF models. NMF uses its own metamodel NMeta but includes a transformation from Ecore and uses the same serialization format [45].
For JastAdd, it is necessary to reformulate the metamodel as a grammar.
The solutions for Differential Dataflow, Neo4j, GraphBLAS and PostgreSQL do not operate on models directly but require dedicated adapters to feed the data from models into their input formats (CSV files).

Explicitness of incrementalization
The NMF reference solution, the ATL solution, the Xtend solution and the batch versions of the SQL, Neo4j and GraphBLAS solutions are not incremental. The incrementalization in the incremental NMF solution and in the AOF solution is implicit: the framework deduces a dependency graph from a model of the code that is available at runtime. This dependency graph is at the level of individual operations, so that only those operations really affected by a change need to be re-computed. However, these frameworks require data structures that explicitly support incrementalization. Both NMF and AOF already contain support for a wide range of common operations and allow users to add new algorithms and data structures when needed, such as for calculating the connected components in Q2.
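The operation-level dependency graph described above can be sketched as follows. This is a deliberately minimal stand-in for what NMF and AOF do; all class names are hypothetical, and the real frameworks derive the graph from the query expression automatically rather than requiring manual wiring.

```python
# Minimal sketch of implicit incrementalization via a dependency graph:
# derived values remember their inputs and are recomputed only when one
# of those inputs actually changes.
class Cell:
    """A mutable input value that notifies its dependents on change."""
    def __init__(self, value):
        self._value = value
        self._dependents = []
    @property
    def value(self):
        return self._value
    def set(self, value):
        self._value = value
        for d in self._dependents:
            d.invalidate()

class Derived:
    """A cached computation over input cells, recomputed lazily."""
    def __init__(self, fn, *inputs):
        self.fn, self.inputs = fn, inputs
        for c in inputs:
            c._dependents.append(self)
        self.recomputations = 0
        self._cached, self._dirty = None, True
    def invalidate(self):
        self._dirty = True
    @property
    def value(self):
        if self._dirty:
            self._cached = self.fn(*(c.value for c in self.inputs))
            self.recomputations += 1
            self._dirty = False
        return self._cached

likes = Cell(3)
comments = Cell(2)
scr = Derived(lambda l, c: 10 * c + l, likes, comments)  # placeholder score
print(scr.value)            # computed once
print(scr.value)            # served from cache
likes.set(4)                # change propagates as an invalidation
print(scr.value)            # recomputed only now
print(scr.recomputations)
```

In the real frameworks, such nodes exist per operation of the query, which is what makes the propagation fine-grained.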
In the Hawk solutions, there are two aspects to incrementalization: the way that the persistent graph is updated and the way that the query is run. The batch solution uses derived attributes to cache the scores of Comments and Posts, which are implicitly updated incrementally if the nodes used to compute them change: however, the queries run in batch mode, and every change in the social network requires a full re-serialization to disk before the indexing process can update the graph. The incremental update solution uses the same approach for querying, but it does not re-serialize the social network model: instead, it applies the change sequences directly to the internal graph used for querying. The incremental update and query solution uses an explicitly incrementalized version of the query, using graph listeners to trigger the rescoring of Comments and Posts, and uses the same approach to directly update the underlying graph database from the NMF change sequences.
JastAdd and YAMTL support a completely implicit incrementalization by tracking all read operations to the model, from which a dependency graph is set up. If the result of a rule depends on a read operation that is invalidated due to model changes, the corresponding attribute instance (JastAdd) or rule instance (YAMTL) is executed again. Therefore, the granularity of the incrementalization is given by the granularity of the attributes/rules. 19
In the incremental versions of the SQL, Neo4j and GraphBLAS solutions, the incrementality is achieved explicitly. That is, these solutions all keep the scores of Posts or Comments in a table, additional nodes or vectors, respectively. Then, the solutions use explicit algorithms developed for the respective tool to update the scores in case of model updates.
In the case of the SQL solution, this effort to explicitly reengineer existing queries for incremental change propagation is guided [38,81]. Such guidance can help the developer incrementalize a given query and could be automated when sufficient tooling becomes available.
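The contrast to implicit incrementalization can be sketched like this: the developer maintains a materialized score store by hand, analogous to the score table, additional nodes or vectors of the incremental SQL, Neo4j and GraphBLAS solutions. The weights used here are placeholders, not the benchmark's actual scoring rules.

```python
# Hedged sketch of *explicit* incrementalization: instead of re-running
# the query after each change, the developer updates a materialized
# score table by hand for every change type.
scores = {}              # post id -> materialized score

def add_post(pid):
    scores[pid] = 0      # a new Post starts with an empty score

def add_comment(pid):
    scores[pid] += 10    # assumed (placeholder) weight per Comment

def add_like(pid):
    scores[pid] += 1     # assumed (placeholder) weight per like

add_post(1); add_post(2)
add_comment(1); add_like(1); add_like(2)
top = max(scores, key=scores.get)
print(top, scores[top])
```

The burden of covering every change type correctly lies entirely with the developer, which is the essential difference to the implicit approaches.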
The Differential Dataflow solution works similarly to NMF and AOF. However, the notion of time during the calculation of the query allows this tool to support temporary state, in particular the iterate call in Listing 29. This temporary state makes it possible to also support more complex cases such as computing connected components in Q2 using only built-in algorithms and data structures [70].
In total, we have nine non-incremental solutions. PostgreSQL, Neo4j, YAMTL-EI and GraphBLAS are explicitly incrementalized, so developers have to approach the problem conceptually differently than in the batch solution. Hawk provides limited implicit incrementalization capabilities, but much of the solution must be provided by the developer to make the solution incremental; therefore, we categorized it as an explicitly incremental tool. In NMF, JastAdd, AOF, YAMTL-II, incremental ATL and Differential Dataflow, the incrementalization happens entirely implicitly, though partially with restrictions. Namely, JastAdd and YAMTL require the developer to adapt to their programming model, whereas NMF and AOF restrict the used operations to those that are supported in the framework, even though this set is extensible. Differential Dataflow does not have such restrictions.

Persistence
Most of the presented solutions keep the data only in memory, without any features for fault tolerance. This means that these solutions require separate tooling to ensure a failsafe operation.
The exceptions to this are Hawk, Neo4j and PostgreSQL, which make use of database systems internally and are therefore durable, in the sense that models and intermediate query results for their incremental variants are available even if the application gets terminated unexpectedly.
For many of the other tools, model persistence could be achieved through persistence systems such as Eclipse CDO. 20 However, although the model sizes used in the scope of the benchmark went up to a few million model elements and connections, solutions were always able to handle the models in main memory. Therefore, the need for solutions with model persistence to hit the disk for an update led to performance penalties in comparison with other solutions.
The advantage that Hawk and the incremental versions of Neo4j and PostgreSQL draw from persisting the intermediate results is that, after a restart, they do not incur the performance penalty of an initial run again and can process updates straight away. The exception to this is the incremental update and query solution for Hawk, since it keeps an in-memory list of the top-3 elements.

20 https://www.eclipse.org/cdo/.

Parallelism
Since GraphBLAS is based on matrix operations, it is well suited for parallel processing. Differential Dataflow also supports executing the query in parallel. The Xtend solution makes use of the Java streams feature to enable parallelism, but only on a task level, i.e. the solution runs different parts of the query in a pipeline.
NMF also has some built-in support for parallelism, but it is restricted only to the incremental change propagation. The initial query execution does not profit from the parallel execution.
The other solutions do not make use of parallelism.

Asymptotic complexity to propagate changes
Recall that in Sect. 2.3, we introduced the five possible types of changes. In the following, we estimate for each solution the asymptotic complexity of propagating each of these change types to the results of both queries.

NMF Batch, Xtend, ATL
The NMF reference solution and the ATL solution are batch implementations where all changes result in a complete recalculation of the analysis. For Q1, calculating the scores of all Posts takes Θ(p + c) time, plus Θ(p log p) for the sorting. This amounts to Θ(p log p + c). For Q2, Tarjan's algorithm or a similar algorithm to obtain connected components is executed for each Comment. Tarjan's algorithm has a complexity linear in both nodes and edges, where there are at most f² edges, leading to a complexity of O(c · f²). However, because Users only need to be considered for Comments they have liked and every like corresponds to exactly one node in the induced graph for the liked Comment, this can be reduced to O(l · f). Together with O(c log c) for sorting the Comments, this leads to O(l · f + c log c) for Q2. For Xtend, the sorting step is avoided by a dedicated stream operator in both queries. Furthermore, the usage of streams in the Xtend solution theoretically makes it possible to calculate the scores of the individual Comments or Posts in parallel.
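The per-Comment connected-components computation that dominates Q2 in the batch solutions can be sketched as follows. A simple depth-first search stands in for Tarjan's algorithm; the data layout (sets of user ids, adjacency dict) is hypothetical.

```python
# Sketch of the batch Q2 step for one Comment: the connected components
# of the subgraph induced by the Users who liked the Comment and their
# friendships. DFS is used as a stand-in for Tarjan's algorithm.
def components(likers, friends):
    """likers: set of user ids; friends: dict user -> set of friends."""
    seen, comps = set(), []
    for u in likers:
        if u in seen:
            continue
        stack, comp = [u], set()
        while stack:
            v = stack.pop()
            if v in comp:
                continue
            comp.add(v)
            seen.add(v)
            # only friendships between likers induce edges in the subgraph
            stack.extend(w for w in friends.get(v, ())
                         if w in likers and w not in comp)
        comps.append(comp)
    return comps

likers = {1, 2, 3, 5}
friends = {1: {2, 4}, 2: {1}, 3: {5}, 5: {3}}   # user 4 did not like it
print(sorted(len(c) for c in components(likers, friends)))
```

Because every like contributes exactly one node, running this for each Comment gives the O(l · f) bound stated above.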

NMF incremental, AOF, ATL incremental, differential dataflow
In the incremental NMF solution as well as in the AOF, incremental ATL and Differential Dataflow solutions, the analysis is slightly more complex. Inserting a User or a Post does not invalidate the score of any Comment; adding a Comment only requires inserting its (initially zero) score into the search tree of results, which takes O(log c). Liking a Comment requires re-computing the connected components for this Comment and updating the score in the search tree, which is O(f² + log c) since the connected components are computed from scratch. If two Users become friends, this potentially changes the subgraphs for all Comments, which has a complexity of O(l · f + c log c), just as if we were to recalculate the entire query, though in the average case many new friendships will likely not affect all Comments.
The parallel execution of NMF and Differential Dataflow does not change the complexity to perform a change, and it is rather used to propagate the changes within the transaction in parallel.

Hawk
In Hawk, the batch update process needs to check the entire model for changes, which is clearly Ω(p + c + u · f). For incremental updates, the time to perform the graph updates is constant. However, in both cases, the indices need to be re-computed. For Q1, this takes Θ(n) time plus O(p log p) for sorting the Posts as soon as the score for a Post needs to be re-computed, i.e. when adding or liking a Comment. The incremental query eliminates the sorting step, explicitly taking advantage of the monotonicity of the scores in this scenario.
For Q2, similarly to the incremental NMF and AOF solutions, the insertion of a User or a Post does not have any effect because it does not invalidate the score of any Comment. However, the solution still requires the sorting, which takes O(c log c), unless the solution is run in the incremental query mode (in which case the sorting is replaced by constant effort, using the monotonicity of the scores). If a Comment is liked, the calculation of connected components is invalidated and recalculated for this Comment, which takes O(f²). A new friendship requires the scores of all Comments to be re-evaluated, which has a complexity of O(l · f).

JastAdd
JastAdd tracks accesses to model elements when an attribute (score) of a model element is computed and uses this to invalidate these attributes upon a model change. However, the maximum always has to be computed. Adding a new User or two Users becoming friends in Q1 (changes that do not affect any model feature used in computing the scores) has a complexity of Θ(p). This complexity also dominates the score calculation for a new Post (without Comments). Adding or liking a Comment will invalidate the score for the affected Post and cause JastAdd to re-compute the score for this Post, which takes O(n) in addition to the Θ(p).
For Q2, finding the maximum takes Θ(c), and for inserting a User or Post, that is the overall complexity because the new User or Post is not read in any score calculation. A new Comment additionally only requires calculating the score for that Comment, which takes only constant effort. Liking a Comment will invalidate the score for this Comment, which again takes O(f²), and a new friendship takes O(l · f), as this friendship may affect the scores of many Comments.

YAMTL
In the YAMTL solutions, only the three best candidates are retained, which simplifies the sorting step to constant effort. Discarding a candidate takes constant time, and when the candidate is valid, it takes at most two comparisons to update the list.
In Q1, YAMTL-B calculates the scores by traversing all Posts and then all of their Comments, O(p + c). When adding Comments or likes, YAMTL-II re-computes the score for each matched Post affected by a change, traversing all of its contained Comments, O(n). Instead, YAMTL-EI only processes the new Comment that has been added to the Post or that has been liked by another User for each impacted Post, so the update is O(1) in those cases.
In Q2, for each Comment, Sedgewick and Wayne's weighted quick union find with path compression initializes components in linear time, O(l · f), where the initial size of components is given by l. Find and union operations have an amortized cost that corresponds to the inverse of Ackermann's function, O(α(l)). The YAMTL solutions match Comments instead of Posts in Q2. For computing a Comment score, YAMTL's solution applies the union operation for any friend that has liked the Comment, so processing each Comment involves O(l · f · α(l)). In propagation mode, YAMTL-B initializes the data structures for the weighted quick union find with path compression for the l Users in each Comment, and the cost is O(l · f · α(l) + c). YAMTL-II visits the Comments affected by changes and initializes the connected components of friends from scratch, so the cost is O(l · f · α(l)) when adding a like and when adding a friendship. YAMTL-EI is sensitive to finer changes: adding a like by a new User triggers the update of connected components with the friends f of the new User, O(f · α(l)), whereas a new friendship only involves computing the union of two components, O(α(l)).
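The constant-effort top-3 maintenance used by the YAMTL solutions can be sketched like this. The candidate representation and the tie-breaking are assumptions; the point is that updating the retained list needs only a bounded number of comparisons.

```python
# Sketch of constant-time top-3 maintenance: only the three best
# candidates are retained, so inserting an updated candidate needs
# only a constant amount of work regardless of the total model size.
def update_top3(top3, candidate):
    """top3: list of (score, id) sorted descending; candidate: (score, id)."""
    top3 = [e for e in top3 if e[1] != candidate[1]]  # drop a stale entry
    top3.append(candidate)
    top3.sort(reverse=True)       # at most 4 elements: constant work
    return top3[:3]

top = []
for cand in [(5, 'a'), (7, 'b'), (3, 'c'), (9, 'd'), (8, 'a')]:
    top = update_top3(top, cand)
print(top)
```

Maintaining only the result prefix trades the ability to answer other queries for a constant-time update, which matches the complexities stated above.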

PostgreSQL batch
Compared to solutions working on object models, an RDBMS has the problem that references to other model elements are not directly available as object references but usually lead to an indirect reference that must be resolved using an index. The choice of this index and also the exact query strategy are generally specific to the RDBMS and may differ from our estimation here. In the remainder, we assume that search trees are used, which resolve such a reference in logarithmic time. This makes it possible to manage data beyond main memory limitations but is of course slower than following a direct reference.
For Q1, the root Post reference for each Comment is computed first in Ω(c · log c), which stands for the cost of the first self-join of Comments when computing the transitive closure. Then, Posts are outer joined with their Comments in Ω(p · log c + p + c) steps to produce an interim result of size proportional to p + c, which is then outer joined with their likes in Ω((p + c) · log l + p + c + l), and the count of likes has to be computed in Ω(l) steps. In-memory solutions, by contrast, can easily read the count, as it is usually directly available for array list implementations of collections. Thus, the batch implementation has a complexity of Ω(c · log c + p · log c + (p + c) · log l + p + c + l), which can be further simplified to Ω((p + c) · log cl + l). 21
In Q2, we first compute the transitive closure of the friendship subgraphs 22 in Ω(l · log lfl + l + f + l), which stands for the cost of the first recursive step. The size of the transitive closure is Ω(l), which is fed as input to the subsequent aggregations that can be done in linear time w.r.t. their input size. The result of the aggregations again has a size of Ω(l), and as it exists only in memory, no index is available. Thus, the final join to Comments is assumed to be done using a hash join with a complexity of Ω(c + l). Summing up the former gives the batch implementation a complexity of Ω(l · log lfl + l + f + l + l + c + l), which can be simplified to Ω(l · log lf + l). 23
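The transitive-closure step that resolves each Comment's root Post can be sketched with a recursive common table expression. The snippet below uses SQLite via Python's standard library as a stand-in for the PostgreSQL solution, with a deliberately simplified schema.

```python
# Hedged sketch of the first step of the SQL Q1 query: resolving each
# Comment's root Post with a recursive CTE over the comment tree.
# The schema is simplified; the actual PostgreSQL solution differs.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE comments(
        id INTEGER PRIMARY KEY,
        parent_comment INTEGER,   -- NULL for top-level comments
        post INTEGER);            -- set for top-level comments only
    INSERT INTO comments VALUES (1, NULL, 100), (2, 1, NULL), (3, 2, NULL);
""")
rows = con.execute("""
    WITH RECURSIVE root(id, post) AS (
        SELECT id, post FROM comments WHERE parent_comment IS NULL
        UNION ALL
        SELECT c.id, r.post
        FROM comments c JOIN root r ON c.parent_comment = r.id
    )
    SELECT id, post FROM root ORDER BY id
""").fetchall()
print(rows)
```

Each recursive step performs the self-join whose cost appears as the Ω(c · log c) term in the analysis above.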

PostgreSQL incremental
For Q1, any score update of a Post in the incremental SQL solution yields a complexity of Θ(log p). However, adding a new User or friendship does not cause a score update and therefore has constant effort. Apart from that, adding a Post only requires a left join with an empty table with constant effort and migrating the Post to the new partition, which we assume is constant effort. Adding a Comment requires resolving its Post with a complexity of Θ(log p + log c); adding a like causes Θ(log p · log c). For Q2, the result retrieval is nontrivial even in the presence of an index and has an effort of Ω(l · f). Adding a User, Post or Comment changes neither the friends nor the likes relations, which is why all join operations to update the comment_friends relation in stages 0 and 1 inner join an empty table and thus become constant in the maintenance phase. Computing stage 2 and inserting into the comment_friends relation require processing all O(c²f²) entries of stage 2. Liking a Comment or a new friendship adds an overhead to the computation of stage 1 and stage 2 (possibly recursively), but we assume this effort to be dominated by O(c²f²).

Neo4j incremental
The Neo4j solution saves the score as a property on Post and Comment nodes and indexes these properties using B-trees. 24 For Q1, this implies that inserting a new Post necessitates a cost of O(log p) (maintaining the index on the score property). Adding a Comment or a like can result in an increase of the score of the root Post, which costs O(h) for finding the Post and O(log p) to update the score. For Q2, the incremental solution materializes each Comment's connected components, which need to be maintained upon inserting a new likes edge. The non-trivial operations are as follows. When a Comment is initially inserted, it has a score of 0; maintaining the index on the score property incurs a cost of O(log c). When a User usr adds a new like to a Comment cmt, the solution merges all connected components of cmt containing friends of usr with a cost of O(u · f). Additionally, maintaining the index on the score property of the Comments has O(log c) cost. Finally, adding a new friends edge necessitates checking which components of which Comments can be merged, incurring a cost of O(c²), plus O(log c) for maintaining the index.
Apart from this, since Neo4j indexes store identifiers of nodes in search trees and check uniqueness constraints, adding a User requires an effort of Θ(log u), adding a Post Θ(log p) and adding a Comment Θ(log c). Liking a Comment requires at least Θ(log u + log c) and Users becoming friends at least Θ(log u) cost.

GraphBLAS
For GraphBLAS, stating algorithmic complexities for the query calculation is difficult because the engine automatically chooses from a variety of algorithms for matrix multiplication, given that the matrices are usually very sparse. For example, the adjacency matrix used to calculate the Comments of a root Post in the Q1 batch version is sparse because Comments only belong to one root Post. A naïve implementation of this calculation would therefore take an effort of O(p · c), but this analysis does not take into account the optimizations that GraphBLAS applies to this computation. Therefore, we do not provide results for asymptotic complexity for GraphBLAS.
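The matrix view that GraphBLAS takes can be illustrated with a sparse matrix-vector product. The pure-Python sketch below is only a stand-in for GraphBLAS semiring operations, and the matrix layout and indices are illustrative, not those of the actual solution.

```python
# Sketch of the GraphBLAS-style formulation: aggregating likes per root
# Post as a sparse matrix-vector product over the (plus, times) semiring.
def spmv(rows, vec):
    """rows: dict row -> dict col -> value (a sparse matrix);
    vec: dict col -> value. Returns the product as dict row -> value."""
    return {r: sum(v * vec.get(c, 0) for c, v in cols.items())
            for r, cols in rows.items()}

# root_of[p][c] = 1 iff Comment c belongs to the comment tree of Post p
root_of = {0: {0: 1, 1: 1}, 1: {2: 1}}
likes_per_comment = {0: 3, 1: 1, 2: 5}   # e.g. column sums of a likes matrix
print(spmv(root_of, likes_per_comment))
```

A real GraphBLAS engine would pick a multiplication algorithm suited to the sparsity pattern, which is precisely why a single asymptotic bound is hard to state.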

Summary
The asymptotic complexities for Q1 are summarized in Table 5. The table is divided into solutions that are not incremental at all at the top (including non-incremental executions of tools that are able to implicitly incrementalize the query), those that are implicitly incremental and derive the algorithm to propagate incremental changes from a declarative specification in the middle, and explicitly incremental solutions at the bottom.
One can see in Table 5 that whereas the non-incremental tools have a linear complexity with respect to the number of Posts and Comments, many incremental tools have a better asymptotic complexity and therefore should scale much better in the presence of changes, as they have constant or logarithmic update efforts. However, there is a notable difference between NMF and AOF, which take a rather fine-grained approach to incremental change propagation, compared to JastAdd, Hawk and YAMTL, which rely on chunks in the form of (derived) attribute or rule calculations. While the latter can speed up recalculation to constant time in some cases, the former approaches can get down to logarithmic effort even when Comments are added or liked, as they allow partial recalculation of the score of a Comment. These complexities could also not be beaten by solutions where the incremental change propagation is explicitly specified by the developer, at least using the approaches studied.
The results for asymptotic complexities for Q2 are summarized in Table 6. The solutions are ordered in the same way as in Table 5. While for adding Users, Posts or Comments, the results look very similar to Q1 with incremental approaches achieving a strictly better asymptotic complexity, the results for liking a Comment or especially Users becoming friends are not as good. This is partially because the analysis of asymptotic complexities is not very detailed, 25 but it is also a consequence of the effect that these changes simply have a large impact on the computations that get invalidated. A new friendship changes the connected components in the induced subgraphs of the Comments that both Users have liked.

Performance evaluation
In this section, we present the results of the performance measurements with respect to the time required to load the models, to run the initial query evaluation, and to propagate changes. We first present the benchmark setup and experiment design and then analyse the results. Finally, we discuss potential threats to validity.

Input models
We executed the benchmark on models of increasing sizes, denoted by scale factors (SFs) that are powers of 2. The number of elements per node/edge type in each SF is shown in Table 7. The largest model has 0.86M nodes and 2.25M edges. The number of changed model elements per change sequence varies between 45 and 132.

Benchmark framework
The benchmark framework is based on that of the TTC 2017 Smart Grid case [48] and supports the automated build and execution of solutions as well as a correctness check and visualization of the results using R. The correctness is checked by comparing the query result against a pre-computed reference, both after the initial transformation and after each update. The source code and documentation of the benchmark as well as the metamodels, solutions, input models and change sequences used for benchmarking are publicly available online. 26 The benchmark repository also contains further instructions.

25 A more detailed analysis might take the maximum number of Comments a User has liked into account, which is certainly smaller than c.
26 https://github.com/TransformationToolContest/ttc2018liveContest.

Table 7 Number of elements per node type for each scale factor

Type \ SF              1     2     4     8     16      32      64      128      256      512      1024
Comments               640   1064  2315  5056  9220    18,872  39,212  76,735   148,470  273,418  540,905
Posts                  554   889   1845  2270  5518    10,929  18,083  37,228   74,668   167,299  314,510
Users                  80    118   190   204   394     595     781     1158     1678     2606     3699
Total number of nodes  1274  2071  4350  7530  15,132  30,396  58,076  115,121  224,816  443,323  859,114

Benchmark environment
We ran the solutions on a cloud virtual machine with 8 cores of an Intel® Xeon® Platinum 8167M CPU, a base clock speed of 2.0 GHz and a turbo clock speed of 2.4 GHz. Hyper-Threading was turned off. The machine was running the Ubuntu 20.04.2 LTS operating system. To help reproducibility, the experiments were executed in containers managed by Docker 20.10.6. The runtime environments for Java- and .NET-based solutions were OpenJDK 1.8.0 update 282, OpenJDK 11.0.10 and .NET Core 3.1.14. Each tool was measured 5 times, and the geometric mean of the results is used. The timeout value for each run was set to 10 minutes. The modelling tools used XML-based representations (e.g. XMI files) to load the data and the change sequences. For the rest of the tools (Neo4j, GraphBLAS, PostgreSQL and Differential Dataflow), both the initial model and the change sequences were loaded from CSV files.
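The aggregation of the five runs per tool can be sketched as follows, using only the standard library.

```python
# Sketch of the aggregation used for the measurements: five runs per
# tool and phase, summarized by the geometric mean (less sensitive to
# outliers on multiplicative scales than the arithmetic mean).
import math

def geometric_mean(xs):
    # exp of the mean of the logs; equivalent to the n-th root of the product
    return math.exp(sum(math.log(x) for x in xs) / len(xs))

runs = [100.0, 110.0, 90.0, 105.0, 95.0]   # illustrative run times in ms
print(round(geometric_mean(runs), 1))
```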

Analysis
We grouped execution times by query, tool family and phase. To save space, we do not show the exact results, but these can be obtained from the GitHub repository of the benchmark. From the phases listed in Sect. 2.4, we omitted the initialization phase and kept the other three, i.e. (1) loading the models, (2) the initial run and (3) the time to apply a set of changes and update the result accordingly. In the following, we compare the solutions across tool boundaries. For each family of solutions, we only include the results for the best variant. A comparison of the solution variants for NMF, Hawk, JastAdd and YAMTL can be found in the appendix.

27 https://hub.docker.com/r/ftsrg/ttc2018/tags.

Batch solutions
The results for all batch tools are depicted in Fig. 9. One can immediately see that most lines in all of these diagrams have approximately the same slope. This confirms that query execution times grow approximately linearly with the size of the models.
The results for the initial execution times and the times to update the graph are also very similar. This is clear, given that after changing the model, the batch solutions all evaluate the entire query again. Particularly for small models, there is an overhead attached with the first execution, which is why the execution times for updates are slightly smaller. This overhead differs between the tools and is constant, thus hard to perceive on a log scale.
For the update times, one can see clear distances between the graphs, especially for Q1, where the plotted execution times appear almost parallel, indicating constant factors, i.e. the solutions have different execution speeds but very similar scalability characteristics. The GraphBLAS batch solution outperforms all other batch solutions by more than one order of magnitude. JastAdd and YAMTL, which come next, are still faster than NMF by a factor of 3 and faster than the remaining solutions by at least an order of magnitude for Q1.
For Q2, many solutions had severe performance problems, which is why they did not complete for larger model sizes. One can also see that, in particular for the SQL and JastAdd solutions, the slope is steeper for the larger model sizes, indicating that the chosen algorithm implementations do not scale equally well.
For Hawk, one can see how the choice of the underlying database implementation affects performance: the performance using SQLite is mostly better, which is clear given that the entire models for all sizes fit into main memory.

Implicit incremental solutions
For the implicitly incremental solutions, the results depicted in Fig. 10 look very different. While one can see a slope in the load times and in the initial query evaluations, the times to propagate a change sequence appear constant across model sizes for all solutions except JastAdd. These solutions require a few milliseconds or less to propagate the changes, even for the largest input model sizes. 28 For JastAdd, the execution times for an update have the same slope as for the initial execution, so the incremental change propagation cannot reduce the complexity but allows a constant speedup (or even slowdown, cf. Sect. A.2.3).
Because the Differential Dataflow solution was the only solution in this category that is not using a modelling framework underneath but uses plain CSVs as inputs, it is faster in loading the models by multiple orders of magnitude. This is due to the fact that the parsers used in the modelling tools (EMF or NMF) both work with a much more diverse set of inputs than CSV and therefore have a much higher complexity. Furthermore, both modelling frameworks induce an overhead to the model elements.
With regard to the initial execution, one can see that the slope for AOF and the incremental ATL solution is smaller than for the others. Apparently, AOF has a bigger constant overhead that gets less important as models grow. In contrast, the NMF solution has an initial time that is more than an order of magnitude slower than the other solutions. Apart from this, one can observe that Differential Dataflow, JastAdd and YAMTL are the fastest in the initial run.

Explicit incremental solutions
Surprisingly, the results for the explicitly incremental tools depicted in Fig. 11 are worse than for the implicitly incremental tools in the sense that they all have a slope, i.e. the execution times to update the query results grow with growing model sizes. The only exception here is YAMTL, where the incrementalization itself is obtained implicitly but is manually and explicitly tuned. For GraphBLAS, this seems not critical for the model sizes benchmarked because the solution is faster than the others by multiple orders of magnitude and the time to propagate the changes is still below a tenth of a second. For all the other solutions, however, the scalability is much worse than for most of the implicitly incremental solutions. This result is surprising because one would expect that the incrementality of solutions, i.e. the extent to which the update times depend on the model size, would be better if the solutions are explicitly developed for this use case. However, the results imply the opposite. This is because the implicit incremental tools create very fine-grained dependency graphs that allow them to keep the effort of propagating a change to a minimum, whereas the explicit solutions apply rather coarse-grained schemes and need to re-evaluate larger parts of the query.
For Hawk, the differences between the Neo4j and the SQLite backend are more severe than in the batch version. While Neo4j is slower, particularly in the Update phase, it is capable of processing the largest model, while the SQLite backend runs out of memory.

Selected incremental tools
To allow a better comparison between the implicitly and explicitly incremental solutions, we plotted a selected subset of the incremental tools in Fig. 12. The results show again large differences in the runtime for the initial query computation of about three orders of magnitude. The load times show the differences between solutions that operate on plain CSVs and those that use a modelling framework or have to set up a database.
The results for the change propagation just confirm what we found in the previous sections: while the explicitly incrementalized solutions have a slope, indicating that the time for updating the results grows with the size of the models, the implicitly incrementalized solutions essentially ignore the size of the models when propagating updates. Furthermore, while the times of the implicitly incremental tools are all within one order of magnitude, the times for the explicitly incremental tools differ significantly and are generally much worse, again with the exception of GraphBLAS. 29

Internal threats to validity
Query Selection The queries used in the benchmark are artificial. They have been created by the first author in such a way that they would represent typical queries. The design goal of the second query in particular was to include some kind of graph algorithm in order to evaluate the flexibility of the tools on a concrete example. We chose the calculation of connected components because it is a well-studied algorithm and was applicable for the chosen scenario.
Technologies The solutions differ in the languages, runtimes and technologies they use because they follow different modelling approaches. Differences in response times may therefore be due to the underlying framework rather than to the incremental tool itself. Eliminating this effect from the measurements would require re-implementing both the solutions and the underlying tools (in many cases large libraries or even database management systems) in a common technology stack, an effort that this confounding effect does not justify. Further, users will likely choose a tool that actively supports the modelling approach they are already using.

Tools versus solutions
The paper claims to compare tools whereas we in fact compare solutions built with these tools. However, the solutions have been created by the tool authors or at least by experienced developers. We therefore think it is realistic to assume that each solution is close to optimal for its tool. What is optimal, however, may differ between tools. A good example is the calculation of the top-3 elements of a collection with scores. Some tools realize this by taking the first elements of a sorted collection, because keeping the collection sorted speeds up subsequent calls and makes the analysis very readable. Others compute only the 3 maximal elements, because the sorted collection would be discarded afterwards and maintaining it is therefore not efficient.
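The two strategies can be contrasted in a minimal Python sketch (the post identifiers and scores are made up for illustration):

```python
import heapq

# Hypothetical scores of posts, as computed by Q1.
scores = {"p1": 10, "p2": 55, "p3": 31, "p4": 42}

# Strategy 1: keep a fully sorted collection and take its first elements.
# Readable, and the sorted structure can be reused by subsequent calls.
top3_sorted = sorted(scores, key=scores.get, reverse=True)[:3]

# Strategy 2: compute only the 3 maximal elements without sorting everything.
# Cheaper when the sorted collection would be discarded anyway.
top3_heap = heapq.nlargest(3, scores, key=scores.get)

assert top3_sorted == top3_heap == ["p2", "p4", "p3"]
```

Both produce the same result; which one is "optimal" depends on whether the tool can profit from keeping the sorted collection alive between calls.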
Noise during measurements Because we repeated the measurements, we think that the influence of garbage collection and just-in-time compilation is much smaller than the observed differences between incremental and non-incremental tools. However, we ran our benchmarks on virtual machines in the cloud, where we could not control the resource isolation from other tenants.

Different input formats
To account for the format of model change sequences, we reproduced the change format also in EMF and generally took the serialization, deserialization and conversion of model changes into the tools' respective native formats out of the time measurements.

External threats to validity
Generalizability It is unclear to what degree the obtained results generalize to other applications, input model characteristics and change sequences. Further, it is unclear to what extent the solutions represent the tools, even though many solutions have been implemented by tool authors. The analysis of the algorithmic complexities shows that these complexities heavily depend on the query, the characteristics of the inputs and the change sequences, and all of these will be different in other applications. However, we expect that the observed difference between the orders of magnitude of the update times for incremental vs. non-incremental tools is (to some degree) representative of real-world use cases that run global graph queries on continuously changing graph models.
Limited types of changes Though the change sequences used in the various case studies have been generated, they depend on the selection of changes and their proportions. In particular, we only considered incremental changes, i.e. additions of Posts, Comments or friendships. We did not consider decremental changes such as deleting Posts or Comments or breaking friendships. Those potentially require additional change propagation rules that did not have to be considered in the scope of this paper. In the scope of a benchmark, it is important to specify exactly the possible changes to the inputs, because the presence of decremental changes can affect the choice of algorithms implementing the change propagation [46]; hence the mere possibility of decremental changes has an impact even if they do not actually occur.

Related work
In this section, we review related work. We begin by reviewing studies that compare existing tools of related fields in Sect. 6.1. In Sect. 6.2, we then review existing incremental tools that we have not so far taken into account for the comparison.

Comparative studies
The present paper is an outcome of TTC 2018, but this is not the first time that the TTC has considered incremental queries. The Train Benchmark [82] by Szárnyas et al. compared query technology based on an example metamodel and queries motivated by the railway domain. A separate TTC version of this benchmark also exists that focuses more on modelling tools [83]. Both versions of this benchmark have in common that they use homogeneous change sequences. This means that solutions generally only have to react to one kind of model change that will always affect the query result. In the benchmark presented in this paper, we use heterogeneous change sequences independent of the query, as we think that this is more realistic. Furthermore, we added a query for which pure querying technology is not sufficient.
Apart from the Train Benchmark, there have been a number of other TTC contests in recent years that took incremental change propagation into account. The TTC 2016 live contest was about a meta-transformation turning an abstract dataflow model into an executable model transformation, with the goal of achieving incremental change propagation as well. However, no change sequences were provided and there was only one incremental solution. Similarly, the TTC 2017 Smart Grids case [48] featured a query joining information from two models, if possible with incremental change propagation. However, only two solutions were submitted [47,72], of which only one supported incremental change propagation. The 2017 Families to Persons case [6] focused more on bidirectional change propagation but also considers incremental change propagation (in both directions). In contrast to the present social media benchmark, input and output models are isomorphic in the Families to Persons case. A complete list of the TTC cases up to 2018 can be found in [81, p. 112].
A related field to incremental computation is reactive programming, where the goal is to get notified of changes. An overview of 15 languages for reactive programming was compiled by Bainomugisha et al. [7]. Reactive programming approaches are built upon an important assumption, namely that signals do not change once they are processed. That is, they operate on a (potentially infinite) sequence of immutable data. This contrasts with model analysis tools, where the model usually has an approximately fixed size but is mutable.
In the Social Media case, this difference boils down to the question whether one looks at the events that enter the system such as adding a new User, adding a friendship, etc., or whether the state of the system is looked at in its entirety. Model-driven tools make the system state explicit (in a model); meanwhile, reactive programming approaches make it rather implicit. We think that the question which of these is better highly depends on the application scenario.

Incrementality
Incrementality is a desirable property as it promises to save computational effort when analyses are computed repeatedly. It has therefore been a subject of research for decades [75], for example in the search for incremental compilers [76]. Common to all of these approaches is that they exploit assumptions about the computation to be performed, at the cost of limited applicability.
The approach by Reps [77] for attribute grammars is among the first incrementalization systems, specialized to a limited class of problems. It uses a static dependency graph of attribute evaluations, for which Reps showed that an optimal-time re-evaluation strategy is obtained by re-evaluating the attributes in a topologically sorted order of that graph. The approach rests on the assumption that the data processed by the attribute grammar are immutable. As Reps applies the technique to parse trees, this assumption is reasonable, but it does not hold for models in general. The JastAdd tool in our comparison uses this technology.
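The core idea of re-evaluating only affected attributes in topological order can be illustrated with a toy sketch (this is not Reps' system; the attribute names, rules and helper functions are invented for illustration):

```python
from graphlib import TopologicalSorter

# Static dependency graph: attribute -> attributes it depends on.
deps = {"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"]}
rules = {
    "a": lambda v: v["_input"],
    "b": lambda v: v["a"] + 1,
    "c": lambda v: v["a"] * 2,
    "d": lambda v: v["b"] + v["c"],
}

def affected(changed, deps):
    """All attributes transitively depending on a changed one."""
    out = set(changed)
    grown = True
    while grown:
        grown = False
        for attr, ds in deps.items():
            if attr not in out and any(d in out for d in ds):
                out.add(attr)
                grown = True
    return out

def reevaluate(values, changed):
    """Recompute only dirty attributes, visiting them in topological order."""
    dirty = affected(changed, deps)
    for attr in TopologicalSorter(deps).static_order():
        if attr in dirty:
            values[attr] = rules[attr](values)
    return values

values = {"_input": 3}
reevaluate(values, {"a", "b", "c", "d"})  # initial full evaluation
assert values["d"] == 10

values["_input"] = 5                      # input change
reevaluate(values, {"a"})                 # only 'a' is directly dirty
assert values["d"] == 16                  # each affected attribute computed once
```

The topological order guarantees that every dirty attribute is recomputed exactly once, which is the source of the optimal-time result.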
Pugh and Teitelbaum [73] then applied memoization to incremental computation. Memoization is applicable to any referentially transparent function (such as obtaining references to the three topmost elements in Q1 or Q2), but rests on the assumption that the data structures it operates on are immutable, an assumption not met by models. Immutable data structures cannot natively represent cyclic data and therefore make it difficult to create analyses that require them. In the Social Media benchmark, we have various cross-references between Users to represent their friendships. Further, it is in general unclear which functions should be memoized to actually obtain a performance benefit.
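The following minimal sketch shows why memoization presupposes immutability; the scoring function is invented for illustration:

```python
from functools import lru_cache

calls = []  # records which arguments were actually evaluated

@lru_cache(maxsize=None)
def score(post):
    # 'post' must be immutable (hashable); if it could mutate,
    # the cached result would silently become stale.
    calls.append(post)
    return sum(ord(c) for c in post)

score("hello")
score("hello")  # answered from the cache, function body not re-run
score("world")

assert calls == ["hello", "world"]
```

With mutable model elements, the cache cannot know that a previously memoized result has been invalidated, which is exactly the problem described above.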
Acar and others created Self-Adjusting Computation (SAC), a framework to support the development of incremental programs [2] using the then newly introduced dynamic dependence graphs (DDGs). A good overview of SAC is provided by Acar [3]. The rough idea is to memoize the computation made for a given analysis. Closely related, Hammer and others introduced the idea of demanded computation graphs, implemented in Adapton [40,41]. Demanded computation graphs ensure that a change propagation is only performed if the result is actually needed. For both SAC and Adapton, the modelling technology is currently a big obstacle, as they are implemented in programming languages that do not have good support for models. We therefore could not take these approaches into account.
A popular approach to specify queries, especially in graph transformation, is Graph Patterns. Bergmann et al. have created IncQuery, an incremental pattern matching system for Graph Patterns [10,11]. This approach uses a Rete network [32], a static dependency graph, whose nodes are primitive filter conditions or joins of partial pattern matches. Each node represents a set of (partial) pattern matches. This approach can support mutable models because the notification API of models can be used to determine when matches must be revoked or new matches arise.
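A single join node of such a network can be sketched as follows. This is a heavily simplified illustration, not the actual IncQuery implementation; it handles additions only, while a real Rete network also revokes matches on deletions:

```python
class JoinNode:
    """Joins two sets of partial matches on a shared key, updated incrementally."""

    def __init__(self):
        self.left, self.right, self.matches = {}, {}, set()

    def add_left(self, key, value):
        self.left.setdefault(key, set()).add(value)
        # Only partners with the same key are touched: the cost of an
        # update is proportional to the number of new matches, not to
        # the size of the model.
        for r in self.right.get(key, ()):
            self.matches.add((value, r))

    def add_right(self, key, value):
        self.right.setdefault(key, set()).add(value)
        for l in self.left.get(key, ()):
            self.matches.add((l, value))

# Hypothetical partial matches keyed by comment id:
join = JoinNode()
join.add_left("c1", "like-by-alice")
join.add_right("c1", "comment-c1")
join.add_left("c2", "like-by-bob")   # no partner yet: no new match
assert join.matches == {("like-by-alice", "comment-c1")}
```

Each node stores its (partial) matches, so a model notification only propagates through the nodes whose stored sets actually change.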
In relational database research, incrementalization manifests in the topic of incremental view maintenance [14]. An overview of the research can be found in [24,39,79]. However, we are not aware of an open-source relational database management system that implements any of these techniques.
Other approaches to incremental computation include entirely new programming models that allow an easy incrementalization or parallelization. An example of these approaches is revision-based computing [21].

Conclusion
We have presented a simple benchmark for incremental graph queries and demonstrated how it can be implemented on 11 tools from different technological spaces (modelling tools, databases, graph analytical frameworks). The results allow us to reason on the questions raised in the introduction.

Does the tool fit into my technology space?
A complete overview of the tools considered can be found in Table 4. Most solutions work directly with EMF models. NMF has its own modelling framework but claims compatibility of the serialized models [45]. The solutions in Differential Dataflow, GraphBLAS, Neo4j and PostgreSQL implement conversions from EMF to their native formats.

Is it useful to rely on the incrementalization of a tool or is it better [...] to implement change propagation explicitly?
The analysis of asymptotic complexities in Sect. 4.6 foreshadowed what the performance measurements on realistic graph instances in Sect. 5 confirmed: in the present benchmark, several implicit incrementalization tools were able to keep up with and even outperform not only solutions developed for batch execution but also solutions explicitly developed to be incremental, sometimes by multiple orders of magnitude for propagating changes. Just by specifying the query in the format of those tools, developers gain a performance improvement for incremental change propagation that seems very hard to beat otherwise, essentially for free. Only the GraphBLAS solution, without the performance penalty of a modelling framework, was faster, but it required the developer to rephrase the problem in terms of matrix multiplications and to deal with incrementality explicitly. This is complex, error-prone and hence expensive to develop even for such simple queries. Further, this contrasts with the implicit solutions, in particular when their front-end language follows common standards such as OCL, EOL, SQO or Java Collections queries.
The performance results demonstrate how the explicitly incrementalized solutions, which use dependency structures at a much coarser granularity, save only a few intermediate results and are slower on average than the implicitly incrementalized solutions. This is because every type of change requires dedicated code to update intermediate results. The implicit tools create a much more fine-grained dependency graph that allows them to reach better asymptotic complexities and better runtimes. The only exception here is YAMTL-EI, because it is essentially a tuned version of YAMTL-II and therefore uses the same granularity.
The results also show that the implicit incremental tools NMF, AOF, ATL, YAMTL-II and Differential Dataflow have similar performance characteristics (with Differential Dataflow in front due to not having the overhead of a modelling framework), so the choice which of them to use is largely a matter of the context where the problem needs to be tackled (modelling framework, programming language, etc.).
How long does it take to recover from an application crash? The implicit incrementalization tools we compared in the scope of this benchmark do not support query persistence (cf. Sect. 4.4) or currently cannot (yet?) catch up with dedicated solutions where the queries have been explicitly designed for change propagation, even when slightly bent towards the specific problem (the incremental query version of Hawk). The explicitly incrementalized durable solutions in SQL and Neo4j are more difficult to understand but perform well, making it an interesting future research topic to generalize this kind of solution and hide its complexity behind a high-level front-end language.
For relational databases, the lack of an out-of-the-box solution for incrementalization is particularly surprising, given the huge body of research in incremental view maintenance [24,39]. Their advantage in the scope of this benchmark would have been clear as the incremental SQL solution is much harder to understand than its batch counterpart. However, these ideas did not yet make their way into the common or open-source database management systems.
How much development effort will be necessary to implement change propagation? Of course, the development effort to implement change propagation depends heavily on the tool and on how experienced the developer is with it. Because the solutions in this benchmark have been developed by different authors, we did not attempt to collect data about the development effort necessary. However, the discussion in Sect. 4.3 shows that the magnitude of development effort differs widely between the tools, in particular depending on whether the incrementalization happens implicitly or requires explicit implementation. While for JastAdd and YAMTL (in the implicitly incremental solution) no changes are necessary at all, and tools like NMF or AOF only require implementing extensions (which can be shared among multiple projects, so that comprehensive libraries may one day exist), other tools, in particular GraphBLAS, Neo4j and SQL, require a completely different approach to support incremental change propagation.
How does it scale? The analysis of complexities and the performance results on actual hardware give an impression of the scalability of the tools for the benchmark queries. As discussed in Sect. 5.3, this does not mean that the same scalability is reached in a different use case. Rather, the discussion in Sect. 4.6 unveils what actually happens under the covers for a given change, and this discussion can be adapted to other use cases as well. The actual performance results then show the implementation constants of the tools in comparison to each other, which likely hold for other use cases too. Thus, the results can be used to estimate the scalability in other contexts as well.
Can I speed it up by adding more CPU cores? The parallelism support of the tools compared in this paper is discussed in Sect. 4.5. The results show that while the parallelism for GraphBLAS and Differential Dataflow does bring performance advantages, these are not present (yet?) for NMF or YAMTL.
Is the tool extensible [...]? When originally selecting the queries for the benchmark, we expected that the calculation of connected components could not be handled natively by any approach, so that solutions would have to prove the extensibility of the tools. Although this strategy worked for a number of tools, it did not work for all of them, as some tools (Differential Dataflow, and JastAdd to some degree) could indeed handle Q2 without an extension of the tool. For the database solutions, algorithms such as the calculation of connected components usually have to be developed from scratch, using relational algebra.
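As a point of reference for what such an extension has to provide, the batch (non-incremental) baseline for Q2's connected components can be sketched with a standard union-find; this sketch is not taken from any of the benchmarked solutions:

```python
def connected_components(nodes, edges):
    """Plain union-find; returns a representative (component id) per node."""
    parent = {n: n for n in nodes}

    def find(n):
        # Path-halving find: walk to the root, shortening the chain as we go.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for a, b in edges:
        parent[find(a)] = find(b)  # union the endpoints of every edge

    return {n: find(n) for n in nodes}

# Hypothetical friendship graph: users 1-2 and 3-4 form two components.
comps = connected_components([1, 2, 3, 4], [(1, 2), (3, 4)])
assert comps[1] == comps[2] and comps[3] == comps[4] and comps[1] != comps[3]
```

Incremental variants must additionally maintain these components under edge insertions, which is exactly what the extensions discussed above implement.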

Future work
We have future plans to extend the Social Media benchmark, both by covering more tools and algorithms in our experiments and by making the benchmark more challenging. To cover more techniques, we plan to include modelling tools such as Viatra [85], relational databases which support intra-query parallelism such as the DuckDB embeddable analytical database [74], and make use of recently developed connected components algorithms [15]. Finally, we plan to incorporate change sequences that include deletions [87], which is expected to make the incremental maintenance of the queries and of the connected components algorithm more challenging.

An excerpt of the SQL query computing the closure over the projected users graph of a comment (line numbers of the original listing omitted):

-- start with the users that liked a specific comment.
-- They are the nodes of the projected users graph for a comment
SELECT l.commentid, l.userid AS head_userid, l.userid AS tail_userid
FROM likes l
UNION
-- expand the closure with the edges of the projected graph,
-- which is stored in comment_friends

Figure 14 contains the proof of the relational algebra formula for the join of relations with a positive delta (change set). The incidence matrix B is built from the edge list es as follows:

b_tuples ← ∅, k ← 1
for all (i, j) ∈ es do
    if i < j then
        b_tuples ← b_tuples ∪ {(i, k, 1), (j, k, 1)}
        k ← k + 1
    end if
end for
B ← b_tuples
return B

Figure 15 shows the results for the different variants of the NMF solution. They show that the difference between the standard incremental mode and the transactional mode is marginal, i.e. the engine could not take advantage of propagating all changes at once instead of propagating each change separately. This is because the changes affect different parts of the dependency graph. However, executing these propagations in different threads does not lead to speedups either, because usually one change propagation dominates the others. Instead, the additional overhead of synchronization makes the parallel mode's change propagation slightly slower in this case.
Figure 16 shows the execution times of all different variants of the Hawk solution. Unfortunately, the initial load process did not complete within the timeout past model size 64 across the solutions: this is due to the use of monolithic single-file models in the benchmark framework, whereas Hawk is optimized towards models that have been fragmented into many files. The results show that the initial load is somewhat slower for the batch and incremental update (IU) solutions, as they need to calculate the derived attributes used to cache scores. These derived attributes allow the initial execution of the query to be much faster in the batch and IU modes than in the incremental update and query (IUQ) mode: the IUQ mode has to do a first full execution of the query. The updates are slowest in the batch solution due to the need to both recalculate derived attributes and reserialize the changed social network model between updates. The IU mode has slightly faster updates, as it skips the reserialization and instead applies the changes directly to the graph used for querying. The IUQ mode speeds this up even further by replacing derived attributes with graph listeners, using them to detect changed comments and posts and re-score them.

Figure 17 shows the execution times of all different variants of the JastAdd solution. The results show that the approaches using bidirectional relations (Relast variants) are strictly better than their counterparts, both in terms of the initial execution and the time to propagate updates. As can be expected, the incremental variants are also slower than the batch variants in the initial query computation. However, because the JastAdd solution needs to sort the results also in the incremental case, which dominates the complexity, the incremental variants are not significantly faster than the batch variants when propagating updates.
With unidirectional references as in the original metamodel, the change propagation in the incremental execution is even slower than the batch variant. Figure 18 shows the execution times of all YAMTL solutions. They indicate that the performance improvements of the YAMTL-EI solutions over the purely implicit variant are rather small, except for Q2 where the time for an update propagation could be further reduced.