
Improving query performance on dynamic graphs

Abstract

Querying large models efficiently often imposes high demands on system resources such as memory, processing time, disk access or network latency. The situation becomes more complicated when data are highly interconnected, e.g. in the form of graph structures, and when data sources are heterogeneous, partly coming from dynamic systems and partly stored in databases. These situations are now common in many existing social networking applications and geo-location systems, which require specialized and efficient query algorithms in order to make informed decisions on time. In this paper, we propose an algorithm to improve the memory consumption and time performance of this type of query by reducing the number of elements to be processed, focusing only on the information that is relevant to the query without compromising the accuracy of its results. To this end, the reduced subset of data is selected depending on the type of query and its constituent filters. Three case studies are used to evaluate the performance of our proposal, obtaining significant speedups in all cases.



Notes

  1. Note that we used the in-memory implementation for OrientDB and the BerkeleyDB backend [28] for JanusGraph.

  2. Even if CrateDB is not a graph database, we included it in our study because of its high scalability.

  3. A TraversalParent in Gremlin includes steps that imply one or more subqueries, namely where, and, or and not.

  4. The SDR algorithm adds an initial graph step at the beginning of a traversal subquery. For this reason, a traversal subquery always has one more step than its size, i.e. S.size = 4 in this case.

References

  1. 1.

    Acharya, S., Gibbons, P.B., Poosala, V.: Congressional samples for approximate answering of group-by queries. In: Proc. of SIGMOD’00, pp. 487–498. ACM (2000). https://doi.org/10.1145/342009.335450

  2. 2.

    Agarwal, S., Panda, A., Mozafari, B., Iyer, A.P., Madden, S., Stoica, I.: Blink and it’s done: interactive queries on very large data. PVLDB 5(12), 1902–1905 (2012). https://doi.org/10.14778/2367502.2367533


  3. 3.

    Angles, R., Arenas, M., Barceló, P., Hogan, A., Reutter, J.L., Vrgoc, D.: Foundations of modern query languages for graph databases. ACM Comput. Surv. 50(5), 68:1–68:40 (2017). https://doi.org/10.1145/3104031


  4. 4.

    Apache Spark: Spark streaming programming. https://spark.apache.org/docs/latest/streaming-programming-guide.html. Accessed May 2019

  5. 5.

    Apache Spark: GraphFrames. https://graphframes.github.io/graphframes/docs/_site/index.html. Accessed Nov 2019

  6. 6.

    Apache TinkerPop: The Gremlin graph traversal machine and language. https://tinkerpop.apache.org/gremlin.html. Accessed Jan 2020

  7. 7.

    Babcock, B., Chaudhuri, S., Das, G.: Dynamic sample selection for approximate query processing. In: Proc. of SIGMOD’03, pp. 539–550. ACM (2003). https://doi.org/10.1145/872757.872822

  8. 8.

    Barceló, P.: Querying graph databases. In: Proc. of PODS’13, pp. 175–188. ACM (2013). https://doi.org/10.1145/2463664.2465216

  9. 9.

    Barquero, G., Burgueño, L., Troya, J., Vallecillo, A.: Extending complex event processing to graph-structured information. In: Proc. of MODELS’18, pp. 166–175. ACM (2018). https://doi.org/10.1145/3239372.3239402

  10. 10.

    Barquero, G., Troya, J., Vallecillo, A.: Trading accuracy for performance in data processing applications. J. Object Technol. 18(2), 9:1–9:24 (2019). https://doi.org/10.5381/jot.2019.18.2.a9


  11. 11.

    Barquero, G., Troya, J., Vallecillo, A.: SDR algorithm git repository. https://github.com/atenearesearchgroup/SDRalgorithm. Accessed Jan 2020

  12. 12.

    Barquero, G., Troya, J., Vallecillo, A.: SDR algorithm website. http://atenea.lcc.uma.es/projects/SDRAlg.html. Accessed Jan 2020

  13. 13.

    BBVA: The impact of the Mobile World Congress in a dynamic visualization by BBVA and CartoDB (2013). https://www.bbva.com/en/impact-mobile-world-congress-dynamic-visualization-bbva-cartodb/. Accessed Jan 2020

  14. 14.

    Bergmann, G., Horváth, Á., Ráth, I., Varró, D., Balogh, A., Balogh, Z., Ökrös, A.: Incremental evaluation of model queries over EMF models. In: Proc. of MODELS’10, pp. 76–90 (2010). https://doi.org/10.1007/978-3-642-16145-2_6

  15. 15.

    Bergmann, G., Ökrös, A., Ráth, I., Varró, D., Varró, G.: Incremental pattern matching in the VIATRA model transformation system. In: Proc. of GRAMOT’08, pp. 25–32. ACM (2008)

  16. 16.

    Besta, M., Fischer, M., Kalavri, V., Kapralov, M., Hoefler, T.: Practice of streaming and dynamic graphs: concepts, models, systems, and parallelism. CoRR arXiv:1912.12740 (2019)

  17. 17.

    Besta, M., Peter, E., Gerstenberger, R., Fischer, M., Podstawski, M., Barthels, C., Alonso, G., Hoefler, T.: Demystifying graph databases: analysis and taxonomy of data organization, system designs, and graph queries. CoRR arXiv:1910.09017 (2019)

  18. 18.

    Callidus Software Inc.: OrientDB. The database designed for the modern world. https://orientdb.com/. Accessed June 2020

  19. 19.

    Chaudhuri, S., Das, G., Datar, M., Motwani, R., Narasayya, V.R.: Overcoming limitations of sampling for aggregation queries. In: Proc. of ICDE’01, pp. 534–542. IEEE Computer Society (2001). https://doi.org/10.1109/ICDE.2001.914867

  20. 20.

    Chaudhuri, S., Das, G., Narasayya, V.R.: A robust, optimization-based approach for approximate answering of aggregate queries. In: Proc. of SIGMOD’01, pp. 295–306. ACM (2001). https://doi.org/10.1145/375663.375694

  21. 21.

    Chaudhuri, S., Ding, B., Kandula, S.: Approximate query processing: no silver bullet. In: Proc. of SIGMOD’17, pp. 511–519. ACM (2017). https://doi.org/10.1145/3035918.3056097

  22. 22.

    Cugola, G., Margara, A.: Processing flows of information: from data stream to complex event processing. ACM Comput. Surv. 44(3), 15:1–15:62 (2012). https://doi.org/10.1145/2187671.2187677


  23. 23.

    Etzion, O., Niblett, P.: Event Processing in Action. Manning Publications, New York (2010)


  24. 24.

    Fan, W., Geerts, F., Cao, Y., Deng, T., Lu, P.: Querying big data by accessing small data. In: Proc. of PODS’15, pp. 173–184. ACM (2015). https://doi.org/10.1145/2745754.2745771

  25. 25.

    Fan, W., Li, J., Ma, S., Tang, N., Wu, Y., Wu, Y.: Graph pattern matching: From intractable to polynomial time. PVLDB 3(1), 264–275 (2010). https://doi.org/10.14778/1920841.1920878


  26. 26.

    Fan, W., Wang, X., Wu, Y.: Querying big graphs within bounded resources. In: Proc. of SIGMOD’14, pp. 301–312. ACM (2014). https://doi.org/10.1145/2588555.2610513

  27. 27.

    Gonzalez, J.E., Xin, R.S., Dave, A., Crankshaw, D., Franklin, M.J., Stoica, I.: GraphX: graph processing in a distributed dataflow framework. In: Proc. of OSDI’14, pp. 599–613 (2014)

  28. 28.

    Group, C.M.D.: BerkeleyDB. https://dbdb.io/db/berkeley-db. Accessed July 2020

  29. 29.

    Holzschuher, F., Peinl, P.D.R.: Performance of graph query languages: comparison of cypher, gremlin and native access in Neo4j. In: Proc. of GraphQ@EDBT/ICDT’13, pp. 195–204 (2013). https://doi.org/10.1145/2457317.2457351

  30. 30.

    JanusGraph: Distributed, open source, massively scalable graph database. https://janusgraph.org/. Accessed June 2020

  31. 31.

    Johann, S., Egyed, A.: Instant and incremental transformation of models. In: Proc. of ASE’04, pp. 362–365. IEEE Computer Society (2004). https://doi.org/10.1109/ASE.2004.10047

  32. 32.

    Jouault, F., Tisi, M.: Towards incremental execution of ATL transformations. In: Proc. of ICMT’10, LNCS, vol. 6142, pp. 123–137. Springer (2010). https://doi.org/10.1007/978-3-642-13688-7_9

  33. 33.

    Kafka, A.: Apache Kafka. A distributed streaming platform. https://kafka.apache.org/intro. Accessed May 2019

  34. 34.

    Kalavri, V., Vlassov, V., Haridi, S.: High-level programming abstractions for distributed graph processing. IEEE Trans. Knowl. Data Eng. 30(2), 305–324 (2018). https://doi.org/10.1109/TKDE.2017.2762294


  35. 35.

    Lee, K., Liu, L.: Scaling queries over big RDF graphs with semantic hash partitioning. PVLDB 6(14), 1894–1905 (2013). https://doi.org/10.14778/2556549.2556571


  36. 36.

    Li, K., Li, G.: Approximate query processing: What is new and where to go?—a survey on approximate query processing. Data Sci. Eng. 3(4), 379–397 (2018). https://doi.org/10.1007/s41019-018-0074-4


  37. 37.

    Ltd, M.: Memgraph. Difference from Neo4j’s cypher implementation. https://docs.memgraph.com/memgraph/reference-overview/differences. Accessed Sept 2020

  38. 38.

    Luckham, D.C.: The Power of Events: An Introduction to Complex Event Processing in Distributed Enterprise Systems. Addison-Wesley, Boston (2002)


  39. 39.

    Luckham, D.C.: Event Processing for Business: Organizing the Real-Time Enterprise. Wiley, New York (2012)


  40. 40.

    Malewicz, G., Austern, M.H., Bik, A.J.C., Dehnert, J.C., Horn, I., Leiser, N., Czajkowski, G.: Pregel: a system for large-scale graph processing. In: Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD 2010, Indianapolis, June 6–10, 2010, pp. 135–146 (2010). https://doi.org/10.1145/1807167.1807184

  41. 41.

    Memgraph Ltd: Memgraph graph database. https://memgraph.com/. Accessed Nov 2019

  42. 42.

    Memgraph Ltd: Memgraph indexing. https://docs.memgraph.com/memgraph/concepts-overview/indexing. Accessed Sept 2020

  43. 43.

    Mhedhbi, A., Gupta, P., Khaliq, S., Salihoglu, S.: A+ indexes: lightweight and highly flexible adjacency lists for graph database management systems. CoRR arXiv:2004.00130 (2020)

  44. 44.

    Neo4j: Neo4j graph platform. https://neo4j.com/. Accessed Jan 2020

  45. 45.

    Neo4j: Cypher query language. https://neo4j.com/developer/cypher-query-language/. Accessed Nov 2019

  46. 46.

    Neo4j: Neo4j—indexes for search performance. https://neo4j.com/docs/cypher-manual/current/administration/indexes-for-search-performance/index.html. Accessed Sept 2020

  47. 47.

    OrientDB: LiveQuery. https://orientdb.com/nosql/livequery/. Accessed July 2020

  48. 48.

    Page, L., Brin, S., Motwani, R., Winograd, T.: The pagerank citation ranking: Bringing order to the web. Tech. rep, Stanford Digital Library Technologies Project (1998)

  49. 49.

    Peng, P., Zou, L., Chen, L., Zhao, D.: Adaptive distributed RDF graph fragmentation and allocation based on query workload. IEEE Trans. Knowl. Data Eng. 31(4), 670–685 (2019). https://doi.org/10.1109/TKDE.2018.2841389


  50. 50.

    Perliger, A., Pedahzur, A.: Social network analysis in the study of terrorism and political violence. PS Polit. Sci. Polit. 44(1), 45–50 (2011). https://doi.org/10.1017/S1049096510001848


  51. 51.

    Razavi, A., Kontogiannis, K.: Partial evaluation of model transformations. In: Proc. of ICSE’12, pp. 562–572. IEEE Computer Society (2012). https://doi.org/10.1109/ICSE.2012.6227160

  52. 52.

    Real, E., Shlens, J., Mazzocchi, S., Pan, X., Vanhoucke, V.: YouTube-BoundingBoxes dataset. https://research.google.com/youtube-bb/. Accessed Oct 2019

  53. 53.

    Richardson, M., Domingos, P.M.: The intelligent surfer: probabilistic combination of link and content information in PageRank. In: Proc. of NIPS’01, pp. 1441–1448. MIT Press (2001)

  54. 54.

    Szárnyas, G., Izsó, B., Ráth, I., Harmath, D., Bergmann, G., Varró, D.: IncQuery-D: a distributed incremental model query framework in the cloud. In: Proc. of MODELS’14, pp. 653–669 (2014). https://doi.org/10.1007/978-3-319-11653-2_40

  55. 55.

    Szárnyas, G., Izsó, B., Ráth, I., Varró, D.: The Train Benchmark: cross-technology performance evaluation of continuous model queries. Softw. Syst. Model. 17(4), 1365–1393 (2018). https://doi.org/10.1007/s10270-016-0571-8


  56. 56.

    Szárnyas, G., Marton, J., Maginecz, J., Varró, D.: Reducing property graph queries to relational algebra for incremental view maintenance. CoRR arXiv:1806.07344 (2018)

  57. 57.

    The New Yorker: Data from the New Yorker caption contest. https://github.com/nextml/caption-contest-data. Accessed Oct 2019

  58. 58.

    TinkerPop: Apache TinkerGraph. http://tinkerpop.apache.org/docs/current/reference/#tinkergraph-gremlin. Accessed Oct 2019

  59. 59.

    TinkerPop: TinkerGraph indices. https://tinkerpop.apache.org/javadocs/3.2.2/full/org/apache/tinkerpop/gremlin/tinkergraph/structure/TinkerGraph.html#vertexIndex. Accessed Sept 2020

  60. 60.

    Tinkerpop, A.: Interface vertex program. http://tinkerpop.apache.org/javadocs/3.1.4/core/org/apache/tinkerpop/gremlin/process/computer/VertexProgram.html. Accessed Jan 2020

  61. 61.

    Troya, J., Wimmer, M., Burgueño, L., Vallecillo, A.: Towards approximate model transformations. In: Proc. of AMT@MoDELS’14, pp. 44–53. CEUR-WS (2014)

  62. 62.

    Trushkowsky, B., Kraska, T., Franklin, M.J., Sarkar, P.: Crowdsourced enumeration queries. In: Proc. of ICDE’13, pp. 673–684 (2013). https://doi.org/10.1109/ICDE.2013.6544865

  63. 63.

    Trushkowsky, B., Kraska, T., Franklin, M.J., Sarkar, P.: Answering enumeration queries with the crowd. Commun. ACM 59(1), 118–127 (2016)


  64. 64.

    Ujhelyi, Z., Bergmann, G., Hegedüs, Á., Horváth, Á., Izsó, B., Ráth, I., Szatmári, Z., Varró, D.: EMF-IncQuery: an integrated development environment for live model queries. Sci. Comput. Program. 98, 80–99 (2015). https://doi.org/10.1016/j.scico.2014.01.004


  65. 65.

    Uta, A., Ghit, B., Dave, A., Boncz, P.A.: [Demo] Low-latency spark queries on updatable data. In: Proc. of SIGMOD’19, pp. 2009–2012 (2019). https://doi.org/10.1145/3299869.3320227

  66. 66.

    W3C RDF Data Access Working Group: SPARQL query language. https://www.w3.org/TR/rdf-sparql-query/. Accessed Jan 2020

  67. 67.

    Wang, Y., Parthasarathy, S., Sadayappan, P.: Stratification driven placement of complex data: a framework for distributed data analytics. In: Proc. of ICDE’13, pp. 709–720. IEEE Computer Society (2013). https://doi.org/10.1109/ICDE.2013.6544868

  68. 68.

    Webber, J., Robinson, I., Eifrem, E.: Graph databases. O’Reilly Media (2013)

  69. 69.

    Wohlin, C., Runeson, P., Höst, M., Ohlsson, M.C., Regnell, B.: Experimentation in Software Engineering. Springer, Berlin (2012)


  70. 70.

    Wood, P.T.: Graph database. In: Liu, L., Özsu, M.T. (eds.) Encyclopedia of Database Systems, 2nd edn. Springer, New York (2018). https://doi.org/10.1007/978-1-4614-8265-9_183


  71. 71.

    Yang, C.C., Ng, T.D.: Terrorism and crime related weblog social network: link, content analysis and information visualization. In: Proc. of ISI’07, pp. 55–58. IEEE (2007). https://doi.org/10.1109/ISI.2007.379533


Acknowledgements

This work is partially supported by the European Commission (FEDER) and the Spanish Government under projects APOLO (US-1264651), HORATIO (RTI2018-101204-B-C21), EKIPMENT-PLUS (P18-FR-2895) and COSCA (PGC2018-094905-B-I00).

Author information


Corresponding author

Correspondence to Javier Troya.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Communicated by Daniel Varro.

Appendices

A Appendix: ProductPopularity with SDR algorithm

To demonstrate how the SDR algorithm works for a specific query, a small graph for the Amazon case study is shown in Fig. 4. The graph contains two customers (C1 and C2) and two products (P10 and P20). C1 places two orders (O1C1 and O2C1), whereas C2 places one (O1C2). We apply the SDR algorithm to this graph with the ProductPopularity query shown in Listing 2. The weight updates for each object across iterations are displayed in Table 7. In the following, each function and iteration of the algorithm displayed in Algorithm 1 is explained in detail. Note that when we refer to line numbers, unless otherwise specified, we are referring to the following:

  • Text of the section: line numbers that are mentioned in the normal text of this section refer to the lines of SDRAlgorithm depicted in Algorithm 1.

  • Non-enumerated lists: line numbers that are mentioned in non-enumerated lists refer to the lines of SDRVertexCentric function.

  • Enumerated lists: line numbers that are mentioned in enumerated lists refer to the lines of functions WeightInitialisation, InWeightPropagation or FurWeightPropagation, depending on the specific case.

First, the SDR algorithm calls the SDRVertexCentric function for each object in the graph (line 1). This function starts the initial iteration (\(iteration = 0\)) and it has the following execution flow:

  • First, it sets guardCondition to true (line 4).

  • Since iteration meets the condition of line 5 (iteration==0), the function selects the last step of the query to be analysed (line 6).

  • Then, it calls the WeightInitialisation function (line 7). Note that at this point \(iteration = 0\) and S.size = 2. WeightInitialisation works as follows:

    1. 1.

      First, the function checks the type of the step s. In the ProductPopularity query, the last step is a where step. As a where step is a traversal step that has only one statement, the function gets into the if clause of line 12 and obtains the subquery contained in this statement (line 13).

    2. 2.

      Then, it makes a recursive call with this subquery as input data of the SDRVertexCentric function (line 15). Note that at this point, \(iteration=0\) and S.size = 4, since the subquery has 4 steps (see Note 4). This call has the following flow for each iteration:

      • guardCondition is established to true (line 4).

      • iteration meets the condition of line 5, so the function selects the last step of the subquery and stores it in s (line 6).

      • Then, it calls WeightInitialisation function that works as follows:

        1. (a)

          Now, the step s corresponds with the has step (line 5 in Listing 2).

        2. (b)

          As for the main query, the function checks its type. In this case, s is a property filter, so the function gets into the if clause of line 1 and checks if v matches the filter (line 2). As shown in Figure 4, only the object P10 matches the filter, so for v = P10 the function gets into the if clause of line 2. For the remaining objects, the function sets guardCondition to false (line 7).

        3. (c)

          Then, for v = P10, the function searches for the previous step of the subquery that corresponds to a relationship (line 3). As can be seen in Listing 2, this step is the relationship step contains.

        4. (d)

          Therefore, it counts the number of neighbours that P10 can reach through relationship contains (line 4). Since P10 can reach O1C1 through relationship contains, cNeighbors is equal to 1.

        5. (e)

          As cNeighbors is higher than 0, guardCondition is established to true (line 5).

        6. (f)

          Once the function finishes the if-then-else clause of lines 1 to 18, it checks the value of guardCondition. For v = P10, this value is true, so the function gets into the if clause of line 19 and calculates the weight of P10. Since weight is 0 and cNeighbors is 1, the new value of weight is 1 (line 20). On the contrary, as stated before, for \(v \ne P10\) guardCondition is false so weight remains 0. Note how in the second column of Table 7 the object P10 has weight = 1, whereas the remaining objects have weight = 0.

        7. (g)

          Finally, it returns the weight value (line 22) and the function finishes.

      • Then, the SDRVertexCentric function increments iteration counter (line 16) and the next iteration starts (at this point iteration = 1 and S.size = 4).

      • As iteration is less than S.size, the SDRVertexCentric function stays in the while loop of line 3 and sets guardCondition to true (line 4).

      • As \(iteration = 1\), it gets into the else clause of line 8 and selects the same value for s as in the initial iteration (line 9).

      • Then, it gets into the if clause of line 10 and calls the InWeightPropagation function (line 11). This function works as follows:

        1. (a)

          First, it checks the type of s. As stated in the previous iteration (recall that WeightInitialisation and InWeightPropagation analyse the same step), s is a property filter so it gets into if clause of line 3.

        2. (b)

          Same as in the WeightInitialisation function, it searches the previous step that corresponds to a relationship and stores it in pRel (line 4). This relationship is contains.

        3. (c)

          Then, iteration is incremented (line 5), which means that \(iteration = 2\).

        4. (d)

          The algorithm checks if the calculated weight in the previous iteration is higher than 0 (line 6). This is true only for v = P10, so, in this case, it sends a message through relationship contains to O1C1 (line 7).

        5. (e)

          Finally, the InWeightPropagation function finishes and it returns the same weight calculated in WeightInitialisation function (line 10). Note that in columns 2 and 3 of Table 7 all weights are the same.

      • Then, SDRVertexCentric increments iteration and the new iteration starts, which means that \(iteration = 3\) and S.size = 4.

      • As iteration is smaller than or equal to S.size, SDRVertexCentric stays in the while loop of line 3 and sets guardCondition to true (line 4).

      • \(iteration \ne 0\), so the function gets into else clause of line 8 and selects the relationship step orders (line 3 in Listing 2) for s (line 9).

      • Besides, \(iteration \ne 1\), so SDRVertexCentric gets into else clause of line 12 and it calls FurWeightPropagation function that works as follows:

        1. (a)

          First, it counts the number of messages sent from the previous iteration to v (line 1). Note that in the previous iteration only P10 sent a message to O1C1 through the relationship contains, so for \(v = O1C1\) cMessages has value 1, while for the rest it is 0.

        2. (b)

          Therefore, for \(v = O1C1\), the function gets into if clause of line 2 and checks the type of s.

        3. (c)

          As stated before, s is the relationship step orders, so the function gets into the if clause of line 3, counts the number of neighbours that can be reached from v through s and stores this number in cNeighbors. For \(v = O1C1\), cNeighbors has value 1, since O1C1 can reach C1 through relationship orders.

        4. (d)

          Then, for this value of v, guardCondition is set to true (line 5) and a message is sent through relationship orders to C1 (line 6).

        5. (e)

          Finally, as guardCondition is true for every object of the graph, the function updates the value of weight for all of them (lines 19–21). However, since cNeighbors and cMessages are 0 for \(v \ne O1C1\), the weight value remains the same as in the previous iteration for this case. On the other hand, for \(v = O1C1\), cMessages = 1 and cNeighbors = 1, so weight is updated to 2. Updated values for this iteration can be viewed in column 4 of Table 7.

        6. (f)

          FurWeightPropagation returns the updated weight and it finishes (line 22).

      • Now, SDRVertexCentric increments iteration (line 16) and the next iteration starts (at this point \(iteration = 4\) and S.size = 4).

      • guardCondition is set to true (line 4).

      • \(iteration \ne 0\), so SDRVertexCentric gets into else clause of line 8 and selects the added graph step at the beginning of the subquery for s (line 9).

      • Besides, \(iteration \ne 1\), so the SDRVertexCentric gets into else clause of line 12 and FurWeightPropagation starts again:

        1. (a)

          First, it counts the number of messages sent from the previous iteration to v (line 1). In the previous iteration, only O1C1 sent a message to C1 through the relationship orders. For this reason, for \(v = C1\), cMessages has value 1, and 0 for the rest of objects.

        2. (b)

          Then, for \(v = C1\), the function gets into if clause of line 2 and checks the type of s. However, since s is a graph step, the algorithm gets out of this if clause without any change.

        3. (c)

          Finally, as guardCondition is true for every object of the graph, the algorithm updates the value of weight for all of them (lines 19–21). However, since cNeighbors and cMessages are 0 for \(v \ne C1\), the weight value remains the same as in the previous iteration for this case. On the other hand, for \(v = C1\), cMessages = 1 and cNeighbors = 0, so weight is updated to 1. Updated values for this iteration can be viewed in column 5 of Table 7.

        4. (d)

          FurWeightPropagation returns the updated weight and it finishes (line 22).

      • Then, iteration is incremented by SDRVertexCentric in line 16 and since \(iteration = 5\), which is higher than S.size, the function escapes the while loop of line 3 and returns the value of weight (line 18).

    3. 3.

      Once the results of the recursive call are obtained, the function computes weights according to the type of traversal (line 16). The computation process for the different types of traversal steps is explained in more detail in ‘Appendix B’.

    4. 4.

      Then, the function escapes the if clause of line 12 and checks the guardCondition value (line 19).

    5. 5.

      Since guardCondition remains true, it updates the weight value (line 20). However, as cNeighbors value is equal to 0 for every object in the graph, the value of weight is updated with the result of the recursive call of lines 15 and 16.

  • Finally, iteration is incremented by SDRVertexCentric in line 16 (note that at this point \(iteration = 1\) and S.size = 2, since the query has 2 steps).

  • guardCondition is set to true (line 4).

  • \(iteration = 1\), so SDRVertexCentric gets into else clause of line 8 and selects the last step of the query for s (line 9).

  • Then, it gets into if clause of line 10 and calls InWeightPropagation function (line 11):

    1. 1.

      First, it checks the type of s. As stated in the previous iteration, s is a traversal so it gets into if clause of line 3.

    2. 2.

      Then, it searches for the previous step that corresponds to a relationship and stores it in pRel (line 4). However, since there are no more relationship steps in the query, pRel does not contain any relationship.

    3. 3.

      Then, iteration is incremented (line 5), so that \(iteration = 2\) and S.size = 2.

    4. 4.

      The function checks if the calculated weight in the previous iteration is higher than 0 (line 6). This is true only for P10, O1C1 and C1 so, in this case, the function tries to send a message through pRel (line 7). But since pRel does not contain a relationship, no messages are sent.

    5. 5.

      Finally, weight value remains the same as in the previous iteration (line 10). Note that weights in columns 5 and 6 of Table 7 are the same.

  • Then, iteration is incremented by SDRVertexCentric and it is equal to 3. In this case, iteration is higher than S.size, so SDRVertexCentric escapes the while loop of line 3, it returns weight value (line 18) and the execution finishes.

Once SDRVertexCentric finishes, the SDR algorithm obtains a subgraph containing the objects whose weight is higher than 0 and the relationships among them (lines 2 and 3). In this example, this subgraph only contains the objects C1, O1C1 and P10 and the relationships between them. Note that if the ProductPopularity query is run either over this subgraph or over the complete graph of Figure 4, the result will be object C1 for both executions.
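The walk-through above can be approximated with a minimal Python sketch. It is not the authors' implementation of Algorithm 1: it only simulates the subquery iterations of the where step, which according to the text determine the final weights in this example. The function names, the edge encoding and the placement of P20 in orders O2C1 and O1C2 are assumptions made for illustration; the text only states that P10 is contained in O1C1.

```python
from collections import defaultdict

# Edges in query direction: customer -orders-> order -contains-> product.
# The two edges to P20 are an assumption; the text only fixes O1C1 -> P10.
EDGES = [
    ("C1", "orders", "O1C1"),
    ("C1", "orders", "O2C1"),
    ("C2", "orders", "O1C2"),
    ("O1C1", "contains", "P10"),
    ("O2C1", "contains", "P20"),  # assumed
    ("O1C2", "contains", "P20"),  # assumed
]
VERTICES = ["C1", "C2", "O1C1", "O2C1", "O1C2", "P10", "P20"]
PROPS = {"P10": {"idProduct": "10"}, "P20": {"idProduct": "20"}}


def sources(v, label):
    """Vertices that reach v through an edge with the given label."""
    return [s for (s, l, t) in EDGES if l == label and t == v]


def subquery_weights(rel_steps, prop, value):
    """Simulate the subquery iterations for relationship steps followed by a has filter."""
    weight = {v: 0 for v in VERTICES}
    last_rel = rel_steps[-1]

    # Iteration 0: objects matching the filter get weight += cNeighbors.
    for v in VERTICES:
        if PROPS.get(v, {}).get(prop) == value:
            weight[v] += len(sources(v, last_rel))

    # Iteration 1: objects with weight > 0 only send messages.
    inbox = defaultdict(int)
    for v in VERTICES:
        if weight[v] > 0:
            for n in sources(v, last_rel):
                inbox[n] += 1

    # Further iterations: remaining relationship steps walked backwards,
    # then the added graph step (None), which sends no messages.
    for rel in list(reversed(rel_steps[:-1])) + [None]:
        outbox = defaultdict(int)
        for v in VERTICES:
            c_messages = inbox[v]
            c_neighbors = 0
            if c_messages > 0 and rel is not None:
                neighbours = sources(v, rel)
                c_neighbors = len(neighbours)
                for n in neighbours:
                    outbox[n] += 1
            weight[v] += c_messages + c_neighbors
        inbox = outbox
    return weight


def relevant_subgraph(weight):
    """Keep objects with weight > 0 and the relationships among them."""
    keep = {v for v, w in weight.items() if w > 0}
    edges = [(s, l, t) for (s, l, t) in EDGES if s in keep and t in keep]
    return keep, edges


weights = subquery_weights(["orders", "contains"], "idProduct", "10")
keep, edges = relevant_subgraph(weights)
print(weights)  # P10 -> 1, O1C1 -> 2, C1 -> 1, all other objects -> 0
print(sorted(keep), edges)
```

Under these assumptions, the sketch yields the weights described in the walk-through (1 for P10, 2 for O1C1, 1 for C1) and a subgraph made of C1, O1C1 and P10.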

B Appendix: Traversals with SDR algorithm

In this appendix, we explain the strategies to compute the weights for the different types of traversal steps in Algorithm 1. We distinguish four types of traversal steps: where, not, and, and or. For a better understanding of how the SDR algorithm computes them, we describe several examples applied to the Amazon graph shown in Figure 4.

Fig. 4 Graph 1: example for Amazon case

Where Step

The where step is used to filter objects according to a predicate. This predicate is based on the path history of an object. In this way, an object is selected by the filter if it has the path indicated in the where step predicate.

Let us consider the sample query shown in Listing 2, which contains a where step. In this query, the graph objects that order an Order that contains a Product with the idProduct = ‘10’ are filtered. For this query to be applied to the graph of Figure 4, the SDR algorithm first obtains the weights of the where clause, iterating the steps of the subquery contained in the predicate. The results of the calculated weights for each iteration and each object of the graph are shown in columns 2 to 6 of Table 7. Once the algorithm finishes the calculation of the where step, the resulting weights calculated for this step are assigned to each object of the graph for the next iteration. Note in column 6 of Table 7 that the weights of all objects are the same as in the last iteration of the computation of the where step (column 5). This is because iteration It1 does not modify the weights, since it only sends messages, as explained in Sect. 4.1. After that, the algorithm continues the normal execution updating the calculated weights according to the remaining steps of the query.

Not Step

Like the where step, the not step is used to filter objects according to a predicate. However, the not step removes from the result the objects that satisfy this predicate and returns the rest.

Table 7 Object weights for ProductPopularity query with SDR algorithm

Let us consider again the example shown in Listing 2 and suppose we replace the where step with a not step in this query. In this case, the graph objects that do not order an Order that contains a Product with the idProduct = ‘10’ are filtered. When this query is applied to the graph of Figure 4, the SDR algorithm first traverses the steps of the predicate of the not clause. At first, the algorithm calculates the weights in the same way as in the where step. However, in the last iteration it performs the following operation with the weight values:

$$\begin{aligned} \mathrm{weight} = \left\{ \begin{array}{ll} 0 & \mathrm{if\ } \mathrm{weight} > 0 \\ 1 & \mathrm{if\ } \mathrm{weight} \le 0 \end{array} \right. + \mathrm{pItWeight} \end{aligned}$$

Therefore, if the calculated weight is higher than 0, then the algorithm changes it to 0, and the other way around. After this conversion, if the object was relevant to the previous steps of the query, it will have a weight 0 and, therefore, it will be discarded when obtaining the subgraph. To avoid this, the algorithm adds the weight calculated for that object in the penultimate iteration (pItWeight). This process is exemplified for the graph of Figure 4 in column 5 of Table 8.

Table 8 Object weights for ProductPopularity query with not step with SDR algorithm

Then, as with the where step, the algorithm continues the normal execution, updating the calculated weights according to the remaining steps of the query. Note in column 6 of Table 8 that the weights for each object are the same as the weights of the last iteration of the computation of the not step (column 5), since in It1 only messages are sent to other objects.
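The conversion performed in the last iteration of the not step can be written as a small Python function. This is a minimal sketch of the formula only; the function name and the sample values are hypothetical and are not taken from Table 8.

```python
def not_step_weight(weight: int, p_it_weight: int) -> int:
    """Invert the weight (positive -> 0, otherwise -> 1) and add pItWeight."""
    inverted = 0 if weight > 0 else 1
    return inverted + p_it_weight


# Hypothetical values, for illustration only.
print(not_step_weight(3, 0))  # matched the predicate, not relevant before: 0
print(not_step_weight(0, 0))  # did not match the predicate: 1
print(not_step_weight(2, 1))  # matched, but relevant to previous steps: 1
```

The third call shows why pItWeight is added: without it, an object that matched the predicate but was relevant to the previous steps would end up with weight 0 and be dropped from the subgraph.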

And Step

The and step filters objects according to two or more predicates and ensures that the selected objects satisfy all of them. Since there is more than one predicate in this case, there is also more than one subquery for which the weights must be computed.

Table 9 Object weights for subquery example with SDR algorithm

Let us consider Listing 4, which shows the PackagePopularity query of the Amazon case study. This query selects the objects that order an Order containing the Product with idProduct = ‘10’ and also order an Order containing the Product with idProduct = ‘20’. Note that this query has two subqueries: the first one is equivalent to the subquery of the where step in the ProductPopularity query, and the second one is similar but with a different property filter step. The weights computed for the second subquery are shown in Table 9; note that only four iterations are displayed because the table focuses on the subquery. The results for the two subqueries with the SDR algorithm are thus shown in columns 2–5 of Tables 7 and 9, respectively. For the and step, the algorithm computes the weights for both subqueries separately and merges them with the following operation:

$$\begin{aligned} \mathrm{weight} = \sum _{i=1}^{n} \mathrm{pItWeight}_{i} + \prod _{i=1}^{n} \mathrm{weight}_{i} \end{aligned}$$

Here, n is the number of predicates contained in the and step, \(\mathrm{weight}_{i}\) is the calculated weight of the subquery of predicate i, and \(\mathrm{pItWeight}_{i}\) is the calculated weight of predicate i in the penultimate iteration. The weights computed for the PackagePopularity query are shown in Table 10. Note that \(\mathrm{pItWeight}_{i}\) is added so that an object does not end up with weight 0 if it is relevant to the steps that precede the first one, similar to the situation described in Appendix B.2.
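
The merge operation of the and step can be sketched in Python as shown below. The function name and_step_weight and the sample values are hypothetical and serve only to illustrate the formula; they do not reproduce the values of Table 10.

```python
from math import prod
from typing import Sequence


def and_step_weight(weights: Sequence[float],
                    p_it_weights: Sequence[float]) -> float:
    """Merge the subquery weights of an and step for a single object.

    weights      -- weight_i: final weight of the object in subquery i
    p_it_weights -- pItWeight_i: weight of the object in the penultimate
                    iteration of subquery i
    """
    if len(weights) != len(p_it_weights):
        raise ValueError("one penultimate weight is needed per predicate")
    # The product is 0 as soon as one predicate is not satisfied, which
    # mirrors the semantics of and; the sum of penultimate weights keeps
    # objects that are relevant to earlier steps.
    return sum(p_it_weights) + prod(weights)


# Hypothetical values for an and step with two predicates.
print(and_step_weight([2, 1], [1, 1]))  # both predicates satisfied
print(and_step_weight([2, 0], [1, 1]))  # second predicate not satisfied
```

In this sketch, the product term contributes only when every subquery yields a non-zero weight, so an object that fails one predicate keeps just the sum of its penultimate weights.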

Table 10 Object weights for PackagePopularity example with SDR algorithm
Table 11 Object weights for SimProductsPopularity example with SDR algorithm
Fig. 5 Performance results for SDR algorithm in Contest example queries
Fig. 6 Performance results for SDR algorithm in YouTube example queries

Table 12 Ratio incremental gain results for Contest case study
Table 13 Ratio incremental gain results for YouTube case study

Or Step

Similar to the and step, the or step is used to filter the objects according to two or more predicates. However, in this case, it ensures that the filtered objects meet at least one of the predicates.

Suppose that we replace the and step in Listing 4 with an or step, obtaining the SimProductsPopularity query of the Amazon case study. This query selects the objects that order an Order containing the Product with idProduct = ‘10’ or order an Order containing the Product with idProduct = ‘20’. Starting from the results shown in columns 2 to 5 of Tables 7 and 9, the or step merges the subqueries as follows:

$$\begin{aligned} \mathrm{{weight}} = \sum _{i=1}^{n} \mathrm{{weight}}_{i} \end{aligned}$$

Here, n is the number of predicates contained in the or step and \(\mathrm{weight}_{i}\) is the calculated weight of the subquery of predicate i. The results for the SimProductsPopularity query over the graph of Fig. 4 are shown in Table 11. Note that, with the simple graph of Fig. 4, all the objects in the graph are assigned weights greater than 0. This would not be the case in a real system, where many elements would be discarded, as we describe in Sect. 5.
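
For comparison with the and step, the merge operation of the or step can be sketched in Python as follows. The function name or_step_weight and the sample values are hypothetical; they only illustrate the formula and are not taken from Table 11.

```python
from typing import Sequence


def or_step_weight(weights: Sequence[float]) -> float:
    """Merge the subquery weights of an or step for a single object.

    weights -- weight_i: final weight of the object in subquery i
    """
    # A sum is non-zero as soon as one subquery gives the object a
    # non-zero weight, which mirrors the semantics of or.
    return sum(weights)


# Hypothetical values for an or step with two predicates.
print(or_step_weight([2, 0]))  # only the first predicate is satisfied
print(or_step_weight([0, 0]))  # no predicate is satisfied
```

Under these assumptions, an object is discarded only when none of the subqueries assigns it a weight.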

C Appendix: Additional charts and tables displaying experimental results

To improve the readability of the manuscript, this appendix contains some of the tables and figures that show the results of the evaluations. Specifically, Figs. 5 and 6 show the execution time and memory consumption of the experiments with static information for the Contest and YouTube case studies, respectively. Tables 12 and 13 then show the gain results of the experiments with dynamic information for the same case studies.

Cite this article

Barquero, G., Troya, J. & Vallecillo, A. Improving query performance on dynamic graphs. Softw Syst Model 20, 1011–1041 (2021). https://doi.org/10.1007/s10270-020-00832-3

Keywords

  • Data stream processing
  • Dynamic graphs
  • Performance optimization
  • Precomputing systems
  • Data queries