Benefits and Drawbacks of Representing and Analyzing Source Code and Software Engineering Artifacts with Graph Databases

  • Rudolf Ramler
  • Georg Buchgeher
  • Claus Klammer
  • Michael Pfeiffer
  • Christian Salomon
  • Hannes Thaller
  • Lukas Linsbauer
Conference paper
Part of the Lecture Notes in Business Information Processing book series (LNBIP, volume 338)


Source code and related artifacts of software systems encode valuable expert knowledge accumulated over many person-years of development. Analyzing software systems and extracting this knowledge requires processing the source code and reconstructing structure and dependency information. In recent analysis projects, we have created tools and services using graph databases for representing and analyzing source code and other software engineering artifacts as well as their dependencies. Graph databases such as Neo4j are optimized for storing, traversing, and manipulating data in the form of nodes and relationships. They are scalable, extensible, and can quickly be adapted for different application scenarios. In this paper, we share our insights and experience from five different cases where graph databases have been used as a common solution concept for analyzing source code and related artifacts. They cover a broad spectrum of use cases from industry and research, ranging from lightweight dependency analysis to analyzing the architecture of a large-scale software system with 44 million lines of code. We discuss the benefits and drawbacks of using graph databases in the reported cases. The benefits are related to representing dependencies between source code elements and other artifacts, the support for rapid prototyping of analysis solutions, and the power and flexibility of the graph query language. The drawbacks concern the generic frontends of graph databases and the lack of support for time series data. A summary of application scenarios for using graph databases concludes the paper.
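The node-and-relationship representation described above can be illustrated with a minimal sketch. It uses a plain in-memory Python structure as a stand-in for a graph database, not Neo4j itself, and the element names (OrderService, PaymentClient, HttpUtil, OrderServiceTest) and relationship types (DEPENDS_ON, TESTS) are hypothetical; none of them are taken from the cases reported in the paper.

```python
from collections import defaultdict, deque


class CodeGraph:
    """Tiny in-memory stand-in for a graph database (illustrative only)."""

    def __init__(self):
        self.nodes = {}                  # node name -> property dict
        self.edges = defaultdict(list)   # source name -> [(rel_type, target name)]

    def add_node(self, name, **props):
        self.nodes[name] = props

    def add_relationship(self, source, rel_type, target):
        self.edges[source].append((rel_type, target))

    def transitive(self, start, rel_type):
        """Return every node reachable from `start` via `rel_type` (breadth-first)."""
        seen, queue = set(), deque([start])
        while queue:
            current = queue.popleft()
            for rel, target in self.edges.get(current, []):
                if rel == rel_type and target not in seen and target != start:
                    seen.add(target)
                    queue.append(target)
        return seen


# Hypothetical source code elements and their dependencies.
graph = CodeGraph()
graph.add_node("OrderService", kind="class")
graph.add_node("PaymentClient", kind="class")
graph.add_node("HttpUtil", kind="class")
graph.add_node("OrderServiceTest", kind="test")
graph.add_relationship("OrderService", "DEPENDS_ON", "PaymentClient")
graph.add_relationship("PaymentClient", "DEPENDS_ON", "HttpUtil")
graph.add_relationship("OrderServiceTest", "TESTS", "OrderService")

# Direct and indirect dependencies of OrderService.
deps = graph.transitive("OrderService", "DEPENDS_ON")
print(sorted(deps))  # ['HttpUtil', 'PaymentClient']
```

In a graph database such as Neo4j, a comparable traversal would typically be written as a single query with a variable-length pattern, for example `MATCH (c {name: 'OrderService'})-[:DEPENDS_ON*]->(d) RETURN DISTINCT d` in Cypher, again assuming the hypothetical names used here.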


Keywords: Static analysis, Dependency analysis, Knowledge extraction, Graph database, Neo4j, Experience report



The research reported in this paper was supported by the Austrian Ministry for Transport, Innovation and Technology, the Federal Ministry for Digital and Economic Affairs, and the Province of Upper Austria in the frame of the COMET center SCCH.



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Software Competence Center Hagenberg GmbH, Hagenberg, Austria
  2. Johannes Kepler University Linz, Linz, Austria
