Information Systems Frontiers, Volume 13, Issue 3, pp 349–357

Efficient storage and fast querying of source code

  • Oleksandr Panchenko
  • Hasso Plattner
  • Alexander B. Zeier


Enabling fast and detailed insights into large portions of source code is an important task in a global development ecosystem. Numerous data structures have been developed to store source code and to support various structural queries that help with navigation, evaluation, and analysis. Many of these data structures work with tree-based or graph-based representations of source code. The goal of this project is to design a data store that enables efficient storage and fast querying of structural information. The naive adjacency list method has been enhanced with recent data compression approaches for column-oriented databases, allowing lossless yet compact storage of fine-grained structural data. Graph indexing enables the proposed data model to answer fine-grained structural queries quickly. This paper describes the basics of the proposed approach and illustrates its technical feasibility.
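The abstract combines two ingredients: an adjacency list stored with column-store compression, and fast structural queries over the resulting graph. The Python sketch below illustrates that general idea under our own assumptions; it is not the authors' data model. The class `EdgeStore`, its methods, and the example edges are hypothetical. Node names and edge labels are dictionary-encoded to integers, rows are sorted by source so the source column can be run-length encoded and binary-searched, and a transitive query is answered by breadth-first search. A dedicated graph index would replace that search in a production system; that part is not shown.

```python
from bisect import bisect_left, bisect_right
from collections import deque


class EdgeStore:
    """Hypothetical column-wise adjacency list: one row per (source, label, target) edge.

    Node names and edge labels are dictionary-encoded to small integers, and
    rows are sorted by source so the source column compresses well with
    run-length encoding and can be binary-searched.
    """

    def __init__(self, edges):
        # Dictionary encoding: each distinct string is stored exactly once.
        self.nodes = sorted({s for s, _, _ in edges} | {t for _, _, t in edges})
        self.labels = sorted({lbl for _, lbl, _ in edges})
        self._node_id = {n: i for i, n in enumerate(self.nodes)}
        self._label_id = {lbl: i for i, lbl in enumerate(self.labels)}
        rows = sorted(
            (self._node_id[s], self._label_id[lbl], self._node_id[t])
            for s, lbl, t in edges
        )
        # Column-oriented layout: three parallel integer columns.
        self.src = [r[0] for r in rows]
        self.lbl = [r[1] for r in rows]
        self.dst = [r[2] for r in rows]

    def source_runs(self):
        """Run-length encode the sorted source column as (node_id, run_length) pairs."""
        runs = []
        for v in self.src:
            if runs and runs[-1][0] == v:
                runs[-1][1] += 1
            else:
                runs.append([v, 1])
        return [tuple(r) for r in runs]

    def neighbors(self, node, label=None):
        """Targets of edges leaving `node`, optionally restricted to one edge label."""
        sid = self._node_id.get(node)
        if sid is None:
            return []
        want = None if label is None else self._label_id.get(label)
        if label is not None and want is None:
            return []
        # Binary search finds the contiguous block of rows for this source.
        lo, hi = bisect_left(self.src, sid), bisect_right(self.src, sid)
        return [
            self.nodes[self.dst[i]]
            for i in range(lo, hi)
            if want is None or self.lbl[i] == want
        ]

    def reachable(self, node, label):
        """All nodes reachable from `node` via one or more `label` edges (BFS)."""
        seen, queue = set(), deque([node])
        while queue:
            for nxt in self.neighbors(queue.popleft(), label):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return sorted(seen)


# Hypothetical example graph: containment and call edges between code elements.
edges = [
    ("ClassA", "contains", "methodA1"),
    ("ClassA", "contains", "methodA2"),
    ("methodA1", "calls", "methodB1"),
    ("ClassB", "contains", "methodB1"),
    ("methodB1", "calls", "methodA2"),
]
store = EdgeStore(edges)
print(store.source_runs())                    # [(0, 2), (1, 1), (2, 1), (4, 1)]
print(store.neighbors("ClassA", "contains"))  # ['methodA1', 'methodA2']
print(store.reachable("methodA1", "calls"))   # ['methodA2', 'methodB1']
```

Sorting by source is the design choice that makes both ideas work together here: it turns the source column into long runs that compress well, and it lets a lookup touch only one contiguous block of rows instead of scanning the whole table.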


Keywords: Source code search · Source code analysis · Global code repository · Structural information



This project was carried out in cooperation with SAP AG. In particular, we would like to thank Jan Karstens, Heinz Ulrich Roggenkemper, Wolfgang Stephan, Cafer Tosun, and Xiwei Zhou.


Copyright information

© Springer Science+Business Media, LLC 2010

Authors and Affiliations

  • Oleksandr Panchenko (1)
  • Hasso Plattner (1)
  • Alexander B. Zeier (1)

  1. Hasso Plattner Institute for Software Systems Engineering, Potsdam, Germany
