Conclave: Ontology-Driven Measurement of Semantic Relatedness between Source Code Elements and Problem Domain Concepts

  • Nuno Ramos Carvalho
  • José João Almeida
  • Pedro Rangel Henriques
  • Maria João Varanda Pereira
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8584)


Software maintainers are often asked to change source code in unfamiliar programs, either to improve a system or to eliminate defects. These tasks require a sufficient understanding of the system, or at least of a small part of it. One of the most time-consuming steps in this process is locating the parts of the code responsible for a key functionality or feature. Feature (or concept) location techniques address this problem.

This paper introduces Conclave, an environment for software analysis, and in particular the Conclave-Mapper tool, which provides a feature location facility. The tool exploits the natural language terms used in programs (e.g. function and variable names) and, using textual analysis and a collection of Natural Language Processing techniques, computes sets of synonymous terms. These sets are used to score the relatedness between program elements and either search queries or problem domain concepts, producing ranked lists of the program elements that address the search criteria or concepts. An empirical study that evaluates the underlying feature location technique is also discussed.
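The general idea of scoring program elements against a query through synonym sets can be illustrated with a minimal Python sketch. This is not the Conclave-Mapper implementation: the synonym table `SYNONYMS` is a hand-written, hypothetical stand-in for sets computed by the tool, the function names are invented for illustration, and plain Jaccard overlap is swapped in as the relatedness score.

```python
import re

# Hypothetical, hand-written synonym table (an assumption for illustration).
# A real tool would compute such sets from dictionaries or text analysis.
SYNONYMS = {
    "remove": {"remove", "delete", "erase"},
    "delete": {"delete", "remove", "erase"},
    "user": {"user", "account"},
    "print": {"print", "display", "show"},
}


def split_identifier(name):
    """Split snake_case and camelCase identifiers into lowercase terms."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)
    return [part.lower() for part in parts]


def expand(terms):
    """Return the union of the synonym sets of all terms.

    A term without an entry in SYNONYMS contributes only itself.
    """
    expanded = set()
    for term in terms:
        expanded |= SYNONYMS.get(term, {term})
    return expanded


def relatedness(identifier, query):
    """Jaccard overlap between the expanded terms of identifier and query."""
    a = expand(split_identifier(identifier))
    b = expand(query.lower().split())
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank(identifiers, query):
    """Return (score, identifier) pairs sorted by decreasing relatedness."""
    scored = [(relatedness(name, query), name) for name in identifiers]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    for score, name in rank(["print_report", "deleteAccount", "openFile"],
                            "remove user"):
        print(f"{score:.2f}  {name}")
```

In this sketch the query "remove user" ranks `deleteAccount` first even though the two share no literal term, because the match happens between the expanded synonym sets rather than the raw words.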


Keywords (added by machine, not by the authors): Source Code; Search Query; Program Element; Program Comprehension; Software Maintenance





Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Nuno Ramos Carvalho (1)
  • José João Almeida (1)
  • Pedro Rangel Henriques (1)
  • Maria João Varanda Pereira (2)

  1. University of Minho, Braga, Portugal
  2. Polytechnic Institute of Bragança, Bragança, Portugal
