Journal of Computer Science and Technology, September 2016, Volume 31, Issue 5, pp 883–909

Summarizing Software Artifacts: A Literature Review

  • Najam Nazar
  • Yan Hu
  • He Jiang (corresponding author)
Survey

Abstract

This paper presents a literature review of research on summarizing software artifacts, focusing on bug reports, source code, mailing lists, and developer discussions. From January 2010 to April 2016, numerous summarization techniques, approaches, and tools were proposed to meet the ongoing demand for improving software performance and quality and for helping developers understand the problems at hand. Because these artifacts contain both structured and unstructured data, researchers have applied a variety of machine learning and data mining techniques to generate summaries. This paper first provides a general perspective on the state of the art, describing the types of artifacts, the approaches to summarization, and the parts of the experimental procedures that are common across these artifacts. We then discuss the applications of summarization, i.e., the tasks that have been accomplished through summarization. Next, we present the tools that were developed for, or employed during, summarization tasks. In addition, we present the summarization evaluation methods employed in the selected studies, as well as other important factors used to evaluate generated summaries, such as adequacy and quality. We also briefly discuss modern communication channels and the complementarities and commonalities among different software artifacts. Finally, we discuss challenges that apply to the existing studies in general, along with future research directions. This survey of existing studies should give future researchers broad and useful background knowledge of the main aspects of this research field.

Keywords

mining software repositories; mining software engineering data; machine learning; summarizing software artifacts; summarizing source code

Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  1. Key Laboratory for Ubiquitous Network and Service Software of Liaoning Province, School of Software, Dalian University of Technology, Dalian, China
  2. State Key Laboratory of Software Engineering, Wuhan University, Wuhan, China
