Empirical Software Engineering, Volume 23, Issue 2, pp 800–832

EnTagRec++: An enhanced tag recommendation system for software information sites

  • Shaowei Wang
  • David Lo
  • Bogdan Vasilescu
  • Alexander Serebrenik


Software engineers share experiences with modern technologies using software information sites, such as Stack Overflow. These sites allow developers to label posted content, referred to as software objects, with short descriptions, known as tags. Tags help to improve the organization of questions and simplify the browsing of questions for users. However, tags assigned to objects tend to be noisy, and some objects are not well tagged. For instance, 14.7% of the questions posted on Stack Overflow in 2015 needed tag re-editing after the initial assignment. To improve the quality of tags in software information sites, we propose EnTagRec++, an advanced version of our prior work EnTagRec. Unlike EnTagRec, EnTagRec++ not only integrates the historical tag assignments to software objects, but also leverages information about users and an initial set of tags that a user may provide for tag recommendation. We evaluate its performance on five software information sites: Stack Overflow, Ask Ubuntu, Ask Different, Super User, and Freecode. We observe that, even without considering an initial set of tags provided by a user, it achieves Recall@5 scores of 0.821, 0.822, 0.891, 0.818, and 0.651, and Recall@10 scores of 0.873, 0.886, 0.956, 0.887, and 0.761 on Stack Overflow, Ask Ubuntu, Ask Different, Super User, and Freecode, respectively. In terms of Recall@5 and Recall@10, averaged across the five datasets, it improves upon TagCombine, the prior state-of-the-art approach, by 29.3% and 14.5%, respectively. Moreover, the performance of our approach is further boosted if users provide some initial tags that our approach can leverage to infer additional tags: when an initial set of tags is given, Recall@5 improves by 10%.
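The abstract reports accuracy as Recall@5 and Recall@10. The sketch below assumes the per-object definition commonly used in tag recommendation work: the fraction of an object's actual tags that appear among its top-k recommended tags, averaged over all test objects. The exact formula used in the evaluation is not given in this abstract, and the function names (`recall_at_k`, `mean_recall_at_k`) and toy data are hypothetical, for illustration only.

```python
from typing import AbstractSet, Sequence


def recall_at_k(recommended: Sequence[str], actual: AbstractSet[str], k: int) -> float:
    """Recall@k for one software object (assumed definition, not taken from the paper).

    Fraction of the object's actual tags that appear among the first k
    entries of the ranked recommendation list.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    if not actual:
        raise ValueError("each object needs at least one actual tag")
    top_k = set(recommended[:k])  # keep only the k highest-ranked tags
    return len(top_k & actual) / len(actual)


def mean_recall_at_k(
    rankings: Sequence[Sequence[str]],
    gold: Sequence[AbstractSet[str]],
    k: int,
) -> float:
    """Average the per-object Recall@k over a test set."""
    if len(rankings) != len(gold) or not gold:
        raise ValueError("need one ranking/gold pair per object, and at least one object")
    return sum(recall_at_k(r, g, k) for r, g in zip(rankings, gold)) / len(gold)


if __name__ == "__main__":
    # Hypothetical toy data: ranked recommendations and the tags each post actually carries.
    rankings = [["python", "django", "orm", "sql", "mysql"], ["java", "spring", "maven"]]
    gold = [{"python", "django", "postgresql"}, {"java", "hibernate"}]
    print(round(mean_recall_at_k(rankings, gold, 5), 3))  # 0.583
    print(round(mean_recall_at_k(rankings, gold, 1), 3))  # 0.417
```

In this toy run, the first object recovers two of its three tags in the top five and the second recovers one of two, giving (2/3 + 1/2) / 2 ≈ 0.583. A ranking shorter than k, like the second one, simply contributes all of its recommendations.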


Keywords: Software information sites; Recommendation systems; Tagging


References

  1. Al-Kofahi JM, Tamrawi A, Nguyen TT, Nguyen HA, Nguyen TN (2010) Fuzzy set approach for automatic tagging in evolving software. In: ICSM, pp 1–10
  2. Antoniol G, Canfora G, Casazza G, De Lucia A, Merlo E (2002) Recovering traceability links between code and documentation. IEEE Trans Softw Eng 28(10):970–983
  3. Asuncion HU, Asuncion AU, Taylor RN (2010) Software traceability with topic modeling. In: ICSE, pp 95–104
  4. Baldi P, Lopes CV, Linstead E, Bajracharya SK (2008) A theory of aspects as latent topics. In: OOPSLA, pp 543–562
  5. Bazelli B, Hindle A, Stroulia E (2013) On the personality traits of stackoverflow users. In: 2013 IEEE international conference on software maintenance, pp 460–463
  6. Benjamini Y, Yekutieli D (2001) The control of the false discovery rate in multiple testing under dependency. Ann Stat 29:1165–1188
  7. Bergstra J, Bengio Y (2012) Random search for hyper-parameter optimization. JMLR 13:281–305
  8. Bindelli S, Criscione C, Curino C, Drago ML, Eynard D, Orsi G (2008) Improving search and navigation by combining ontologies and social tags. In: On the move to meaningful internet systems, OTM 2008 Workshops, OTM confederated international workshops and posters, ADI, AWeSoMe, COMBEK, EI2N, IWSSA, MONET, OnToContent + QSI, ORM, PerSys, RDDS, SEMELS, and SWWS 2008, Monterrey, Mexico, November 9-14, 2008. Proceedings, pp 76–85
  9. Blei DM, Ng AY, Jordan MI (2003) Latent dirichlet allocation. JMLR, 993–1022
  10. Brandt J, Guo PJ, Lewenstein J, Dontcheva M, Klemmer SR (2009) Two studies of opportunistic programming: interleaving web foraging, learning, and writing code. In: CHI. ACM, pp 1589–1598
  11. Cabot J, Izquierdo JLC, Cosentino V, Rolandi B (2015) Exploring the use of labels to categorize issues in open-source software projects. In: 22nd IEEE international conference on software analysis, evolution, and reengineering, SANER 2015. Montreal, QC, Canada, March 2-6, 2015, pp 550–554
  12. Capobianco G, Lucia AD, Oliveto R, Panichella A, Panichella S (2013) Improving IR-based traceability recovery via noun-based indexing of software artifacts. J Softw Evol Process 25(7):743–762
  13. Cress U, Held C, Kimmerle J (2013) The collective knowledge of social tags: direct and indirect influences on navigation, learning, and information processing. Comput Educ 60(1):59–73
  14. Crestani F (1997) Application of spreading activation techniques in information retrieval. Artif Intell Rev 11(6):453–482
  15. Gelman A, Carlin J, Stern H, Rubin D (2003) Bayesian data analysis. CRC Press
  16. Ghamrawi N, McCallum A (2005) Collective multi-label classification. In: CIKM, pp 195–200
  17. Golder SA, Huberman BA (2006) Usage patterns of collaborative tagging systems. J Inf Sci 32(2):198–206
  18. Grissom RJ, Kim JJ (2005) Effect sizes for research: a broad practical approach
  19. Han J, Kamber M, Pei J (2011) Data mining: concepts and techniques. Morgan Kaufmann Publishers Inc
  20. Held C, Kimmerle J, Cress U (2012) Learning by foraging: the impact of individual knowledge and social tags on web navigation processes. Comput Hum Behav 28(1):34–40
  21. Hong L, Davison BD (2010) Empirical study of topic modeling in twitter. In: Proceedings of the first workshop on social media analytics, SOMA ’10, pp 80–88
  22. Jäschke R, Marinho LB, Hotho A, Schmidt-Thieme L, Stumme G (2007) Tag recommendations in folksonomies. In: PKDD
  23. Jmac (2013) Select and display ‘suggested tags’ for all posts based on related questions (or other logic).
  24. Joorabchi A, English M, Mahdi AE (2015) Automatic mapping of user tags to wikipedia concepts: the case of a q&a website – stackoverflow. J Inf Sci 41(5):570–583
  25. Her J (2011) Tag recommendations for stack overflow.
  26. Lukins SK, Kraft NA, Etzkorn LH (2010) Bug localization using latent dirichlet allocation. Inf Softw Technol 52(9):972–990
  27. Panichella A, Dit B, Oliveto R, Di Penta M, Poshyvanyk D, Lucia AD (2013) How to effectively use topic models for software engineering tasks? An approach based on genetic algorithms. In: ICSE, pp 522–531
  28. Pletea D, Vasilescu B, Serebrenik A (2014) Security and emotion: Sentiment analysis of security discussions on github. In: Proceedings of the 11th working conference on mining software repositories, MSR 2014. ACM, New York, pp 348–351
  29. Porter MF (1997) An algorithm for suffix stripping. In: Readings in information retrieval. Morgan Kaufmann, pp 313–316
  30. Puurula A (2011) Mixture models for multi-label text classification. In: 10th New Zealand computer science research student conference
  31. Ramage D, Hall D, Nallapati R, Manning CD (2009) Labeled lda: a supervised topic model for credit attribution in multi-labeled corpora. In: EMNLP ’09, pp 248–256
  32. Rebouças M, Pinto G, Ebert F, Torres W, Serebrenik A, Castor F (2016) An empirical study on the usage of the swift programming language. In: 2016 IEEE 23rd international conference on software analysis, evolution, and reengineering (SANER), pp 634–638
  33. Samaniego FJ (2010) A comparison of the bayesian and frequentist approaches to estimation. Springer Series in Statistics, Springer
  34. Shokripour R, Anvik J, Kasirun ZM, Zamani S (2013) Why so complicated? Simple term filtering and weighting for location-based bug report assignment recommendation. In: MSR
  35. Sigurbjörnsson B, van Zwol R (2008) Flickr tag recommendation based on collective knowledge. In: WWW ’08, pp 327–336
  36. Storey M-A, Ryall J, Singer J, Myers D, Cheng L-T, Muller M (2009) How software developers use tagging to support reminding and refinding. IEEE Trans Softw Eng 35:470–483
  37. Storey M-A, Treude C, van Deursen A, Cheng L-T (2010) The impact of social media on software engineering practices and tools. In: FoSER ’10, pp 359–364
  38. Thung F, Lo D, Jiang L (2012) Detecting similar applications with collaborative tagging. In: ICSM, pp 600–603
  39. Toutanova K, Klein D, Manning CD, Singer Y (2003) Feature-rich part-of-speech tagging with a cyclic dependency network. In: HLT-NAACL
  40. Treude C, Storey M-A (2009) How tagging helps bridge the gap between social and technical aspects in software development. In: ICSE ’09, pp 12–22
  41. Treude C, Storey M-A (2012) Work item tagging: communicating concerns in collaborative software development. IEEE Trans Softw Eng 38(1):19–34
  42. Vasilescu B, Serebrenik A, Devanbu PT, Filkov V (2014) How social Q&A sites are changing knowledge sharing in open source software communities. In: CSCW, pp 342–354
  43. Vasilescu B, Serebrenik A, van den Brand MGJ (2013) The babel of software development: linguistic diversity in open source. In: Jatowt A, Lim E-P, Ding Y, Miura A, Tezuka T, Dias G, Tanaka K, Flanagin A, Dai BT (eds) Proceedings of the social informatics: 5th international conference, SocInfo 2013, Kyoto, Japan, November 25-27, 2013. Springer International Publishing, pp 391–404
  44. Vogt CC, Cottrell GW (1999) Fusion via a linear combination of scores. Inf Retr 1(3):151–173
  45. Wang S, Lo D, Jiang L (2012) Inferring semantically related software terms and their taxonomy by leveraging collaborative tagging. In: ICSM, pp 604–607
  46. Wang S, Lo D, Vasilescu B, Serebrenik A (2014) EnTagRec: an enhanced tag recommendation system for software information sites. In: 30th IEEE international conference on software maintenance and evolution, Victoria, BC, Canada, September 29 - October 3, 2014. IEEE Computer Society, pp 291–300
  47. Wang W, Niu N, Liu H, Wu Y (2015) Tagging in assisted tracing. In: 2015 IEEE/ACM 8th international symposium on software and systems traceability, pp 8–14
  48. Wang X-Y, Xia X, Lo D (2015) Tagcombine: recommending tags to contents in software information sites. J Comput Sci Technol 30(5):1017–1035
  49. Wilcoxon F (1945) Individual comparisons by ranking methods. Biom Bull 1(4):80–83
  50. Xia X, Lo D, Wang X, Zhou B (2013) Tag recommendation in software information sites. In: MSR ’13, pp 287–296
  51. Zangerle E, Gassler W, Specht G (2011) Using tag recommendations to homogenize folksonomies in microblogging environments. In: SocInfo’11, pp 113–126
  52. Zubiaga A (2012) Enhancing navigation on wikipedia with social tags. CoRR, arXiv:1202.5469

Copyright information

© Springer Science+Business Media, LLC 2017

Authors and Affiliations

  1. SAIL, Queen’s University, Kingston, Canada
  2. School of Information Systems, Singapore Management University, Singapore, Singapore
  3. School of Computer Science, Carnegie Mellon University, Pittsburgh, USA
  4. Department of Mathematics and Computer Science, Eindhoven University of Technology, Eindhoven, The Netherlands
