FastTagRec: fast tag recommendation for software information sites

Automated Software Engineering

Abstract

Software information sites such as StackOverflow and Freecode enable information sharing and communication for developers around the world. To facilitate correct classification and efficient search, developers need to provide tags for their postings. However, tagging is inherently an uncoordinated process that depends not only on developers’ understanding of their own postings but also on other factors, including developers’ English skills and knowledge of existing postings. As a result, developers keep creating new tags even though existing tags are sufficient. The net effect is an ever-increasing number of tags with severe redundancy as postings accumulate over time, and any algorithm based on tags becomes less efficient and accurate. In this paper we propose FastTagRec, an automated, scalable tag recommendation method using neural network-based classification. By learning from existing postings and their tags, FastTagRec is able to accurately infer tags for new postings. We have implemented a prototype tool and carried out experiments on ten software information sites. Our results show that FastTagRec is not only more accurate but also three orders of magnitude faster than the comparable state-of-the-art tool TagMulRec. In addition to the empirical evaluation, we have also conducted a user study, which confirms the usefulness of our approach.


Acknowledgements

The authors would like to acknowledge the support provided by the National Key Research and Development Program of China (2017YFB1400602, 2018YFB1003800), the National Natural Science Foundation of China (61572374, U163620068, 61472423), Open Fund of Key Laboratory of Network Assessment Technology from CAS, and Academic Team Building Plan for Young Scholars from Wuhan University (WHU2016012).

Author information

Corresponding author

Correspondence to Jin Liu.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Liu, J., Zhou, P., Yang, Z. et al. FastTagRec: fast tag recommendation for software information sites. Autom Softw Eng 25, 675–701 (2018). https://doi.org/10.1007/s10515-018-0239-4

  • DOI: https://doi.org/10.1007/s10515-018-0239-4
