Frontiers of Computer Science, Volume 8, Issue 1, pp 69–82

Tag recommendation for open source software

  • Tao Wang
  • Huaimin Wang
  • Gang Yin
  • Charles X. Ling
  • Xiao Li
  • Peng Zou
Research Article

Abstract

Open source software has become highly popular and is of great importance to most software engineering activities. To facilitate software organization and retrieval, tagging is used extensively in open source communities. However, finding the desired software through tags in communities such as Freecode and Ohloh remains challenging because of tag insufficiency. In this paper, we propose TRG (tag recommendation based on semantic graph), a novel approach to discovering and enriching the tags of open source software. First, we propose a semantic graph to model the semantic correlations between tags and the words in software descriptions. Then, based on this graph, we design an effective algorithm to recommend tags for software. Through comprehensive experiments on large-scale open source software datasets, in which we compare TRG with several typical related approaches, we demonstrate the effectiveness and efficiency of our method in recommending proper tags.
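
The abstract does not spell out how the semantic graph is built or how tags are scored, so the following is only a toy illustration of the general idea of linking description words to tags and ranking candidate tags through those links. It is not the TRG algorithm: the co-occurrence weighting, the function names `build_graph` and `recommend`, and the sample projects are all assumptions made for this sketch.

```python
from collections import defaultdict


def build_graph(projects):
    """Build a word -> tag -> weight mapping from (description, tags) pairs.

    Assumption: an edge weight is a plain co-occurrence count, i.e. the
    number of projects in which the word and the tag appear together.
    """
    graph = defaultdict(lambda: defaultdict(float))
    for description, tags in projects:
        # Count each distinct word once per project.
        for word in set(description.lower().split()):
            for tag in tags:
                graph[word][tag] += 1.0
    return graph


def recommend(graph, description, existing_tags=(), k=3):
    """Rank tags by the summed weight of edges from the description's words."""
    scores = defaultdict(float)
    for word in set(description.lower().split()):
        for tag, weight in graph.get(word, {}).items():
            if tag not in existing_tags:
                scores[tag] += weight
    # Highest score first; ties are broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [tag for tag, _ in ranked[:k]]


# Hypothetical sample data, invented for this sketch.
projects = [
    ("a fast web server written in c", ["web", "server"]),
    ("lightweight http server for static files", ["web", "http"]),
    ("image editor with plugin support", ["graphics"]),
]
graph = build_graph(projects)
suggested = recommend(graph, "tiny http server", k=2)
```

Passing `existing_tags` excludes tags a project already carries, which mirrors the enrichment setting described above: the aim is to suggest tags that are missing, not to repeat known ones.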

Keywords

open source software, semantic graph, tag recommendation

Copyright information

© Higher Education Press and Springer-Verlag Berlin Heidelberg 2014

Authors and Affiliations

  • Tao Wang (1)
  • Huaimin Wang (1)
  • Gang Yin (1)
  • Charles X. Ling (2)
  • Xiao Li (1, 2)
  • Peng Zou (1, 3)

  1. National Laboratory for Parallel and Distributed Processing, College of Computer, National University of Defense Technology, Changsha, China
  2. Department of Computer Science, The University of Western Ontario, London, Canada
  3. Academy of Equipment, Beijing, China
