Automated Software Engineering, Volume 23, Issue 2, pp 147–190

Semantic tagging and linking of software engineering social content

  • Ebrahim Bagheri
  • Faezeh Ensan


Social online communities and platforms play a significant role in the activities of software developers, either as an integral part of their main activities or through complementary knowledge and information sharing. As such communities and platforms become more prevalent and produce a wealth of shared information, the need to effectively organize and sift through that information becomes more important. Top-down approaches such as formal hierarchical directories have been shown to lack the scalability needed in these circumstances. Light-weight bottom-up techniques such as community tagging have shown promise for better organizing the available content. However, in more focused communities of practice, such as software engineering and development, community tagging faces challenges such as tag explosion, locality of tags and interpretation differences, to name a few. To address these challenges, we propose a semantic tagging approach that uses the information available in Wikipedia to semantically ground the tagging process and provide a methodical approach for tagging social software engineering content. We show that our approach provides high-quality tags for social software engineering content that can be used not only for organizing such content but also for making meaningful and relevant content recommendations to users, both within a local community and across multiple social online communities. We have empirically validated our approach through four main research questions. The results show that the proposed approach is quite effective in organizing social software engineering content and making relevant, helpful and novel content recommendations to software developers and users of social software engineering communities.


Semantic tagging, Q&A websites, Social software engineering, Community interlinking, Web 2.0



Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  1. Ryerson University, Toronto, Canada
  2. Athabasca University, Athabasca, Canada
