Semantic tagging and linking of software engineering social content

Automated Software Engineering

Abstract

Social online communities and platforms play a significant role in the activities of software developers, either as an integral part of their main activities or through complementary knowledge and information sharing. As such communities and platforms become more prevalent and produce a wealth of shared information, the need to organize and sift through that information effectively grows. Top-down approaches such as formal hierarchical directories have been shown to lack the scalability needed in these circumstances. Lightweight bottom-up techniques such as community tagging have shown promise for better organizing the available content. However, in more focused communities of practice, such as software engineering and development, community tagging faces challenges such as tag explosion, locality of tags, and differences in interpretation. To address these challenges, we propose a semantic tagging approach that uses the information available in Wikipedia to semantically ground the tagging process and provide a methodical approach for tagging social software engineering content. We show that our approach provides high-quality tags for social software engineering content that can be used not only for organizing such content but also for making meaningful and relevant content recommendations to users, both within a local community and across multiple social online communities. We have empirically validated our approach through four main research questions. Our observations show that the proposed approach is effective in organizing social software engineering content and in making relevant, helpful, and novel content recommendations to software developers and users of social software engineering communities.
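
The general idea of grounding free-form tags in a shared vocabulary can be illustrated with a minimal sketch. This is not the authors' implementation, which the abstract does not detail. It assumes a small hand-written alias table (`ALIASES`, hypothetical) standing in for mappings that might be derived from Wikipedia, and it uses plain Jaccard overlap of concept sets as an illustrative stand-in for a recommendation score. The function names `semantic_tags`, `jaccard`, and `recommend` are likewise invented for this example.

```python
import re
from typing import Dict, List, Set

# Hypothetical alias table: surface form -> canonical concept title.
# The entries are illustrative only; a real system might derive such
# mappings from Wikipedia article titles and redirects.
ALIASES: Dict[str, str] = {
    "js": "JavaScript",
    "javascript": "JavaScript",
    "ecmascript": "JavaScript",
    "py": "Python (programming language)",
    "python": "Python (programming language)",
    "unit tests": "Unit testing",
    "unit testing": "Unit testing",
}


def semantic_tags(text: str) -> Set[str]:
    """Return the canonical concepts whose aliases occur as whole words in text."""
    # Normalize to lowercase alphanumeric tokens separated by single spaces,
    # padded so that whole-word (and whole-phrase) matching is a substring test.
    padded = " " + " ".join(re.findall(r"[a-z0-9]+", text.lower())) + " "
    return {concept for alias, concept in ALIASES.items() if f" {alias} " in padded}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two tag sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(query: Set[str], corpus: Dict[str, Set[str]], k: int = 3) -> List[str]:
    """Rank corpus items by tag overlap with the query, dropping zero-overlap items."""
    scored = [(jaccard(query, tags), doc_id) for doc_id, tags in corpus.items()]
    scored = [(score, doc_id) for score, doc_id in scored if score > 0]
    # Highest score first; ties broken by identifier for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:k]]
```

Because different surface forms ("js", "ECMAScript") collapse to one concept, two posts tagged by different communities can still be matched, which is the intuition behind addressing tag explosion and interpretation differences.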

Notes

  1. The dump can be accessed from the Stack Overflow website: http://blog.stackoverflow.com/2011/09/creative-commons-data-dump-sep-11/

Author information

Corresponding author

Correspondence to Ebrahim Bagheri.

About this article

Cite this article

Bagheri, E., Ensan, F. Semantic tagging and linking of software engineering social content. Autom Softw Eng 23, 147–190 (2016). https://doi.org/10.1007/s10515-014-0146-2

  • DOI: https://doi.org/10.1007/s10515-014-0146-2
