Are tweets useful in the bug fixing process? An empirical study on Firefox and Chrome

Mezouar, Mariam El; Zhang, Feng; Zou, Ying

doi:10.1007/s10664-017-9559-4

Published: 09 November 2017

Volume 23, pages 1704–1742, (2018)

Empirical Software Engineering

Mariam El Mezouar¹,
Feng Zhang¹ &
Ying Zou²

Abstract

When technical users (e.g., developers) encounter an issue, they usually file an issue report in an issue tracking system. Non-technical end-users, however, are more likely to express their opinions on social network platforms such as Twitter. For software systems (e.g., Firefox and Chrome) that are exposed to millions of non-technical end-users, it is important to monitor and resolve the issues observed by this large user base. Twitter, a widely used micro-blogging site, has millions of active users and can therefore provide developers with instant feedback on their products. In this paper, we investigate whether Twitter can improve the bug fixing process by analyzing the short messages (i.e., tweets) that end-users post. We propose an approach that removes noisy tweets and maps the remaining tweets to bug reports. We then conduct an empirical study of the usefulness of Twitter in the bug fixing process, using two widely adopted browsers (i.e., Firefox and Chrome) that are also large, rapidly released software systems. We find that issue reports are treated the same regardless of whether users tweet about the issue, except that Firefox developers tend to label an issue as more severe if users tweet about it. Feedback from Firefox contributors confirms that tweets are not currently leveraged in the bug fixing process, owing to the challenges associated with discovering bugs through Twitter. Moreover, we observe that many issues are posted on Twitter earlier than on issue tracking systems. More specifically, at least one third of issues could have been reported to developers 8.2 days earlier in Firefox and 7.6 days earlier in Chrome. In conclusion, tweets are useful in providing earlier acknowledgment of issues, which developers can potentially use to focus their efforts on the issues impacting a large user base.
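
The abstract does not say how tweets are matched to bug reports or how the reporting delay is measured. As a rough illustration only, and not the authors' method, the sketch below assumes that a tweet can be mapped to the bug report with the highest TF-IDF cosine similarity, that tweets with no sufficiently similar report are treated as noise, and that the delay is the time between the tweet and the creation of the matched report. All function names, field names (`id`, `text`, `summary`), the stopword list, and the `threshold` value are hypothetical.

```python
import math
import re
from collections import Counter
from datetime import datetime

# Hypothetical, deliberately tiny stopword list (an assumption of this sketch).
STOPWORDS = {"the", "a", "an", "is", "on", "in", "to", "my", "and", "of", "it", "when", "i"}


def tokenize(text):
    """Lowercase the text, keep alphanumeric tokens, and drop stopwords."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]


def tfidf_vectors(docs):
    """Turn a list of token lists into sparse TF-IDF vectors (dicts)."""
    n = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    # Smoothed inverse document frequency, so every weight stays positive.
    idf = {term: math.log((1 + n) / (1 + df)) + 1.0 for term, df in doc_freq.items()}
    return [{term: tf * idf[term] for term, tf in Counter(doc).items()} for doc in docs]


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def map_tweets_to_bugs(tweets, bugs, threshold=0.2):
    """Return (tweet_id, bug_id, score) for tweets whose best match reaches the threshold.

    Tweets below the (assumed) threshold are treated as noise and skipped.
    """
    if not tweets or not bugs:
        return []
    docs = [tokenize(t["text"]) for t in tweets] + [tokenize(b["summary"]) for b in bugs]
    vectors = tfidf_vectors(docs)
    tweet_vecs, bug_vecs = vectors[:len(tweets)], vectors[len(tweets):]
    matches = []
    for tweet, tvec in zip(tweets, tweet_vecs):
        scored = [(cosine(tvec, bvec), bug["id"]) for bug, bvec in zip(bugs, bug_vecs)]
        score, bug_id = max(scored, key=lambda pair: pair[0])
        if score >= threshold:
            matches.append((tweet["id"], bug_id, score))
    return matches


def lead_days(tweet_posted, bug_created):
    """Days by which the tweet precedes the bug report (negative if it came later)."""
    return (bug_created - tweet_posted).total_seconds() / 86400.0


if __name__ == "__main__":
    # Made-up example data, for illustration only.
    tweets = [
        {"id": "t1", "text": "Firefox crashes when I open a PDF file"},
        {"id": "t2", "text": "Having a great lunch today"},
    ]
    bugs = [
        {"id": "b1", "summary": "Crash when opening PDF file in viewer"},
        {"id": "b2", "summary": "Bookmarks toolbar disappears after update"},
    ]
    print(map_tweets_to_bugs(tweets, bugs))
    print(lead_days(datetime(2016, 3, 1), datetime(2016, 3, 9)))
```

A real pipeline would likely add stemming and a proper noise classifier; the threshold here only stands in for that filtering step.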



Author information

Authors and Affiliations

School of Computing, Queen’s University, Kingston, ON, Canada
Mariam El Mezouar & Feng Zhang
Department of Electrical and Computer Engineering, Queen’s University, Kingston, ON, Canada
Ying Zou

Corresponding author

Correspondence to Mariam El Mezouar.

Additional information

Communicated by: David Lo

About this article

Cite this article

Mezouar, M., Zhang, F. & Zou, Y. Are tweets useful in the bug fixing process? An empirical study on Firefox and Chrome. Empir Software Eng 23, 1704–1742 (2018). https://doi.org/10.1007/s10664-017-9559-4

Published: 09 November 2017
Issue Date: June 2018
DOI: https://doi.org/10.1007/s10664-017-9559-4
