Empirical Software Engineering, Volume 23, Issue 3, pp 1275–1312

Studying the dialogue between users and developers of free apps in the Google Play Store

  • Safwat Hassan
  • Chakkrit Tantithamthavorn
  • Cor-Paul Bezemer
  • Ahmed E. Hassan


The popularity of mobile apps has continued to grow over the past few years. Mobile app stores, such as the Google Play Store and Apple's App Store, provide a unique user feedback mechanism to app developers through the possibility of posting app reviews. In the Google Play Store (and soon in the Apple App Store), developers are able to respond to such user feedback. Over the past years, mobile app reviews have been studied extensively by researchers. However, much of the prior work (including our own) incorrectly assumes that reviews are static in nature and that users never update their reviews. In a recent study, we started analyzing the dynamic nature of the review-response mechanism. Our previous study showed that responding to a review often has a positive effect on the rating that the user gives to an app. In this paper, we revisit our prior finding in more depth by studying 4.5 million reviews with 126,686 responses for 2,328 top free-to-download apps in the Google Play Store. One of the major findings of our paper is that the assumption that reviews are static is incorrect. In particular, we find that developers and users in some cases use the response mechanism as a rudimentary user support tool, where dialogues emerge between users and developers through updated reviews and responses. Even though the messages are often simple, we find instances of as many as ten user-developer back-and-forth messages that occur via the response mechanism. Using a mixed-effect model, we identify that the likelihood of a developer responding to a review increases as the review rating gets lower or as the review content gets longer.
In addition, we identify four patterns of developers: 1) developers who primarily respond only to negative reviews, 2) developers who primarily respond to negative reviews or to reviews based on their contents, 3) developers who primarily respond to reviews that are posted shortly after the latest release of their app, and 4) developers who primarily respond to reviews that are posted long after the latest release of their app. We perform a qualitative analysis of developer responses to understand what drives developers to respond to a review. We manually analyze a statistically representative random sample of 347 reviews with responses for the ten apps with the highest number of developer responses. We identify seven drivers that make a developer respond to a review, of which the most important ones are to thank users for using the app and to ask for more details about a reported issue. Our findings show that it can be worthwhile for app owners to respond to reviews, as responding may lead to an increase in the given rating. In addition, our findings show that studying the dialogue between users and developers can provide valuable insights that can lead to improvements in the app store and the user support process.
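The paper fits a mixed-effect model to relate review characteristics to the likelihood of a developer response. As a simplified, hypothetical illustration of that relationship (not the paper's actual model or data), the sketch below fits a plain logistic regression by gradient descent on synthetic reviews, where the simulated ground truth follows the paper's reported direction: lower star ratings and longer reviews make a response more likely. All variable names and the data-generating coefficients are invented for the example.

```python
import math
import random

random.seed(42)

def simulate_review():
    """Generate one synthetic (rating, length, got_response) triple.

    The assumed tendency (low ratings and long reviews attract responses)
    mirrors only the *direction* of the paper's finding; the coefficients
    here are arbitrary choices for the illustration.
    """
    rating = random.randint(1, 5)      # star rating, 1-5
    length = random.randint(10, 500)   # review length in characters
    logit = 1.5 - 0.8 * rating + 0.004 * length
    p = 1.0 / (1.0 + math.exp(-logit))
    return rating, length, 1 if random.random() < p else 0

data = [simulate_review() for _ in range(5000)]

def standardize(xs):
    """Center and scale a feature so gradient descent behaves well."""
    mean = sum(xs) / len(xs)
    sd = (sum((x - mean) ** 2 for x in xs) / len(xs)) ** 0.5
    return [(x - mean) / sd for x in xs]

ratings = standardize([d[0] for d in data])
lengths = standardize([d[1] for d in data])
ys = [d[2] for d in data]

# Fit logistic regression: P(response) = sigmoid(w0 + w_rating*x1 + w_length*x2)
w0 = w_rating = w_length = 0.0
lr = 0.1
n = len(ys)
for _ in range(300):
    g0 = g1 = g2 = 0.0
    for x1, x2, y in zip(ratings, lengths, ys):
        p = 1.0 / (1.0 + math.exp(-(w0 + w_rating * x1 + w_length * x2)))
        err = p - y
        g0 += err
        g1 += err * x1
        g2 += err * x2
    w0 -= lr * g0 / n
    w_rating -= lr * g1 / n
    w_length -= lr * g2 / n

# The fitted signs should mirror the finding: negative for rating
# (lower rating -> response more likely), positive for length.
print(f"rating coefficient: {w_rating:.3f}, length coefficient: {w_length:.3f}")
```

A mixed-effect model differs from this sketch by additionally including per-app random effects, which account for some apps being systematically more responsive than others; only the fixed-effect intuition is shown here.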


Keywords: Google Play Store · User-developer dialogue · Developer reply · Developer response · Mixed-effect model · Android mobile apps · Empirical study · Software engineering



Copyright information

© Springer Science+Business Media, LLC 2017

Authors and Affiliations

  1. Software Analysis and Intelligence Lab (SAIL), Queen's University, Kingston, Canada
  2. School of Computer Science, The University of Adelaide, Adelaide, Australia
