Requirements Engineering, Volume 21, Issue 3, pp 311–331

On the automatic classification of app reviews

  • Walid Maalej
  • Zijad Kurtanović
  • Hadeer Nabil
  • Christoph Stanik
RE 2015


Abstract

App stores like Google Play and Apple AppStore have over 3 million apps covering nearly every kind of software and service. Billions of users regularly download, use, and review these apps. Recent studies have shown that reviews written by the users represent a rich source of information for the app vendors and the developers, as they include information about bugs, ideas for new features, or documentation of released features. The majority of the reviews, however, are rather non-informative, merely praising the app and restating the star rating in words. This paper introduces several probabilistic techniques to classify app reviews into four types: bug reports, feature requests, user experiences, and ratings. For this, we use review metadata such as the star rating and the tense, as well as text classification, natural language processing, and sentiment analysis techniques. We conducted a series of experiments to compare the accuracy of these techniques and benchmark them against simple string matching. We found that metadata alone results in poor classification accuracy. When combined with simple text classification and natural language preprocessing of the text—particularly with bigrams and lemmatization—the classification precision for all review types reached 88–92% and the recall 90–99%. Multiple binary classifiers outperformed single multiclass classifiers. Our results inspired the design of a review analytics tool, which should help app vendors and developers deal with the large amount of reviews, filter critical reviews, and assign them to the appropriate stakeholders. We describe the tool's main features and summarize nine interviews with practitioners on how review analytics tools, including ours, could be used in practice.
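The approach the abstract outlines—lemmatized bag-of-words features with bigrams feeding a probabilistic binary classifier per review type—can be illustrated with a small self-contained sketch. Everything below is an invented placeholder, not the authors' implementation: the mini-lemmatizer stands in for a real one (the paper used NLTK), the sample reviews and labels are toy data, and only one of the four binary classifiers (bug report vs. other) is shown.

```python
# Toy sketch: one binary review classifier (bug report vs. other) using
# lemmatized unigram+bigram features and multinomial Naive Bayes.
import math
from collections import Counter

def lemmatize(token):
    # Crude stand-in for a real lemmatizer: strip a few common suffixes.
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def features(text):
    # Unigrams plus bigrams over lemmatized, lowercased tokens.
    tokens = [lemmatize(t) for t in text.lower().split()]
    return tokens + [" ".join(pair) for pair in zip(tokens, tokens[1:])]

class NaiveBayes:
    def fit(self, texts, labels):
        self.counts = {0: Counter(), 1: Counter()}
        self.priors = {0: 0, 1: 0}
        for text, label in zip(texts, labels):
            self.counts[label].update(features(text))
            self.priors[label] += 1
        self.vocab = set(self.counts[0]) | set(self.counts[1])
        return self

    def predict(self, text):
        scores = {}
        for label in (0, 1):
            total = sum(self.counts[label].values())
            score = math.log(self.priors[label] / sum(self.priors.values()))
            for feat in features(text):
                # Laplace smoothing over the shared vocabulary.
                score += math.log(
                    (self.counts[label][feat] + 1) / (total + len(self.vocab))
                )
            scores[label] = score
        return max(scores, key=scores.get)

# Invented toy data; the paper's corpus is far larger and richer.
reviews = [
    "app crashes every time i open the camera",
    "please add a dark mode option",
    "great app five stars love it",
    "freezes on startup after the last update",
]
labels = [1, 0, 0, 1]  # 1 = bug report, 0 = other

clf = NaiveBayes().fit(reviews, labels)
print(clf.predict("the app crashes on startup"))  # → 1
```

In the one-vs-rest setup the paper found strongest, one such binary classifier would be trained per review type, and a review can receive several labels at once; the paper's actual feature set additionally includes metadata such as star rating, tense, and sentiment scores.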


Keywords

User feedback · Review analytics · Software analytics · Machine learning · Natural language processing · Data-driven requirements engineering



Acknowledgments

We thank D. Pagano for his support with the data collection, M. Häring for contributing to the development of the coding tool, as well as the RE15 reviewers, M. Nagappan, and T. Johann for the comments on the paper. We are also very grateful to the participants in the evaluation interviews. This work was partly supported by Microsoft Research (SEIF Award 2014).



Copyright information

© Springer-Verlag London 2016

Authors and Affiliations

  • Walid Maalej (1)
  • Zijad Kurtanović (1)
  • Hadeer Nabil (2)
  • Christoph Stanik (1)
  1. Department of Informatics, University of Hamburg, Hamburg, Germany
  2. German University of Cairo, Cairo, Egypt
