Mining non-functional requirements from App store reviews



User reviews obtained from mobile application (app) stores contain technical feedback that can be useful for app developers. Recent research has focused on mining and categorizing such feedback into actionable software maintenance requests, such as bug reports and functional feature requests. However, little attention has been paid to extracting and synthesizing the Non-Functional Requirements (NFRs) expressed in these reviews. NFRs describe a set of high-level quality constraints that a software system should exhibit (e.g., security, performance, usability, and dependability). Meeting these requirements is a key factor in achieving user satisfaction and, ultimately, surviving in the app market. To bridge this gap, we present a two-phase study aimed at mining NFRs from user reviews available on mobile app stores. In the first phase, we conduct a qualitative analysis using a dataset of 6,000 user reviews, sampled from a broad range of iOS app categories. Our results show that 40% of the reviews in our dataset signify at least one type of NFR. The results also show that users in different app categories tend to raise different types of NFRs. In the second phase, we devise an optimized dictionary-based multi-label classification approach to automatically capture NFRs in user reviews. An evaluation of the proposed approach over a dataset of 1,100 reviews, sampled from a set of iOS and Android apps, shows that it achieves an average precision of 70% (range 66%–80%) and an average recall of 86% (range 69%–98%).
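To make the second phase more concrete, the following is a minimal sketch of what a dictionary-based multi-label classifier and its per-label precision and recall could look like. It is an illustration under stated assumptions, not the authors' implementation: the indicator terms in `NFR_DICTIONARY`, the tokenization, and the function names `classify` and `precision_recall` are all hypothetical, and the optimization step mentioned in the abstract is not modeled. Only the four category names are taken from the examples given above.

```python
import re
from typing import Dict, List, Set, Tuple

# Hypothetical indicator terms for each NFR category. The category names come
# from the abstract; the terms are assumptions made for illustration only.
NFR_DICTIONARY: Dict[str, Set[str]] = {
    "security": {"password", "privacy", "hack", "login"},
    "performance": {"slow", "lag", "battery", "crash"},
    "usability": {"confusing", "interface", "navigate", "layout"},
    "dependability": {"crash", "freeze", "unreliable", "bug"},
}


def classify(review: str) -> Set[str]:
    """Return every NFR label whose dictionary shares a word with the review."""
    tokens = set(re.findall(r"[a-z']+", review.lower()))
    return {label for label, terms in NFR_DICTIONARY.items() if tokens & terms}


def precision_recall(
    predicted: List[Set[str]], actual: List[Set[str]], label: str
) -> Tuple[float, float]:
    """Compute precision and recall for one label over paired label sets."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if label in p and label in a)
    fp = sum(1 for p, a in pairs if label in p and label not in a)
    fn = sum(1 for p, a in pairs if label not in p and label in a)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    review = "App is slow and I get a crash after login"
    # A single review can receive several labels at once (multi-label).
    print(sorted(classify(review)))
```

Because a term such as "crash" can appear in more than one dictionary, a single review may be assigned several labels, which is what makes the task multi-label rather than single-label. Exact word matching is used here for simplicity; a real dictionary would likely need stemming or phrase matching.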








We would like to extend our gratitude to Dr. Daniel M. Berry from the University of Waterloo for his contribution to this work. This work was supported in part by the Louisiana Board of Regents Research Competitiveness Subprogram (LA BoR-RCS), contract number: LEQSF(2015-18)-RD-A-07 and by the LSU Economic Development Assistantships (EDA) program.

Author information

Correspondence to Anas Mahmoud.

Additional information


Communicated by: David Lo, Meiyappan Nagappan, Fabio Palomba, and Sebastiano Panichella


About this article


Cite this article

Jha, N., Mahmoud, A. Mining non-functional requirements from App store reviews. Empir Software Eng 24, 3659–3695 (2019).



  • Requirements elicitation
  • Non-functional requirements
  • Application store
  • Classification