Welcome to this special issue that includes empirical studies in Requirements Engineering (RE). RE is a crucial factor for developing high-quality software, systems and services. RE methods, tools and processes are used to engineer systems of different scale and complexity in different domains. Collecting empirical evidence is key in RE research to determine the qualities and evaluate the maturity of proposed RE solutions, thus fostering further research and paving the way for adoption by practitioners. The selected articles extend research presented at REFSQ 2017, the 23rd International Working Conference on Requirements Engineering – Foundation for Software Quality. The conference was held in Essen, Germany, from February 27 to March 2, 2017 (Grünbacher and Perini 2017).

At REFSQ 2017, five papers were identified as candidates to be considered for this special issue. The selection of the three Technical Design papers and two Scientific Evaluation papers was based on the peer reviews from the conference, discussions with PC members, and the suitability of the work for the Empirical Software Engineering journal. The authors of these candidate papers were invited to prepare a revised and substantially extended version, and to consider as possible extensions additional practical applications determined through case studies or experiments, additional empirical validations, systematic comparisons with other approaches, or sound theoretical foundations. Each submitted manuscript was peer-reviewed by three reviewers. In the end, four articles were accepted for inclusion in this special issue.

Summary of the papers

Three papers in this special issue exploit Natural Language Processing (NLP) techniques to automatically derive knowledge from textual artifacts expressed in natural language, in support of diverse RE tasks. Besides describing the proposed techniques, the manuscripts present experiments performed to provide empirical evidence of the effectiveness of the proposed solutions. The fourth paper presents research on human errors and error prevention strategies in writing requirements documents, which builds on human error theory and also describes empirical studies.

Specifically, in their paper titled “Semi-automatic Rule-based Domain Terminology and Software Feature-relevant Information Extraction from Natural Language User Manuals – An Approach and Evaluation at Roche Diagnostics GmbH”, T. Quirchmayr, B. Paech, R. Kohl, H. Karey, and G. Kasdepke describe a technique to automatically extract feature-relevant information from the user manual of a software system. The effectiveness of the proposed approach is demonstrated by comparing it against a gold standard, and by applying it to selected sections of user manuals of software products of Roche Diagnostics GmbH.

A. Ferrari, G. Gori, B. Rosadini, I. Trotta, S. Bacherini, A. Fantechi, and S. Gnesi focus on requirements verification in the case of large requirements documents, such as those in the railway signaling domain. In their paper “Detecting Requirements Defects with NLP Patterns: an Industrial Experience in the Railway Domain”, the authors investigate to what extent NLP can be practically applied to the automatic detection of defects in such documents. The proposed solution includes rule-based NLP patterns for defect detection that can be incrementally tuned for a specific application, which also makes it possible to manage the false positives typically raised by these techniques. The experimental analysis of the proposed solution is performed on a document that contains 1866 requirements, resulting in precision above 83% and recall above 85%.

App user reviews are usually short textual documents, but they can be available in huge quantities, thus requiring considerable effort from the requirements engineers who analyze them. In their paper titled “Using Frame Semantics for Classifying and Summarizing Application Store Reviews”, N. Jha and A. Mahmoud investigate the applicability and effectiveness of frame semantics techniques for automatically classifying and summarizing user reviews, making it possible to identify the most pressing issues contained in such reviews. Experimental analysis of classification accuracy has been performed on different datasets of app store reviews, showing that the technique is both efficient and accurate. Human experts evaluated the summaries generated by the frame semantics techniques against summaries generated by text-based summarization, and perceived the text-based summaries as more comprehensive. The proposed techniques have been integrated with others into a new version of a tool called MARC (Mobile App Review Classifier) that supports the whole process, from crawling user reviews to summarizing the key issues they contain.

The fourth paper in this special issue empirically investigates the usefulness of human error information for fault prevention during requirements engineering. The article by W. Hu, J. C. Carver, V. Anu, G. Walia, and G. Bradshaw titled “Using Human Error Information for Error Prevention” also reports which error prevention strategies are used in industrial practice. The study investigates the role of two taxonomies for error prevention, namely a requirement error taxonomy (RET), defined based on a review of the requirements engineering literature, and a human error taxonomy (HET), built upon the RET and taking into account human error theory. A controlled experiment with 31 students trained on the two taxonomies confirmed that the better the students understood human errors from the training process, the fewer errors they made in writing requirements documents. Two industrial studies performed by interviewing requirements experts resulted in the identification of 75 prevention and mitigation strategies for errors present in the HET, as well as additional error types that were missing from the HET.