1 Introduction

Performance aspects such as time behavior, capacity, and throughput are essential non-functional requirements (NFR) of software products. Performance testing is the process of measuring the availability, response time, throughput, and resource utilization of a software product [50]. The importance of software performance and its relation to functional requirements has been acknowledged since the 1990s [65]. A real-world example is HealthCare.gov, a “health insurance exchange website” run by the United States government, where on launch day 99% of the people who wanted to get insurance failed to register [75]. Further investigation showed that no adequate performance testing had been performed [68].

Performance-related issues can have a large impact on cost, especially if those issues are not treated early [15, 16, 66]. Another example of a software performance issue was Pokemon Go [51], a mobile game that, after the initial roll-out, became unusable in many countries. The large number of users caused server failures, leading to a delayed roll-out of the game to reduce the load [51]. A potential reason for such a failure is the different nature of performance requirements compared to functional requirements, which makes it difficult for developers to translate performance requirements into written code [78]. Therefore, performance testing is necessary, since it can detect the causes of performance-related issues and verify whether the software product meets the requirements or not [78].

Model-based testing (MBT) is a software testing approach that uses an abstraction of the system (or part thereof) to generate test cases [69]. According to a software testing survey conducted in Canada [30], more than 35% of the respondents use MBT approaches to generate test cases in their projects, which indicates that MBT is prevalent in industry. Creating the model forces testability into the product design. The model is created from the requirements and describes the behavior of the system. Successfully modeled system requirements indicate that those requirements are testable, complete, and can be validated, since they have been formalized in an unambiguous manner [32].

Many studies have explored the state of the art of MBT [18, 19, 25, 35, 69, 70]. Utting et al. [69, 70] created a taxonomy of existing MBT approaches and tools, and Dias-Neto et al. [18, 19] systematically reviewed the MBT literature in 2007 and 2010. These studies agreed that the existing MBT approaches focus on testing the functional rather than the non-functional part (i.e., quality aspects) of the system. Later, Häser et al. [35] reviewed the literature on model-based integration testing for NFR, and Felderer et al. [25] reviewed model-based security testing. An overview of the state of the art of model-based performance testing is still missing.

In this paper, we study the current status of model-based performance testing and identify approaches that we can use to model different aspects of performance requirements. Then, we propose the Performance Requirements verificatiOn and Test EnvironmentS generaTion approach (PRO-TEST), which supports model-based performance testing by checking performance requirements for ambiguity, measurability, and completeness, and by generating test environments. Finally, we evaluate PRO-TEST on real software requirements specifications.

The main contributions of this study are:

  1. A categorization of MBT studies in the context of performance requirements, based on the performance aspect, testing level, study type, research method, model type, application type, and contribution.

  2. A categorization of the Software Requirements Specifications (SRS) from a public repository [26], based on the described application type and performance requirements.

  3. PRO-TEST, an approach to model performance requirements to verify them, understand what should be tested, and generate test environments.

  4. An evaluation of PRO-TEST, illustrating its benefits and drawbacks.

The remainder of this paper is organized as follows. Section 2 introduces the concepts of software performance and model-based testing and reviews related work. Section 3 illustrates the design and methodology used in our research and the validity threats. In Sect. 4, we present the state of the art and the state of practice of model-based performance testing. Section 5 presents PRO-TEST together with the benefits obtained and the challenges faced when modeling performance requirements. We discuss PRO-TEST in relation to the literature in Sect. 6. Section 7 answers our research questions. Finally, we conclude the paper in Sect. 8 with directions for future work.

2 Background and related work

In this section, we briefly review aspects of software performance, model-based testing, and related work.

2.1 Software performance

Software performance is considered in many software quality models [5, 40]. Synthesizing these quality models, as shown in Table 1, the main aspects of software performance are time behavior [17, 31, 37], capacity [37], resource utilization [17, 31, 37], speed/throughputFootnote 1 [31] and efficiency [10, 17, 20, 49]. Next, we provide a definition of these software performance aspects.

Table 1 Quality models and their related performance aspects

Time behavior

the time required to perform specific tasks or complete requests. It usually has multiple instances or values depending on different anticipated capacities (e.g., the number of users). This aspect is included in all three models (ISO9126, ISO25010, and FURPS) as time behavior or response time. It is an explicit aspect that users rely on to infer software performance, and it can have a direct effect on the usability of the software.

Resource utilization

the amount or percentage of the resources used to run the software. The software should not always utilize all resources when running; instead, it should be limited to a specific amount so that it has a margin for peak times and new updates that would require more resources.

Capacity

the maximum load (in terms of requests, sessions, users, data, etc.) that the system can handle without crashing. This aspect is crucial when planning the later stages of a project, especially when considering scalability. If not accounted for, it can result in an overload of the system, which affects business operations and leads to extra charges. Capacity gives insight into the anticipated data size used by the software, which affects decisions about the resources required for the system to operate.

Speed/ throughput

the number of requests or processes per time unit that the system can handle while still maintaining the time behavior requirements.

Efficiency

the relation between the output (i.e., time behavior, speed) and the input (i.e., capacity, resource utilization). This is a relatively complex aspect since it is affected by all other mentioned aspects of performance.

2.2 Model-based testing

Model-based testing is a software testing technique that automates the process of test case generation from a model that represents the system under test (SUT). MBT consists of three main tasks [61]: designing a functional test model, determining test generation criteria, and generating the tests. The model could be an end-to-end model, e.g., a business process model, or a per-function process model. Abstract test cases are generated from the system's model by random generation, search-based algorithms, model checking, symbolic execution, theorem proving, or constraint solving [42, 69]. A tool then builds the test skeletons used to test the software.

Utting et al. [69] present five steps of the MBT process. We illustrate this process in Fig. 1. In Step 1, a test model is created from the requirements. The model can either be created specifically for testing or reuse parts of the models from the design phase. In the latter case, the test model should remain independent of the design model so that issues from the design phase do not propagate into the test model. A model should be verifiable with little effort to ensure the efficiency of the MBT approach. In Step 2, test selection criteria are defined, which set the rules for the automatic generation of test cases. Examples of test selection criteria are system functionality (requirements-based), the structure of the test model, or properties of the environment. In Step 3, test case specifications are written as a more formal representation of the test selection criteria. In Step 4, the test specifications are, with the help of the model, transformed into concrete test cases. At this stage, algorithms are used to select the minimum set of test cases that ensures full test coverage. In Step 5, the tests are run on the SUT in a test environment. First, test inputs are fed to the function under test (5-1); then, the test verdicts are recorded by comparing the test results with the expected outcome (5-2).
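To make the flow between these five steps concrete, the following minimal Python sketch outlines the process as a pipeline of function stubs. It is our own illustration rather than part of Utting et al.'s description; the data structures and the trivial generation strategy are placeholders.

```python
# Minimal sketch of the five-step MBT process (illustration only, not from [69]);
# the data structures and the generation strategy are placeholders.
from typing import Callable, Dict, List, Tuple

Model = Dict[str, List[str]]      # Step 1: test model, e.g., state -> reachable states
TestCase = Tuple[List[str], str]  # Step 4: sequence of inputs plus expected outcome

def select_criteria(model: Model) -> List[str]:
    # Step 2: define test selection criteria, e.g., cover every transition
    return ["transition-coverage"]

def specify_tests(criteria: List[str]) -> List[str]:
    # Step 3: formalize the selection criteria as test case specifications
    return [f"cover: {criterion}" for criterion in criteria]

def generate_tests(model: Model, specs: List[str]) -> List[TestCase]:
    # Step 4: derive concrete test cases from the model and the specifications
    return [(list(model.keys()), "pass") for _ in specs]

def run_tests(tests: List[TestCase], sut: Callable[[List[str]], str]) -> List[bool]:
    # Step 5: feed inputs to the SUT (5-1) and record verdicts (5-2)
    return [sut(inputs) == expected for inputs, expected in tests]
```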

Fig. 1 MBT process diagram from Utting et al. [69]

There are many benefits associated with MBT. It has been shown to be effective in testing real-time adaptive systems [4], verifying system behavior, and identifying possible performance enhancements. Furthermore, the benefits of MBT automation generally grow with the amount of testing the system requires [58]. Another benefit is that MBT uncovers missing and unclear requirements through the modeling of the system's requirements [52, 54]. In addition, MBT can make the requirements more understandable for software engineers [78]. Since performance requirements are often written at a high abstraction level, it may be difficult to understand how they impact software design and code. This can be made easier by modeling functional and non-functional requirements in the same model. For example, a UML activity diagram that models functional requirements can be annotated with performance requirements [64]; the resulting model shows where the performance requirements apply in the software.

2.3 Related work

There exist many studies that investigate MBT for testing functional requirements, while fewer studies focus on non-functional requirements. In 2006 [70] and 2012 [69], Utting et al. created a taxonomy of model-based testing to categorize existing approaches and tools and to classify their usefulness. Their work focused on functional requirements testing. In general, there is no clear distinction between functional and non-functional requirements when MBT is applied [34].

Although MBT for non-functional requirements has not been explored extensively, there are still some studies in this area. A systematic review (78 papers) of MBT approaches by Dias-Neto et al. [18], published in 2007, was not limited to functional requirements and explored the non-functional aspects considered by the models. The study pointed out some limitations of using MBT for non-functional requirements. The irregular behavior of software users makes it hard to create a behavioral model of non-functional requirements. Another challenge is the limited support for non-functional aspects in the existing MBT approaches; NFRs like usability, reliability, and security were not supported. Moreover, the majority of MBT approaches proposed by research are never used in industry [18].

Dias-Neto’s original study was updated in 2010 [19] (covering 219 papers), with a focus on the techniques used for modeling, coverage, and the challenges of MBT. This study introduced selection criteria for MBT approaches based on their characteristics. The use of MBT techniques was still difficult, as observed in their previous study from 2007. However, NFRs (usability, reliability, and security) that could not be tested with MBT according to the 2007 review [18] had started to receive some attention in research. The difference between these studies [18, 19] and the systematic mapping study presented in Sect. 4 is that ours has a narrower focus on model-based performance testing.

In 2014, Häser et al. [35] conducted a systematic literature review of model-based integration testing. Their research questions addressed software paradigms, assessment types, and which NFRs can be tested with MBT approaches. However, they did not investigate whether the MBT approaches test different aspects of an NFR (i.e., which aspects of performance were tested using these approaches), and they scoped their research to integration testing. Their findings indicate a lack of research in model-based integration testing for NFR.

In 2016, Felderer et al. [25] presented a taxonomy and systematic classification of model-based security testing. They extended the study of Dias-Neto et al. [19] while focusing on security requirements. Woodside et al. [78] described the domain of software performance engineering (SPE). They surveyed current work on a sample of SPE papers and outlined the future of SPE. They collected models and methods used for performance and listed many benefits of modeling performance. The focus of that study is to provide an outlook on the future of model-based performance testing. In contrast, our study focuses on identifying current techniques that can be used in practice.

Motivated by this research gap, namely the lack of systematic reviews on MBT of performance requirements and the limited support in existing techniques for NFRs in general and performance in particular, we focus our research on finding and studying different performance requirements models for the purpose of using them in MBT.

3 Research methodology

To achieve our research aim defined in Sect. 1, we have identified the following four objectives.

  • O1 Identify which aspects of performance are important and can be modeled.

  • O2 Identify modeling techniques and methods that suit performance requirements.

  • O3 Identify a modeling approach that can validate performance requirements, ensure that those requirements are testable, and support the generation of test cases, all three of which are key aspects of MBT.

  • O4 Evaluate the identified modeling approach on a set of requirements specifications.

In alignment with those objectives, we define our research questions in Table 2.

Table 2 Research questions

Figure 2 shows the steps of our research in alignment with the research questions. First, we started with a systematic mapping study (SMS). A mapping study is an appropriate method for gaining an overview of a particular research area. We explored which performance aspects were studied and modeled using MBT (RQ1.1, RQ1.2), and what models exist to model performance requirements (RQ2.1). Second, we conducted a sample study on real-project requirements for the purpose of finding out the relevance of performance aspects in practice (RQ1.3). Based on the results from the SMS and the software requirements mining, we developed PRO-TEST (RQ2.2). Finally, we conducted another sample study to evaluate PRO-TEST (RQ3). We focus our study on the domain of software-intensive systems.

Fig. 2 Research methodologies

A Systematic Literature Review (SLR) and an SMS are research methodologies that systematically survey the literature but differ in their aim, execution, and outcome [41, 56]. An SLR aims to aggregate data from the literature and has specific research questions for that purpose, while an SMS aims to explore trends and identify gaps in research. In terms of execution, an SLR requires a quality assessment of the extracted papers, which is not the case for an SMS. The output of an SLR is a synthesis of the reviewed studies, while an SMS classifies a set of papers along different dimensions.

A sample study is a form of research done on a sample of the population for the purpose of generalization [67]. The data can be collected using interviews, questionnaires, metric reports, or from sources available online, e.g., a software repository. One of the research methods associated with sample studies is software repository mining [67]. Software repository mining research usually uses open-source software repositories; there is no human to collect data from, i.e., no interviews or questionnaires are involved.

The purpose of evaluating PRO-TEST is to validate that it works in practice, i.e., that it can model the performance requirements and generate test environments. Similar to the software requirements mining approach described in Sect. 3.2, we again conduct a sample study, i.e., we use an openly accessible resource of software requirements specifications.

3.1 Systematic mapping study

We developed the SMS protocol based on the SLR conducted by Dias Neto et al. [19], following the guidelines by Petersen et al. [56]. There were two reasons for choosing this study by Dias-Neto. First, the research group has conducted two SLRs [18, 19] on MBT using the same protocol, which provides some evidence for the repeatability of their study. Second, there was enough information presented about the search keywords and procedure, making it easier to adapt and extend the protocol. The choice of reusing and extending an existing protocol, however, also has disadvantages. The study of performance requirements concerns research beyond MBT, such as requirements engineering and software testing in general, software performance engineering, and agile software development. Hence, we emphasize that our review covers the area of performance requirements within the scope of MBT only.

3.1.1 Study identification

Choosing the search strategy

We used keyword search in digital databases, similar to the search method used by Dias Neto et al. [19]. They used six databases for their search, two of which (Compendex IE and INSPEC) we did not have access to. Therefore, we ran the search on the other four databases (SCOPUS, ACM, IEEE Xplore, and Web of Science). We searched the title, keywords, and abstract on SCOPUS, WoS, and ACM, while we searched the full text on IEEE Xplore (due to a limitation of that database).

Developing the search

We took the search string used by Dias Neto et al. [19] and extended it to fit the purpose of our research. The keywords we added are related to performance. We extracted those keywords from the quality models for software performance discussed in Sect. 2.1. Table 3 shows the borrowed search string and the extension with performance-related keywords.

Table 3 Search strings used in the SMS

Evaluating the search string

We evaluated the quality of the search string to mitigate the risk of missing key papers. We did that in two steps:

  • We ran Dias Neto et al.'s [19] search string on the selected databases and randomly checked whether the returned research papers were among those presented by Dias Neto et al. in their study [19].

  • To validate the whole search string including the extension, we reviewed the papers published at the International Conference On Software Testing Verification And Validation (ICST) over the period 2014–2018. We read the titles and abstracts to see whether the topic was related to model-based performance testing. We collected the papers related to our topic and looked for them in our search results. We found three papers in the ICST conference proceedings that were not returned by our search string. After further analysis, we removed a part of Dias Neto et al.'s search string (approach OR method OR methodology OR technique) and adjusted our extension to ensure those papers were included.

3.1.2 Selection criteria

We developed the following inclusion and exclusion criteria. Inclusion:

  1. The publication is available in full text.

  2. The publication language is English.

  3. The publication date lies between August 2009 (when Dias Neto et al. [19] conducted their search) and February 2019 (when we conducted our search).

  4. The publication proposes and/or evaluates model-based performance testing techniques.

Exclusion:

  1. The publication presents a secondary study, i.e., an SMS, SLR, or literature survey.

  2. The publication is not related to the topic of model-based performance testing.

  3. The publication is a duplicate that refers to the same study.

  4. The publication is about model-based mutation testing.

  5. The publication is a proceedings volume, table of contents, book, tutorial, demo, or editorial.

After careful analysis of the model-based mutation testing approach, we decided to exclude it. Although it uses MBT as a basis, it is concerned with introducing faults during the test to find issues in the system rather than with modeling and test case generation.

3.1.3 Quality assessment

No detailed quality assessment was conducted. Since the goal of our SMS was to find a method that we can use, there was no need to evaluate the quality of each paper selected for our research.

3.1.4 Data extraction

We extracted the following data from our and Dias-Neto et al.’s [19] primary studies (after we applied our inclusion/exclusion criteria).

Performance aspect

We extracted data related to the five performance aspects discussed in Sect. 2.1, i.e., time behavior, resource utilization, capacity, throughput, and efficiency. We added a “not specified” category for those papers that do not mention or focus on a specific aspect of performance. This classification supports answering RQ1, RQ1.1, and RQ1.2.

Testing level

Testing can be conducted on five different levels [7]: acceptance, system, integration, module, and unit level. This classification supports answering RQ2 and determines on which level performance testing is conducted.

Study type

We used Stol et al. [67] to classify study types in software engineering: field study, field experiment, experimental simulation, laboratory experiment, judgment study, sample studies, formal theory, and computer simulation. This classification helps us to understand how mature the studied MBT techniques are, i.e., whether they are empirically studied and adopted by industry or initial proposals that require more empirical evidence. This is an additional criterion for choosing the model and answering RQ2, RQ2.1, and RQ2.2.

Research method

The research method differs from the study type. A research method defines the set of rules and practices to follow, with a specific goal in mind, i.e., answering a set of research questions. The study type is a grouping of different research methods based on their “metaphor, purpose and goals” [67]. In software engineering research, many research methods can be associated with study types [67]. Some of those methods are case study, experiment, survey, and concept development. Since there is no complete list of research methods, we kept this classification dynamic and extracted the options directly from the research papers. This classification helps to distinguish papers that present a new approach or theory from those that empirically evaluate existing approaches.

Model type

We classified each paper based on the approach used to model performance requirements. The classification is based on the essence of the model, i.e., some models were novel while others were extensions of previous models. For example, Maâlej et al. [48] present timed-automata, while Abbors et al. [1] present a probabilistic extension of timed-automata. This helps to determine the frequency of the models used for performance requirements and answer RQ2.1. We did not have predetermined options for this classification, since one of our research objectives was to identify all possible modeling approaches.

Application type

We classify the type of the application (e.g., web application, mobile, desktop) to understand where model-based performance testing is used or studied. This is also a dynamic classification with no predetermined options.

Contribution

This classification assigns papers into categories based on their contribution to the field (e.g., tool, method, evaluation). With this classification, we can understand the maturity of the models.

3.1.5 Data analysis

We use the frequency of the extracted data, discussed in the previous section, to analyze the state of the art in model-based performance testing.

Also, to identify a model that can achieve the MBT’s goals, we examined the following aspects of the identified MBT techniques:

  • reported benefits of modeling performance requirements

  • modeled performance aspects

  • type and strength of evaluation of the proposed method

3.2 Software requirements mining

The research questions RQ1 and RQ1.3 in Table 2 were answered by conducting software requirements mining. Ferrari et al. [26] published a data set [24] that contains a collection of software requirements specifications gathered from various industries and applications. The collection contains 77 SRSs in total, from which we constructed a subset as described next.

3.2.1 Selection criteria

Inclusion: the SRS and the individual requirements that are classified and shown in our results have the following properties.

  • SRS: have at least one performance requirement.

  • Requirement: fits in one of the descriptions for performance aspects in Sect. 2.1.

Exclusion: the SRS and the individual requirements that we excluded from our classification and the results have the following properties.

  • SRS: without any performance requirements or not written for a software product.

  • Requirement: does not fit in any of the performance aspects descriptions.

3.2.2 Coding

Since the data in the SRS documents is of a qualitative nature, we used coding to efficiently identify and extract relevant information. The codes we created are based on three dimensions (performance aspect, application type, and quantifiability), which we describe next.

Performance aspect

We code each requirement with one of the five performance aspects, i.e., time behavior, resource utilization, capacity, speed/throughput, and efficiency, or with a general option for performance requirements that do not fit any of the five aspects' descriptions. We apply this classification to each performance requirement and thereby provide data to answer RQ1.3.

Application type

Similar to the SMS, we extract the type of application specified in the SRS, e.g., web application, mobile application, embedded system, etc. This allows us to evaluate whether the SRS data set is a good representation of the population (i.e., software products).

Requirements quantifiability

Testability is one of the major criteria in requirements verification and validation [10]. A requirement “must be specific, unambiguous, and quantitative wherever possible” so that a developer can write software code that satisfies it. A performance requirement should therefore be quantifiable and actually quantified to be testable. We evaluated each requirement by looking for numerical values.
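In the study this evaluation was done manually; the sketch below is merely a simplified illustration of such a check, flagging a requirement as potentially quantified when it contains a numerical value.

```python
import re

# Simplistic heuristic (illustration only): a performance requirement is treated
# as quantified if it contains a numerical value; borderline cases still need
# manual inspection.
def looks_quantified(requirement: str) -> bool:
    return re.search(r"\d", requirement) is not None

print(looks_quantified("The system shall respond within 2 seconds."))  # True
print(looks_quantified("The system shall respond quickly."))           # False
```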

3.3 Evaluating PRO-TEST

We evaluated PRO-TEST on a set of realistic software requirements specifications (SRS) containing performance requirements. The evaluation was done by modeling the performance requirements, assessing the quantifiability and degree of quantification of the specified requirements, and identifying possibly missing requirements.

3.4 Threats to validity

In the SMS, there were threats related to the data extraction methods. (1) We may have missed some papers because we did not have access to two of the databases used by Dias Neto et al. [19]. To keep this threat to a minimum, we made sure to use the SCOPUS database, which includes publications from different technical publishers. (2) We may have excluded papers through our search string. We extended the search string from the study of Dias Neto et al. [19] with words related to performance; this could lead to fewer results if some keywords are missing from the search string. We tried to include as many keywords as possible and used performance checklists to keep this threat to a minimum. (3) Another type of threat is related to the human factor; we could have interpreted the data in the wrong way or placed a paper in the wrong classification. We addressed this threat by having two researchers perform the selection and classification independently and then comparing the results. When conflicts were discovered, the corresponding paper was discussed by both researchers, and if no consensus could be achieved, a third researcher was consulted.

In software requirements mining, the human factor also introduces threats to validity. First, we could have coded some requirements wrongly or missed out on some performance requirements from the SRS documents. We mitigated this threat by having two researchers involved in coding. The researchers coded a sample of seven SRS documents independently and compared the results. When conflicts were discovered, the corresponding requirement was discussed by both researchers. Then we divided the work equally between the two researchers. When no consensus could be achieved by the two researchers a third researcher was consulted. Second, the sample size may not be enough for generalization since the SRS collection had 77 documents that might not cover all application types or represent the population, i.e., software products.

Finally, in the implementation of PRO-TEST, the small sample size is not enough to generalize the competence of the approach. Only 34 SRS documents of the SRS collection had performance requirements, which might lead to three issues: (1) The sample we chose might be too small to represent the population, i.e., software products. (2) The SRS collection from Ferrari et al.'s study [26] might not be a good representation of the population either. (3) The most recent SRS document goes back to 2010, which could be considered old. A validation of the model on more recent SRS documents is required.

4 Model-based performance testing

This section reports on the results from the SMS on model-based performance testing (Sect. 4.1) and on the prevalence of performance requirements in a publicly available repository of software requirements specifications (Sect. 4.2). We discuss our findings in Sect. 4.3.

4.1 State of the art

We identified 57 primary studies through our database search and extracted 20 from Dias-Neto's study (see “Appendix”).Footnote 2 A paper could be mapped to more than one value in each classification, depending on the content of the paper. The choice of these mappings was driven by our research questions. We show in Figs. 3 and 5 the relation of the performance aspect with all other research area classifications. Moreover, a typical SMS should classify papers in both (1) the research area and (2) the research type [55], hence our choice of Fig. 4.

Fig. 3 Papers mapping between a performance aspect and testing level, b performance aspect and model

Fig. 4 Papers mapping between a research method and study type, b research method and contribution

In Fig. 3, the y-axis represents the performance aspects, while the x-axis in Fig. 3a represents the testing level and in Fig. 3b the model types. The “Not mentioned” option in the performance aspects represents papers that did not mention or focus on any specific aspect. We categorized the extracted models based on the essence of the model.

In Fig. 4, the y-axis represents the research method, while the x-axis in Fig. 4a represents the study type and in Fig. 4b the contribution of the paper. The study type is based on the classification in Sect. 3.1.4.

In Fig. 5, the y-axis represents the performance aspect, while the x-axis represents the application type (grouped). We grouped the applications based on the category, purpose, and platform, e.g., web application, mobile, and embedded system.

Fig. 5 Papers mapping between performance aspect and application type

Figure 6 shows the number of publications on the topic of model-based performance testing. Figures 3, 4, 5 and 6 are based on the results of Dias Neto et al. [19] (for the period 1990–2009) and our research (for the period 2009–2019). We combined the results from the two studies and present them in these figures.

Fig. 6 Number of publications between 1990 and 2019-03

4.2 Performance requirements in SRS documents

The SRS collection contained 77 SRS documents; 34 documents contained at least one performance requirement, and 43 documents specified no performance requirements.

Figure 7 shows the mapping of the extracted performance requirements from the SRS collection. The mapping has two dimensions, representing the performance aspect that the requirement belongs to and the application type specified in the SRS document.

Fig. 7 Mapping of extracted requirements between performance aspect and application type

To extract the performance requirements, we applied the coding described in Sect. 3.2.2. The total number of quantifiable performance requirements was 149. However, only 106 requirements were actually quantified and thus could be modeled and tested. Figure 8 shows the number of extracted performance requirements per performance aspect and the quantified requirements per aspect.

Fig. 8 The frequency of total and quantified performance requirements per performance aspect

4.3 Discussion

Research on model-based performance testing has gained momentum over the past 30 years (Fig. 6).

Performance aspects were studied to varying extents. By far the most prevalent performance aspect in studies in the context of MBT is time behavior, with 66 instancesFootnote 3 in terms of both testing level and model used (Fig. 3). Resource utilization, capacity, and speed/throughput were in close range, with a median value of 10 instances in terms of both testing level and model used. Efficiency was the least studied performance aspect in the context of MBT, where it appeared in only one instance in terms of testing level and one in terms of the model used.

We observe a similar trend analyzing the requirements specifications. Time behavior was the most common performance aspect (Fig. 7). Out of the 149 extracted performance requirements, time behavior was specified in 71 requirements (e.g., “The system shall be able to search for a specified product in less than 1 second.”Footnote 4), followed by capacity (38) (e.g., “The system must handle at least 100 concurrent users and their operations”Footnote 5), speed/throughput (18) (e.g., “The system shall be able to retrieve 200 products per second.”Footnote 6), efficiency (13) (e.g., “Management—all management software functions shall take optimal advantage of all language, compiler and system features and resources to reduce overheads to the minimum practical level.”Footnote 7) and resource utilization (9) (e.g., “The FTSS software and the VxWorks operating system, together shall [SRS193] utilize no more than 3 megabytes of ROM.”Footnote 8).

We can see from Fig. 7 that most of the SRS documents with performance requirements were written for web applications, followed by real-time and embedded systems. There was a diversity of performance aspects in the requirements specified for web applications, whereas for real-time and embedded systems the specified requirements were mostly related to time behavior. A similar observation can be made by looking at Fig. 5, where web applications and embedded systems appeared in 22 instances each and real-time systems in 19 instances. The importance of performance in web applications, embedded systems, and real-time systems is not surprising. In a web application, a large number of distributed users access the application over different communication media. It is crucial that embedded and real-time systems perform optimally, since a safety hazard could arise if performance is not addressed. For instance, in self-driving cars the time behavior for reading the value of a sensor is crucial and needs to be specified explicitly, allowing the product to be tested against that specification.

In both the identified primary studies and the reviewed SRSs, time behavior was the most common performance aspect. Nonetheless, the other performance aspects are also relevant, since they appeared in a median of 10 instances each (except efficiency) and were specified in 78 requirements combined. Hence, we should consider all performance aspects when modeling performance requirements. Efficiency was the least studied (found in one paper [39]) and the least quantified, with only 3 quantified requirements (Fig. 8). However, efficiency was specified in 13 requirements, from which we conclude that efficiency is difficult to document and quantify. We found a few examples of quantified efficiency requirements: (1) “The external server data store containing RLCS status for use by external systems shall be updated once per minute”,Footnote 9 and (2) “The system must accomplish 90% for transactions in less than 1 second.”Footnote 10 These examples show that it is possible to quantify efficiency: the first requirement specifies “once per minute” and the second “90%...less than 1 second”. Both combine two performance aspects, i.e., capacity and time behavior.

Looking at testing levels, performance testing research seems to focus on system level testing (Fig. 3). This observation coincides with the notion that software performance is not associated with a single function, but rather associated with the overall system and influenced by its structure. This is also shown in the performance requirements models used in MBT. The purpose of those models is to verify the overall system behavior, e.g., timed-automata [46, 48, 62] and behavior models [4, 6].

We extracted 50 performance requirements models and categorized them into 11 main categories (Fig. 3).Footnote 11 All 11 categories had models which were used to model time behavior requirements. The purpose of those models is to verify if the written requirements are met. This is accomplished by comparing the testing results with the corresponding performance requirements.

The most studied models were timed-automata and UML-related diagrams. Timed-automata are used to model and analyze time behavior by measuring time differences between states, and can thus model and verify the time behavior aspect of software performance. However, timed-automata models have two main drawbacks. First, the models do not make the factors influencing performance explicit, which is needed to generate better test cases for performance requirements. Second, timed-automata can only model time behavior and are unable to cover other performance aspects [29]; they are therefore only adequate when time behavior is the only performance aspect that needs to be tested. As we can see from our analysis of SRSs, time behavior is seldom the only performance requirement. UML-based models use an annotation approach to make the performance requirements more intuitive and the system behavior more understandable [4]. UML-based models solely document performance requirements and are not used for test case generation of performance requirements. In many cases where UML is used, the performance requirements (e.g., time behavior or capacity) are set on the model as annotations, which are later used during test generation to add an extra assertion that checks the requirement. Such model annotations are beneficial for verifying the performance constraints of a functional requirement in a test environment (machine resources and test data).

The models and frameworks that we extracted during the SMS were mostly newly developed with little to no validation [27, 28, 45, 76] as seen in Fig. 4a. Although 31 case studies exist that validate those models (e.g., timed-automata), researchers still develop new performance requirements models and testing frameworks (Fig. 4b). The reasons for developing those models and frameworks are various:

  1. Model-based performance testing in a specific field has not been done before, e.g., robotics [4], self-adaptive systems [74] and cloud APIs [73]; has not been studied for a specific performance aspect, e.g., resource utilization [6, 38, 76] or time behavior [44]; or has not been proposed for a particular development stage, e.g., early, before a prototype is created [43], or late, during run-time [60].

  2. Issues associated with human factors: it is difficult to understand the model [9, 14], it takes extra effort to create the model [1, 63], or the current approaches are prone to human error [59].

  3. The lack of automation in the current MBT approaches [23, 47, 71, 72].

  4. Other reasons, e.g., using Petri nets to model time behavior aspects [13].

A majority of the analyzed papers (46) suggest a new concept or framework for MBT, using formal theory research (Fig. 4). This set is followed by 41 papers conducting field studies and field experiments that aim at validating the new model presented in the same paper. This focus on theoretical work and on studies in relatively controlled environments is another indication that the models are not validated under realistic conditions, as also observed by Prenninger et al. in their review of eight case studies on MBT [57]. A similar observation can be made by looking at the contributions of these papers in Fig. 4b, where most papers introduce new ideas and methods rather than evaluating pre-existing models. It would be crucial to evaluate those models, as the lack of evaluation of MBT techniques poses a risk for using those techniques in industrial practice. This factor influences the techniques' reliability, and evaluated techniques would positively affect their adoption in future software projects [19].

4.4 Implications of the SMS on performance requirements in MBT

We gained useful insights into performance requirements modeling in the context of MBT by conducting the SMS. First, performance aspects that were not studied before (e.g., resource utilization and speed/throughput) have gained interest in recent years, as seen in Fig. 6. This is an indication that more research is required on these aspects. Second, some performance attributes (e.g., time behavior) were used as test verdicts [59], while others (e.g., capacity) were used as a foundation for the test environments [36]. Third, performance requirements can be modeled separately from functional requirements, and test environments can be generated from the model [2].

However, we argue that the performance requirements models found in our SMS (Fig. 3) do not satisfy all goals of MBT simultaneously, i.e., support requirements validation, ensure requirements testability, and support test case generation. Therefore, we developed PRO-TEST to aid the model-based performance testing process, which we introduce next.

5 PRO-TEST

In this section, we introduce and evaluate the Performance Requirements verificatiOn and Test EnvironmentS generaTion approach (PRO-TEST). PRO-TEST aims at checking the completeness and correctness of performance requirements and at generating the parameters of test environments.

Figure 9 illustrates the MBT process in the context of performance testing. The figure is a modified version of Utting et al.’s [69] MBT process diagram that we introduced in Sect. 2.2. The modified process steps are shown with dotted arrows, and the modified/added artifacts are filled in grey color. We made three modifications to the diagram. First, we split the step of requirements modeling into two sub-steps: functional modeling (1-1) where a model is created from the functional requirements, and performance requirements modeling (1-2) where performance requirements are modeled. Second, we added an iterative process between the requirements and the created models (functional and performance). This change underlines how MBT supports requirements validation (an MBT goal). The modeling stage should detect requirements issues and changes should be made to the requirements to fix these issues. Third, we added a new Step 5, in which test environments are generated from the performance requirements model. The software performance is thereby directly related to the test environment. Setting up a test environment requires specifying setup parameters (e.g., capacity of users) and metrics parameters (e.g., response time). These parameters are derived from the performance requirements models.

Fig. 9 MBT process in the context of performance testing

In summary, PRO-TEST consists of (1) performance requirements modeling and (2) test environment generation. These activities correspond, respectively, to step 1–2 and step 5 in Fig. 9. The approach is not meant to be used standalone but rather to accompany any MBT approach that generates functional test cases, resulting in functional test cases mapped to test environments that test the performance of the SUT.

We illustrate PRO-TEST’s core concepts in Sect. 5.1 and explain the steps and guidelines for creating the performance requirements model in Sect. 5.2. Additionally, we explain the steps of generating test environments in Sect. 5.3, illustrate its application on an example in Sect. 5.4, and apply PRO-TEST on a set of 149 performance requirements in Sect. 5.5, discussing the strengths and weaknesses of the approach.

5.1 PRO-TEST approach development and description

We intend to propose a modeling approach that addresses the limitations of current model-based performance testing; specifically, one that models the five performance aspects in Sect. 2.1 to verify the requirements while generating test environments. Looking at the existing approaches that we identified in our SMS, we found that performance requirements affect test environments. Therefore, instead of creating an approach that models both performance and functional requirements, we chose to develop an approach that focuses on modeling performance requirements and generating test environments. This approach can be accompanied by existing, well-established MBT approaches that already handle functional modeling and testing. By focusing on performance requirements, we increase the chance of our approach being adopted by practitioners who already use existing MBT for functional testing, without the need to replace their existing tools; instead, it adds to what they already use.

The development of PRO-TEST was inspired by two related principles. First, the experiment principle that illustrates the relationship between dependent and independent variables [77]. Second, cause–effect graphs (CEGs) [22] that can be used to model the relationships between causes and effects.

We analyzed the different performance aspects with the cause–effect concept in mind. The main insight is that one set of performance requirements (capacity, resource constraints) can influence another set (time behavior, throughput, efficiency). This concept is shown in a taxonomy tree (Fig. 10) that classifies the aspects into independent and dependent performance parameters. The independent parameters consist of capacity (e.g., the maximum number of users) and resource constraints (e.g., storage size), which represent constraints on the software. The dependent parameters consist of time behavior (e.g., response time), throughput/speed (e.g., requests per time unit), and efficiency (e.g., response time in relation to memory size), and are measurements of the software performance. Manipulating the independent parameters causes changes in the dependent parameters. For example, if we require the system to use fewer resources (all other things being equal), this will lead to a higher response time, lower throughput, or lower efficiency. The purpose of this taxonomy tree is to identify which performance requirements are the influencing factors and which ones are impacted, as this distinction is important when modeling testable system requirements. The taxonomy tree is by no means exhaustive, but rather a classification of the most common (studied and specified) performance requirements.
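The taxonomy in Fig. 10 can be captured in a simple lookup structure. The following sketch is our own encoding of it; the aspect labels mirror the taxonomy, but the exact strings are illustrative.

```python
# Illustrative encoding of the taxonomy in Fig. 10: independent parameters
# constrain the test environment, dependent parameters are measured as metrics.
PERFORMANCE_TAXONOMY = {
    "capacity": "independent",             # e.g., maximum number of users
    "resource constraint": "independent",  # e.g., storage or memory size
    "time behavior": "dependent",          # e.g., response time
    "speed/throughput": "dependent",       # e.g., requests per time unit
    "efficiency": "dependent",             # e.g., response time in relation to memory size
}

def classify(aspect: str) -> str:
    """Return 'independent' or 'dependent' for a performance aspect in the taxonomy."""
    return PERFORMANCE_TAXONOMY[aspect.lower()]
```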

Fig. 10 Performance parameters taxonomy

In the previous paragraph, we used the term resource “constraints” instead of “utilization” to emphasize the interpretation of this parameter as an independent parameter. Looking at the results of our SRS analysis, we found that a specified resource utilization requirement can be either a dependent variable that we measure when we run the tests or an independent variable that affects the dependent variables when constructing and running the tests. For example, if we take the requirement “The FTSS software and the VxWorks operating system, together shall [SRS193] utilize no more than 3 megabytes of ROM.”,Footnote 12 there are two methods to test it. First, we can run the tests, measure the utilized ROM, and make sure the software does not utilize more than 3 megabytes. Alternatively, we can set up a test environment with 3 megabytes of ROM as a constraint, run the tests, and if the tests run to completion, then the software satisfies this requirement. We chose to apply the second method (hence the term resource constraints) since it works better when the specified requirement affects our decisions when setting up the test environment. For instance, to test the requirement “GParted is not a resource hog and will run on almost every computer”,Footnote 13 we cannot simply run the tests and measure the utilized resources (even if this requirement were quantified). Instead, we need to define a set of representative computers and run the tests on them.
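A minimal sketch of the second method, assuming a Unix-like test machine: the test command is started under a hard memory cap, and a run that completes successfully counts as satisfying the constraint. The 256 MB limit and the pytest command are placeholders, and an address-space limit is only an approximation of constraints such as ROM size.

```python
import resource
import subprocess

# Illustration of the "constrain the environment, then run" method described above.
MEMORY_LIMIT_BYTES = 256 * 1024 * 1024  # placeholder resource constraint

def _apply_limit():
    # Cap the address space of the child process (Unix only).
    resource.setrlimit(resource.RLIMIT_AS, (MEMORY_LIMIT_BYTES, MEMORY_LIMIT_BYTES))

def run_constrained(test_command):
    """Return True if the test suite completes under the resource constraint."""
    result = subprocess.run(test_command, preexec_fn=_apply_limit)
    return result.returncode == 0

# Hypothetical usage:
# run_constrained(["pytest", "tests/"])
```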

Figure 11 presents the main components of a performance requirements model. The model consists of three main parts:

  1. The object element referring to the SUT or part of it, i.e., a function that has the performance requirements associated with it.

  2. The independent parameters which act as inputs. They affect the test environment where the test runs and affect the test data.

  3. The dependent parameters which act as outputs. They are the metrics or results of running the tests, used to compare the test results with the written performance requirements.

Performance requirements are modeled with PRO-TEST using the taxonomy tree, which acts as a guide when extracting, categorizing, and finding missing performance requirements.
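To make these three parts concrete, the sketch below shows one possible in-memory representation of a performance requirements model; it is our own rendering, and the example object and the pairing of its parameters (borrowed from the requirement examples in Sect. 4.3) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Parameter:
    description: str  # e.g., "at least 100 concurrent users" or "less than 1 second"
    aspect: str       # performance aspect from the taxonomy tree

@dataclass
class PerformanceRequirementsModel:
    obj: str                                  # the SUT or the function under test
    independent: List[Parameter] = field(default_factory=list)  # environment/test data constraints
    dependent: List[Parameter] = field(default_factory=list)    # metrics to compare with requirements

# Hypothetical model pairing two requirement examples from Sect. 4.3:
product_search = PerformanceRequirementsModel(
    obj="product search",
    independent=[Parameter("at least 100 concurrent users", "capacity")],
    dependent=[Parameter("search in less than 1 second", "time behavior")],
)
```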

Fig. 11 Performance requirements model

5.2 Performance requirements model

There are three steps that should be followed when modeling the performance requirements of the software.

  • Step 1: Define the objects. Look up the object that the performance requirements at hand apply to. The objects could be the system, specific functions, or a collection of functions.

  • Step 2: Define the independent and dependent parameters. Extract the performance parameters from the requirements, and code them with the appropriate performance aspect using the taxonomy tree. Then add those parameters to the corresponding model.

  • Step 3: Compare the model with the taxonomy tree. Take the created performance requirements model and compare it with the taxonomy tree. Look for any possible missing parameters. If some parameters are missing, look for the possibility of merging models with the same object. If there are still some missing parameters, then there is a problem with the requirements. Check with requirements engineers or customers to negotiate the requirements. Otherwise, the model is complete and the specified requirements are quantified and can be tested. When the modeling is done, the next step is to design the test suite.

When using PRO-TEST with performance requirements, one should take into consideration the following guidelines which help to model the requirements.

  • Guideline 1: Verify the completeness of the requirements. Check the relation between different requirements. There should be a corresponding independent input for each dependent output. Having one without the other results in ambiguous requirements, which is reflected in an incomplete performance requirements model.

  • Guideline 2: Verify feasibility. The requirement should fit with one of the performance aspects’ definitions in Sect. 2.1.

  • Guideline 3: Verify quantifiability. Each requirement should have a quantity that describes the target level of performance, and an object that specifies where the target level applies (system, a specific function, or a collection of functions).

  • Guideline 4: Specific condition. Check if the requirements apply in specific circumstances or scenarios. The performance requirements might have the same objects but under different conditions, e.g., peak time. In this case, one should make a different model for each of those conditions, because each condition has different parameters that apply to the test environment and different measurement levels.

  • Guideline 5: Mandatory performance aspects. To generate meaningful test environments, each model requires the following performance aspects to be specified: (1) capacity and resource constraints to help set up the test environment, and (2) time behavior or throughput which acts as the metric to measure when running a test.

While these suggestions stem from our experience of modeling nearly 150 performance requirements from 34 SRS documents, they are not exhaustive and should not be considered as rules.
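As an illustration of Guidelines 1 and 5, the following sketch checks whether a model specifies at least one independent constraint and at least one time behavior or throughput metric before test environments are generated. The dictionary layout is an assumption on our side, not a format prescribed by PRO-TEST.

```python
# Illustrative completeness check following Guidelines 1 and 5; the model layout
# (a dict with "independent" and "dependent" parameter lists) is assumed.
def check_model(model: dict) -> list:
    """Return a list of issues; an empty list means the model looks complete."""
    issues = []
    independent = model.get("independent", [])  # capacity, resource constraints
    dependent = model.get("dependent", [])      # time behavior, throughput, efficiency

    if not independent:
        issues.append("no capacity or resource constraint to set up the test environment")
    if not any(p["aspect"] in ("time behavior", "speed/throughput") for p in dependent):
        issues.append("no time behavior or throughput metric to measure when running a test")
    return issues

# Hypothetical, deliberately incomplete model:
model = {
    "object": "status display update",
    "independent": [{"description": "6 active control nodes and 2 monitoring nodes",
                     "aspect": "capacity"}],
    "dependent": [],
}
print(check_model(model))  # -> missing time behavior/throughput metric
```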

5.3 Generating test environments

One of the goals of PRO-TEST is to generate test environments, which aids the verification of performance requirements in the SUT. As seen in Fig. 9, the generated test environments are required to run performance tests and affect their outcome.

Using the created performance requirements models, we generate parameters for the test environments. These parameters are divided into two groups: constraints and metrics. The constraint parameters are required to set up the test environments and stem from the independent parameters in the taxonomy in Fig. 10. The metric parameters are indicators for the success or failure of the test cases run in the test environment and stem from the dependent parameters in the taxonomy in Fig. 10.

Listing 1 Test environment generation algorithm

We show in Listing 1 the algorithm to generate the test environments that will be used to run the test cases. The algorithm consists of three main steps. (1) Create two lists of parameters, constraintsList and metricsList, and add the parameters from the created performance requirements models to the corresponding list based on the classification in the taxonomy tree. (2) Create an environmentsList containing one environment for each parameter in the constraintsList combined with all parameters in the metricsList, plus one environment in which all parameters from the constraintsList and the metricsList are included. (3) Map the test cases to the created environments in the environmentsList. The test cases mapped to a test environment are those that verify the object (e.g., function) to which the performance requirements refer.
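Since the content of Listing 1 is not reproduced here, the following Python sketch reconstructs the three steps from the prose description above. The list names follow the description (constraintsList, metricsList, environmentsList), while the dictionary layout of models, test cases, and environments is our own assumption.

```python
# Reconstruction of the test environment generation algorithm described above
# (a sketch of Listing 1; the data layout is assumed, not taken from the paper).
def generate_environments(models, test_cases):
    # Step 1: split the model parameters into constraints and metrics
    constraints_list, metrics_list = [], []
    for model in models:
        constraints_list.extend(model["independent"])  # taxonomy: independent parameters
        metrics_list.extend(
            {"object": model["object"], "metric": metric} for metric in model["dependent"]
        )

    # Step 2: one environment per constraint with all metrics, plus one environment
    # that combines all constraints and all metrics
    environments_list = [
        {"constraints": [constraint], "metrics": list(metrics_list)}
        for constraint in constraints_list
    ]
    environments_list.append(
        {"constraints": list(constraints_list), "metrics": list(metrics_list)}
    )

    # Step 3: map the test cases that verify the referenced objects to each environment
    for environment in environments_list:
        objects = {pair["object"] for pair in environment["metrics"]}
        environment["test_cases"] = [t for t in test_cases if t["object"] in objects]
    return environments_list
```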

To automatically generate test environments from the created performance requirements models, we implemented a Python script.Footnote 14 The script takes as input the list of performance requirements models (in CSV format) created by the tester. The output of the script is a list of test environments (in JSON format). Each test environment consists of a list of constraints to construct the test environment and object-metric pairs that indicate which functions should be tested and measured in this environment. We chose JSON as the output format since it is widely used in practice. Generating test environments in this format makes it fairly easy to adapt to different testing tools.

5.4 Example of PRO-TEST

To illustrate PRO-TEST, we present an example, following the three steps described in Sect. 5.2 for creating the performance requirements model. Then, we generate parameters for test environments following the test environment generation procedure presented in Sect. 5.3. We extracted the performance requirements of a telescope control software, shown in Table 4.

Table 4 Example performance requirements for PRO-TEST approach demonstration

5.4.1 Performance requirements model

Step 1: Define the objects. We defined five objects from the requirements: command response, status display update, request for status info, user interface and software. Then we created five models, one for each object as shown in Fig. 12.

Fig. 12 PRO-TEST Example—Step 1

Step 2: Define the independent and dependent parameters. We extracted the performance parameters (10 active nodes, large number of stations, simultaneous users, 6 active control nodes and 2 monitoring nodes, ≤ 2 s, ≤ 4 s, ≤ 5 s and network) from the requirements, and coded them with the related performance aspects as per the taxonomy tree. We present the associated performance aspect in the last column of Table 4. Then we added those parameters as independent and dependent parameters in the model as shown in Fig. 13. At this stage we identified four issues in the requirements:

  1. PR10 is an availability requirement, which is not covered by our taxonomy tree (guideline 2); hence, we exclude PR10. In addition, PR5 indicates that there should be a time behavior requirement for the user interface. However, we examined the SRS document and did not find any time behavior requirement for the user interface. Hence, PR5 cannot be modeled, which indicates a missing requirement.

  2. PR1 (simultaneous users), PR6 (large number of stations), and PR7 (network) are not quantified (guideline 3).

  3. PR6 is ambiguous, as “without appreciable degradation of performance” is unclear.

  4. PR8 and PR9 are conflicting requirements: PR8 specifies a capacity of 8 nodes (6 active plus 2 monitoring), whereas PR9 specifies a capacity of 10 active nodes.

Fig. 13 PRO-TEST Example—Step 2

Step 3: Compare the model with the taxonomy tree. We compared the created models with the taxonomy tree to identify possible missing parameters and annotated each model with the potentially missing requirements, as seen in Fig. 14. Resource constraints parameters are missing from the models and from the specified requirements, since no requirement indicates resource constraints for the software. Another issue is that PR6 (large number of stations) applies to other parts of the system as well (missing requirement). Moreover, there are no performance requirements among the dependent parameters (time behavior, speed/throughput, or efficiency) that apply to the software or the user interface.

At this point of the analysis, the identified issues should be discussed with the requirements engineers or customers to negotiate the requirements and fix the issues: (1) asking for the missing requirements, (2) quantifying PR1, PR6, and PR7, (3) clarifying or reformulating PR8 into two requirements, one that specifies the capacity of the software and another that specifies the dependent parameter (e.g., time behavior), and (4) resolving the conflict between PR8 and PR9.

5.4.2 Test environments generation

We generate test environment parameters following the algorithm presented in Listing 1. We feed each model in Fig. 14 (constraints and metrics) to the algorithm as input, and as output we get a set of environments (one per constraint and one with all constraints). This makes debugging easier, as the tester can identify the troublesome performance constraint(s) just by looking at the constraint(s) used to construct the test environment in which the failed test was run.

Fig. 14 PRO-TEST Example—Step 3

We used our test environment generation script to automatically generate test environments from the created models. In Fig. 15, we show the structure of the generated file. The root node of the file contains an array of generated test environments. Each test environment consists of a list of constraints and a list of object-metric pairs. A constraint is represented by a description and an att class (the performance aspect). An object-metric pair consists of the object to be tested (e.g., a function) and the metric to be measured. A metric is likewise represented by a description and an att class. Errors in the modeled performance requirements are reported in the errors field.

Fig. 15 Test environment JSON file structure
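As a concrete illustration of this structure, the snippet below prints one hypothetical environment for the telescope example. The field names follow the description above, but the specific object-metric pairing and the error message are ours, chosen only to convey the shape of the file:

```python
# Hypothetical instance of the generated file, mirroring the structure in Fig. 15.
import json

example_environment = {
    "constraints": [
        {"description": "10 active nodes", "att_class": "capacity"},
    ],
    "object_metric_pairs": [
        {"object": "command response",
         "metric": {"description": "<= 2 s", "att_class": "time behavior"}},
    ],
    "errors": [
        "PR6 'large number of stations' is not quantified",
    ],
}

print(json.dumps({"environments": [example_environment]}, indent=2))
```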

The results of generating test environments can be found in Table 5. Rows 1–4 can be used to construct test environments. This is not possible for the remaining rows (5–8), as they are missing constraints and/or metrics. For instance, the question mark in row 5 for the constraint simultaneous users indicates a missing quantity of simultaneous users. As mentioned earlier in this section, the requirements should be negotiated with the customer so that we can fill these gaps in our tests.

Table 5 PRO-TEST Example—test environments summary

5.5 Sample study—model evaluation

We applied PRO-TEST to 34 SRS documents from the SRS collection. We extracted in total 149 performance requirements from these documents, i.e., requirements that fit the definition of performance aspects in Sect. 2.1.

We extracted the performance requirements from the SRS collection and applied PRO-TEST by modeling the requirements as explained in Sect. 5.2. We did not generate test environments from the created performance relational models, since test environment generation would be more meaningful if used with another MBT approach that generates test cases from functional requirements. This is outside the scope of this paper.

In Table 6, we present the two types of defects found by PRO-TEST. The first type of defect is related to quantifiability. We found that 106 out of 149 requirements were quantified, while the remaining 43 were quantifiable but not actually quantified (e.g., “The product will reside on the Internet so more than one user can access the product and download its content for use on their computer.”Footnote 15).

Table 6 PRO-TEST evaluation results

The second type of defect is related to under-specified or missing requirements. We found a total of 180 missing parameters in the analyzed requirements. The majority of them (100) were related to resource constraints, followed by capacity (39), time behavior (22), and throughput/speed (19). No missing parameters for efficiency requirements were detected. As defined in Sect. 2.1, efficiency is a combination of more than one parameter. Hence, to some extent, the existence of those parameters (e.g., time behavior and resource constraints) eliminates the need for efficiency requirements.

In the included SRS documents, there were 204 performance requirements, as categorized by the original authors of the SRSs; we identified and categorized 132 of those requirements, while we could not fit 67 requirements to any of the performance aspect definitions in Sect. 2.1. For example, “Assuming submitted statistics for jobs are accurate, the Libra scheduler will ensure that all jobs are completed with a 10% error allowance.”Footnote 16 For other requirements it was hard to understand how they fit as performance requirements, e.g., “The database retrieval and update response time shall not impact any other performance requirements such as the GUI response time or monitoring and control responses.”Footnote 17 This requirement mentions response time, but it does not clearly state where it applies or what the target level of performance is. Some requirements were even more difficult to classify, e.g., “The HATS-GUI shall allow a user to request transformations while HATS-SML is performing transformations or parsing.”Footnote 18 It could be argued that this is an efficiency requirement. However, reading it carefully, we concluded that it is not a performance requirement, but rather a usability requirement that demands parallel processing or multitasking. According to Ho et al. [33], a performance requirement can be categorized into four levels (0 to 3). These levels reflect the maturity, suitability, and validation of performance requirements. Based on their definition, this requirement is classified as level 0 (lowest), which is descriptive and can only be evaluated qualitatively. The requirements in this paragraph were extracted from 2001-libra, an SRS for an economy-driven cluster scheduler for high-performance clusters, 2004-rlcs, an SRS for an interstate reversible lane control system, and 2001-hats, an SRS for a high assurance transformation system. Relying solely on a qualitative evaluation of performance in these systems potentially leads to unsatisfied customers.

Out of the 149 requirements, 43 were not quantified. Those requirements fall into two categories. (1) Requirements with minor issues, i.e., just missing the numerical value. For example, “The tools shall be able to scale to process large collections using distributed processing and data transport.”Footnote 19 This is a capacity requirement that applies to the whole tool (object). However, the size of the collection is not defined; it could be 100 or 100,000. Since the requirement does not specify a range, we do not know how to test it. (2) Requirements with major issues. For example, “Loading speed: The data system shall load as quickly as comparable productivity tools on whatever environment it is running in.”Footnote 20 This requirement refers to efficiency in an ambiguous manner: “as quickly as comparable productivity tools” and “on whatever environment”. No test could be written to verify whether the system satisfies this requirement.

Performance aspects were not considered equally by the requirements engineers when writing the SRS documents, which indicates a lack of knowledge of the inter-dependency between different aspects, as revealed by PRO-TEST. 100 out of 109 created models had missing requirements for resource constraints. It could be argued that resource constraints are not part of performance requirements. However, they do affect software performance, and some SRS documents did specify resource constraints properly, e.g., “The Framework Shell SHOULD NOT utilize more than 40 megabytes of RAM.”Footnote 21

We generated 96 test environments from the performance requirements models that we created from the SRS collection. All of the generated test environments had missing or unquantified requirements.

Bondi [11] suggested that a performance requirement should have nine characteristics: it should be unambiguous, measurable, verifiable, complete, correct, mathematically consistent, testable, traceable, and linkable to business and engineering needs. Our evaluation shows that PRO-TEST supports a subset of these characteristics: it helps engineers verify performance requirements by making missing information explicit (completeness and quantifiability), detecting unclear information (ambiguity), and associating performance requirements with test environments.

6 Discussion

In this section, we discuss different aspects of PRO-TEST. We compare the performance taxonomy and the performance aspect inter-dependency relation with those from the literature, list the limitations of the approach, discuss how the approach differs from other MBT approaches, share our observations regarding performance requirements, and finally relate PRO-TEST to performance prediction.

6.1 Previous performance aspect classifications

As we saw from our SMS and SRS analysis results, five performance aspects were studied and used in practice. Thus, testers should consider these aspects when testing software performance. Eckhardt et al. [21] specify a template for writing performance requirements. They considered three aspects of performance requirements, namely time behavior, throughput, and capacity. In addition, they specified the performance context (e.g., platform, measurement location, and load) as part of each requirement. However, they do not consider resource constraints, but rather the platform (hardware) under which the requirement applies. It is seldom the case that specifying hardware requirements is enough to test system performance and ensure the desired time behavior, throughput, and efficiency. For instance, smartphone applications, vehicle software, cloud services, and desktop applications all share resources with other applications running on the same platform. In this case, performance testing verdicts are more reliable when we specify the available resources for the system rather than the platform it runs on. Nixon et al. [53] categorized performance requirements into time (response time, throughput, and management time) and space (main memory, secondary storage). They did not account for capacity, which we consider in our taxonomy tree.

6.2 Performance aspects inter-dependency

To the best of our knowledge, the dependency relation between the five performance aspects has not been observed before. Cai et al. [12] considered two aspects of performance, time and space, and called the relation between these aspects side-effects. They neither clearly defined the nature of the effect nor considered the other performance aspects. Eckhardt et al. [21] proposed that each specified performance requirement should include platform and load in the same requirement, since these aspects affect all other performance aspects. They do not consider the case where platform and load are specified in separate requirements, which, as our SRS analysis results show, can be the case.

6.3 PRO-TEST benefits

Using PRO-TEST to model performance requirements and generate test environments has the following benefits:

  1. It helps software engineers to understand the requirements better. When the performance requirements are visualized using the taxonomy tree, it becomes easier to see how the requirements relate to each other and to the functional requirements.

  2. It acts as a validation tool for the requirements. By modeling the performance requirements, we can find out (1) whether there are issues with some requirements that cannot be modeled and (2) whether other requirements are missing.

  3. It informs software testers in which environments the tests should be run. This saves time and resources, as it allows testers to design efficient test suites.

6.4 PRO-TEST limitations

There are some limitations of using this modeling approach for performance requirements. First, the taxonomy tree is rather abstract. Using the taxonomy, we can identify that capacity requirements are missing; however, it currently provides no support or details about what exactly is missing, e.g., data, users, or requests. These could be specified in more detail in further nodes of the taxonomy. Second, the approach is prone to human error. Since the extraction and coding of the parameters is done manually, the process depends on the engineers’ interpretation of the requirements. This could be avoided by automating the process using natural language processing. Third, there is a lack of inspection of the requirements’ quality. As argued by Bondi [11], a good performance requirement should specify to what degree it should be met, i.e., whether the requirement applies all the time or only a specified proportion of the time (e.g., 99% of the time). Using PRO-TEST, we do not detect those quality aspects of the requirements.

The main limitation of our test environment generation approach is that it can be difficult for a tester to debug a failed performance test. PRO-TEST generates one test environment per constraint, in addition to a test environment that aggregates all constraints. If performance tests fail in the test environment that aggregates all the constraints, then it is difficult to identify which interaction of the test constraints caused the failure.

6.5 Observations on dependent and independent parameters

The dependent parameters (time behavior, throughput, efficiency) were more often specified than the independent parameters (capacity, resource constraints) in performance requirements. This is clear from the results, where out of the 180 missing parameters, 139 were in the category of independent parameters (i.e., capacity and resource constraints). There could be many reasons for this outcome. First, it is possible that some requirements engineers or customers have a misconception about some performance aspects; resource constraints could be thought of as part of the hardware specifications. Second, it may be more difficult to specify those parameters during the initial stage of a software development cycle. Without prior experience, it is difficult to assess how many resources are utilized or how much capacity is required, i.e., no clear estimation of capacity exists. This increases the risk of scalability issues appearing later, similar to what happened at the Pokemon Go launch [51], where the developers did not expect the surge in the number of users. Third, resource constraints may have been left out intentionally. Today, hardware virtualization is used extensively in deployed applications, and it can be more flexible and affordable to invest in higher-spec hardware than in more efficient software.

6.6 Performance prediction

Performance prediction is an approach to ensure the performance of the system by simulating the system behavior. Similar to MBT, performance prediction can use models to illustrate the system behavior [8, 78]. Performance prediction is used to validate the system performance early before building the system (e.g., in a simulated environment) [8]. In contrast, PRO-TEST verifies performance requirements through modeling and generates test environments for performance testing.

Performance prediction is useful in systems with hardware components, where we want to understand the effect of the used components on the system’s performance. PRO-TEST and model-based performance testing approaches, on the other hand, are appropriate for generating the means to test the software before deployment.

7 Answering the research questions

We now answer our main research questions.

RQ1 Which aspects of performance requirements are used in MBT?

All performance aspects presented in Sect. 2.1 were used in MBT, but to different extents. Time behavior was the aspect most studied by researchers and most often specified by practitioners in the SRSs. Capacity, throughput, and resource constraints were studied and specified, but to a lesser extent compared to time behavior. Efficiency was the least studied aspect, with only one paper, and was quantified in only about 3 of the 13 written efficiency requirements. We found many models that can be used to model those aspects. As can be seen in Fig. 3, many of the models were used to model more than one performance aspect.

RQ2 How to implement MBT on performance requirements aspects?

We found 50 models in the literature for modeling software performance requirements and grouped them into 11 clusters (Fig. 3). The purpose of those models is to document and visualize performance requirements. Those models do not satisfy the goals of MBT, which are to (1) validate the specified requirements, (2) better understand those requirements, and (3) generate a suitable test suite. Hence, we developed PRO-TEST, which consists of a performance requirements model and a taxonomy tree of performance aspects, and which verifies performance requirements and generates test environments. The performance requirements model with the taxonomy tree is not just a modeling approach for performance requirements; it is also a concept that identifies the relationships between different performance aspects.

RQ3 To what extent is the identified approach effective at modeling performance requirements written for real-life projects?

The results of the PRO-TEST evaluation indicate that the developed approach can be used to model requirements from real-life projects. We applied PRO-TEST to performance requirements from 34 SRS documents. The approach could detect issues related to ambiguity, quantifiability, and completeness of performance requirements. We could also understand the interrelation between those requirements. However, there are some limitations to PRO-TEST: (1) the taxonomy tree is not detailed enough, e.g., we do not know which type of capacity is missing (users, data size), and (2) manually modeling the requirements is prone to human error. Those limitations should be addressed to achieve the maximum benefits of MBT.

8 Conclusions and future work

In this study, we illustrated how PRO-TEST can improve the understanding of performance requirements and support the identification of requirement defects. We conducted a systematic mapping study in the context of model-based performance testing and studied a repository of publicly available software requirements. We found from our SMS that researchers have studied and modeled all performance aspects. However, there was a need for an approach to verify performance requirements that takes into consideration the goals of MBT. We developed PRO-TEST and showed in our evaluative study that it can be used to verify performance requirements and generate test environments. The benefits of PRO-TEST add value to MBT: it helps software engineers to understand the requirements better, validate them, and generate test environments semi-automatically. In addition to the performance relational model, we developed the taxonomy tree, which shows the cause–effect relations between different performance aspects.

Future work concerns a more in-depth validation of PRO-TEST, finding solutions for the limitations of the approach, and extending PRO-TEST to existing diagrams and other non-functional requirements. We have identified the following possible directions for future work, which would benefit researchers interested in this area.

  1. Apply the proposed modeling technique to a larger set of well-built SRS documents with relatively complete performance requirements, and enhance PRO-TEST further.

  2. Investigate the possibility of applying the relational modeling concept to other non-functional requirements, e.g., security.

  3. Integrate PRO-TEST with MBT approaches that generate functional test cases, and evaluate the effectiveness of test environment generation.

  4. Extend the taxonomy tree by finding possible sub-categories for the performance aspects.

  5. Automate the process of creating the model from natural language requirements to avoid human errors.

Finally, we hope that this list of future work inspires researchers to do more research in the area of model-based performance testing and performance requirements verification.