1 Introduction

The empirical evidence of the impact of requirements engineering (RE) on the software development life cycle has shown that the quality of requirements artifacts and processes influences project success and budget adherence [1,2,3]. Moreover, the cost of defects introduced during the RE phase of a project is reported to scale exponentially the longer they remain undetected [4]. This necessitates quality assurance techniques capable of detecting RE defects as soon and as reliably as possible.

Requirements quality research is dedicated to supporting the software engineering process with the means to evaluate and improve the quality of requirements, mainly focusing on requirements artifacts [5]. However, recent systematic investigations of requirements quality literature revealed a lack of rigor and relevance of these contributions [6, 7]. Moreover, the impact of the quality factors proposed in the literature (i.e., requirements writing rules) remains largely unexplored in practice [7], hindering its adoption in industry [8,9,10,11].

Existing quality theories and frameworks are too abstract to guide requirements quality research at an operational level [12, 13]. These theories often only divide quality into sub-categories without any means of applicability. In this paper, we argue for the need for a theoretical and operationalizable foundation of requirements quality research. We review the closely related software quality research and draw parallels to requirements quality research to consolidate a harmonized requirements quality theory. Additionally, we survey requirements quality literature with respect to the theory to reveal current shortcomings. Accordingly, we make the following contributions:

  1. 1.

    A harmonized requirements quality theory serving as a theoretical foundation for requirements quality research.

  2. 2.

    A survey of requirements quality research revealing if and how concepts of the theory are reported in the state of the art, but also emphasizing shortcomings.

  3. 3.

    A consequent research roadmap aimed at mitigating these shortcomings.

The rest of this manuscript is organized as follows: Sect. 2 illustrates the evolution of software quality research and draws the parallel to requirements quality research. In Sect. 3, we derive a harmonized requirements quality theory from this comparison. This theory is used to evaluate the state of requirements quality research in Sect. 4 and reveal current shortcomings. The consequent research roadmap to mitigate these shortcomings is presented in Sect. 5 before concluding in Sect. 6.

2 Software quality research

Software quality research follows a similar premise as requirements quality research. It is necessary to control the quality of software artifacts (e.g., source code) as it impacts the overall quality of the development life cycle and the final product. This premise aligns with the aim of requirements quality research. To show commonalities and differences between these two research fields, we review the evolution of software quality research in Sect. 2.1 and draw a parallel to requirements quality research in Sect. 2.2. We reach conclusions about the necessary direction the latter needs to take.

2.1 Evolution of software quality research

Software quality research revolves around assessing the quality of software artifacts [14]. In the following, we describe the evolution of the field according to Broy et al. [14] and Deissenboeck et al. [15].

Guidelines and Metrics-based approaches

Guidelines are the simplest approach for controlling the quality of software artifacts. For example, the Java coding conventions [16] prescribe—among other suggestions—how to name and structure Java files. However, guidelines commonly fail to significantly impact software quality, likely because they lack the motivation for their relevance [17]. For example, the aforementioned suggestions are justified because “[c]ode conventions improve the readability of the software” [16] without any empirical evidence of that claim. Furthermore, guideline conformance is difficult to assess and hence seldom done in practice [15]. The latter shortcoming was addressed by introducing metrics-based approaches where metrics were devised to measure relevant attributes of software artifacts. Among others, lines of code [18] and cyclomatic complexity [19] were used to evaluate software quality automatically. Nevertheless, most metrics continue to lack justification of their relevance [14, 20,21,22].

Quality models

To overcome the relevance shortcoming, quality models aggregated metrics into hierarchical trees of criteria [23, 24]. The leaf nodes are specific enough to be operationalized as an evaluation metric, while the aggregation into higher-level quality characteristics provided the justification for their relevance. For example, low-level concepts such as structuredness and conciseness of code were justified by their aggregation to understandability and maintainability, which were widely accepted as relevant software quality characteristics [24]. However, hierarchical models suffered from unclear decomposition rules and constrained levels of granularity, which were either too abstract to be operationalized or too detailed, disconnecting the applicable metrics from their rationale [14, 15].

Quality meta-models

The popularity of quality models necessitated a structure for the proposed models [25]. Meta-models like the Goal Question Metric approach by Basili et al. [26] and the factor-strategy quality meta-model by Marinescu and Ratiu [27] provide this overarching structure. Deissenboeck et al. [28] contribute the DAP classification for quality models, which categorizes the aim of a quality model to be to define (D), assess (A), or predict (P). The publication further postulates quality meta-models as the “model of the constructs and rules needed to build specific quality models.” [28].

Activity-based quality models

In addition to the shortcomings that existing quality models continued to suffer, the elements populating these models were found to be heterogeneous [15]—i.e., properties of a system were mixed with properties of activities in which the system is used. For example, the maintainability branch in the software quality characteristics tree by Boehm et al. [29] contains both system properties like the structuredness of a software artifact, but also attributes of activities in which these artifacts are used, like modifiability. The latter describes the activity of modifying an artifact rather than a system property, despite the adjective’s nominalization suggesting otherwise.

So far, no clear rule for distinguishing a system from an activity property has been proposed. We derived two heuristics from the implicit argumentation of previous publications [15]. First, if a property involves an additional agent (e.g., testability involves a test engineer, modifiability involves a modifier, although not necessarily human), then it represents how the system is used—i.e., an activity property. The second heuristic comes in the form of a syntactical criterion:

  • Nominalized adjectives (e.g., structured-ness, concise-ness) tend to be system properties

  • Nominalized verbs (e.g., modify-ability, access-ability, augment-ability) tend to be activity properties

Interpreting activity properties as system properties ignores an underlying impact relationship. For example, interpreting modifiability as the system property of how receptive it is to change omits that actual system properties (e.g., whether the system is digital or analog or who has writing access rights) impact the ability of a stakeholder to modify the system, which is an activity property.

To address the issue of heterogeneous properties, Deissenboeck et al. introduced activity-based quality models [14, 15], which separate system properties from activity properties and form two distinct, orthogonal dimensions. The model expresses quality as the impact of system properties on activity properties. Figure 1 visualizes a simplified version of the quality model [15], showing how code clones impact the modification sub-activity and expressive identifiers impact the concept-location sub-activity.

Fig. 1
figure 1

Excerpt from the activity-based quality model for maintainability

The activity-based quality model was successfully applied to usability [30], security [31], and service-oriented architecture [32] before Wagner et al. distilled a comprehensive activity-based meta-model in the scope of the Quamoco project [33, 34]. In parallel, the original use case of the activity-based quality model, which focused on maintainability, received extensive tool support [35, 36] contributing evidence to the operationalization of quality models in practice [37].

Activity-based quality models solve limitations of previous quality models at the cost of increased complexity, which manifests in additional challenges to operationalize and communicate the notion of quality [38]. However, the complexity of these models is necessary to tackle the faceted concept of quality [38, 39]. Research continuously tackles the inability of activity-based quality models to assess artifact quality and distinguish quality levels [40]. For example, weights empirically derived from historical data replaced expert-based propositions [41], and Bayesian networks were utilized to model the impact relationships [42].

2.2 Mapping to requirements quality research

In the following, we draw a parallel of the evolution of quality research between the areas of software engineering and requirements engineering.

Metrics and quality models

Similar to software quality, requirements quality research historically originated from proposing metrics like passive voice of requirements sentences [43] or sentence length [44], which are associated with bad quality of requirements specifications. An ongoing research endeavor [7] collects these quality factors and indicats their limitations. Most existing publications either fail to gauge the impact of these metrics [45] or explicitly disregard their relationship [46]. Requirements quality models [47, 48] integrate these factors into larger frameworks but often remain vague on their notion of impact.

The investigation of impact is often limited to a comparison between the quality factor and practitioners’ subjective, general perception of the quality of the requirements entities [49]. Wilson et al. contribute a first impact matrix between quality indicators and quality attributes [50], but the latter suffers from the same system and activity properties heterogeneity. Similarly, Yang et al. [51] state that “[a]mbiguity is therefore not a property just of a text, but a conjoint property of the text and of the interpretations held by a group of readers of that text,” exposing the necessary distinction between system and activity properties.

Activity-based requirements quality

A large portion of requirements quality research exhibits the same shortcomings identified and overcome by software quality research, namely that (1) requirements quality factors lack relevance due to their unknown impact, which in turn inhibits adoption in practice, and (2) the terminology of requirements quality aspects confuses system and activity properties.

Femmer et al. apply the activity-based quality perspective to requirements engineering by proposing the activity-based requirements engineering quality model (ABRE-QM) [52]. This model leverages the insights from activity-based software quality models [15, 17, 33] and shows that the quality of requirements depends on the impact they have on the activities in which they are used. However, despite the authors’ call for action [53], ABRE-QM saw little adoption in research as demonstrated in recent systematic investigations of the requirements quality literature [6, 7].

The ABRE-QM example above raises the concern that requirements quality researchers do not properly utilize the activity-based approach successfully employed in software quality research. In this manuscript, we want to encourage further research on this approach by presenting a revised requirements quality theory, a thorough investigation of the requirements quality literature verifying the hypotheses from previous studies [6, 7], and a consequent research roadmap.

3 Requirements quality theory

We generated a harmonized requirements quality theory (RQT) by consolidating the evolution of software quality models described in Sect. 2.1, their application in requirements engineering as described in Sect. 2.2, and alignment to the established Quamoco quality model [34]. In terms of theory types [54], the RQT is both explanatory, as it explains the notion of requirements quality, and prescriptive, as it prescribes how to report contributions to requirements quality. The building blocks of the theory are described in Sect. 3.1 and illustrated with an example in Sect. 3.2.

3.1 Theory

The concepts that constitute this theory are visualized in Fig. 2, and each concept is described in Table 1. The model represents an evolution of the original activity-based requirements engineering quality model (ABRE-QM) proposed by Femmer et al. [52]. Here, we present changes to the original model.

Fig. 2
figure 2

Concepts of the requirements quality theory

Table 1 Explanation and origin of theory concepts

The artifact-related section of the model (left part of Fig. 2) is largely equivalent to the original publications [15, 52]. Entities represent requirements artifacts of different granularity [5], which can be decomposed into further entities. For example, a requirements specification can be decomposed into sections, which in turn consist of paragraphs and sentences or requirements. We consider an artifact to be a high-level requirements entity and hence do not explicitly add the artifact to the model, deviating from the original [52]. Similarly, factors can be decomposed into sub-factors to accommodate composite factors. For example, Antinyan et al. [58] position their proposed quality factor of conjunctive complexity as a sub-factor of syntactical complexity.

The activity-related section of the model (middle part of Fig. 2) again adapts the original models [15, 52]. The concept activity does not represent common requirements activities, like elicitation, analysis, and validation [59], but rather every process that takes a requirements entity as input and produces an output. This includes some requirements activities (like analysis and validation, which use requirements as input) but not others (like elicitation, which often does not presuppose existing requirements). Hence, we rather refer to them as requirements-affected activities. These further include implicit sub-activities (e.g., understanding and interpreting an entity), which can be aggregated with other, more explicit sub-activities (e.g., test case design) to form high-level activities (e.g., validation). The decomposition relationship of the activity concept accommodates this aggregation. To accommodate not only human actors involved in activities but also any automatism like requirements processing tools [60] we abstract the concept of stakeholder to agent.

We generalized the impact concept in this theory. While previous models assumed that impact is categorical (i.e., the occurrence of a fact has either a positive, negative, or no impact at all, like in Fig. 1 [15] or linear (i.e., the larger the evaluation of a quality factor, the better/worse is its quality), we consider the impact to model any kind of relationship between Entity-facts and Activity-facts. This opens up the theory to more complex relationships, which can model the actual impact more accurately and allows to compare the impact of quality factors with each other.

Two concepts were added to the model. First, the impact was related to an Activity-fact composed of an activity and an attribute as proposed by Winter et al. [30]. This way, the structure of the variables on the two sides of the impact relationship is mirrored. Furthermore, the necessity to associate an impact with a measurable property of an activity is emphasized. Second, context factors also influence the impact of an Entity-fact on an Activity-fact. As recognized by previous publications [55, 56], the impact differs depending on external factors related to, among others, the organization and the people involved [61].

The economic section of the model (right part of Fig. 2) is a novel addition to previous iterations of the activity-based models [15, 34, 52]. As long as the subsequent economic impact of an Activity-fact is unknown, the Entity-fact that produces the Impact on this Activity-fact will remain neglected [56, 57]. Hence, the software process economics perspective introduces a Cost for a specific Resource such as time or money.

3.2 Example

In this section, we illustrate the RQT with a fictitious example to demonstrate its application. The example is additionally visualized in Fig. 3.

Fig. 3
figure 3

Exemplary instantiation of the theory

In this example, a customer’s requirements were elicited and documented in a requirements specification containing the entity user story 42. One relevant quality factor used by the organization responsible for implementing the requirements is template conformance, which prescribes that all user stories must follow the Connextra template [62] “As a \(\langle\)role\(\rangle\) I want to \(\langle\)goal\(\rangle\) so that \(\langle\)benefit\(\rangle\).” This quality factor maps the entity to a categorical value, containing—among others—the values conform, missing role, and missing all elements. In this example, the role is omitted from the user story. Hence, the quality factor template conformance is evaluated to missing role, which constitutes the entity-fact (yellow box in Fig. 3).

The organization uses this user story in a subsequent, requirements-affected development activity, where a different stakeholder—the developer—is responsible for translating the entity into code. This activity can be decomposed into two distinct sub-activities: understanding the entity and programming the respective implementation.

One desired attribute of the activity understanding is determinism—i.e., a requirements entity should have only one unique interpretation. Possible variations of the interpretation and, therefore, the subsequent translation of a requirement must be avoided. Because the conformance quality factor is evaluated to missing role on the user story entity, the understanding activity is less deterministic, as the developer can make a different assumption about the role implied by the requirement. The understanding activity has become ambiguous, which constitutes the activity-fact (orange box in Fig. 3).

The relationship between the entity-fact and the activity-fact is the impact of the quality factor. Instead of limiting the impact concept to categorical values (e.g., either has an impact or has no impact), the RQT enables more complex impact relationships. In this fictitious example, the quality factor value missing role is associated with a 64% chance of making the understanding sub-activity ambiguous. This relationship can be determined empirically via experimental research investigating the likelihood of the different values of the conformance quality factor reducing the determinism of the understanding sub-activity.

The programming sub-activity may go unaffected by the entity-fact that the conformance has a value of missing role (green box in Fig. 3): regardless of the agent’s interpretation of the requirements entity, the programming sub-activity will remain unaffected in respect to the relevant attribute duration under the assumption of a similar user interface for both roles. Whether the feature is coded for the role receptionist (as the customer intended) or patient (as the developer assumed) does not significantly change the duration of the sub-activity if the user interfaces only barely differ.

The significant impact on understanding is influenced by the organizational model, which is one relevant context factor. Since the organization is globally distributed and the two involved agents are unlikely to have informal interactions, the impact is amplified. In contrast, in a small organization where all involved agents share an office, the impact can be alleviated as missing information is recovered through informal communication. Similarly, the software development process model may significantly influence the impact of the quality factor, and the use of an agile approach may reduce the impact by encouraging communication between the customer and developer. The context factors significantly influence the impact and, therefore, have to be included in the relationship between entity-facts and activity-facts.

The reduced determinism of the understanding activity has an economic effect—i.e., the less deterministic the activity is, the more the implementation needs to be revised, which costs money and time (red box in Fig. 3). Context factors influence the extent of this effect as, for example, a re-implementation can be more costly in larger organizations due to organizational overhead.

For the sake of brevity, the example omits the following aspects: (1) the example limits the number of elements populating the relationship. More quality factors of the entity, activities, attributes of activities, and context factors are possibly involved in the relationship. (2) Interaction effects between quality factors and context factors are plausible but not reported here.

However, the example demonstrates how adherence to this activity-based RQT elevates requirements quality factors from normative rules (i.e., user stories must conform the template for the sake of it) to empirically backed impact predictions (i.e., user stories must conform the template to mitigate ambiguous interpretations and avoid implementation cost).

4 State of research

Despite the publication of the ABRE-QM [52] and its authors’ proposition to adapt the quality meta-model for future requirements quality research [53], recent systematic reviews raised concerns regarding a perspective on requirements quality limited to the artifact-related section of the model (left part of Fig. 2) [6, 7].

To validate these concerns, we formulate the following research question. How are the concepts of the requirements quality theory reported in requirements quality literature? Answering this research question requires extracting information from a population of publications; accordingly, we employ survey research as our approach to gain insight into the current state of research. We follow the survey guidelines by Molléri et al. [63] and report our survey in the following subsections. All supplementary material for replicating this study is available in our replication package.Footnote 1

4.1 Survey objects

The target population of our survey is the requirements quality literature dealing with quality factors in requirements artifacts. In a previous research endeavor [7], we conducted a systematic study on requirements quality factors, including a sample of 57 primary studies. To our knowledge, this is the only sample that fulfills our aforementioned requirements. This classifies the sampling as non-probabilistic, more specifically convenience sampling [63].

4.2 Study design

We follow the recommended practices for the survey research process and report our steps accordingly [63]. However, we disregarded steps that only apply to surveys with human subjects, such as participant recruitment and response management.

We derived the definition of the research objectives in the form of the research question directly from previous research [6, 7, 53]. We established a study plan, rigorously documenting all research progress and justifications for any deviations during the process. We identified and characterized the population of our survey and executed our sampling plan as described in Sect. 4.1.

For our instrument design, we maintained two artifacts. We created an extraction guideline based on the RQT concepts. Each concept of the RQT was associated with one or more categorical variables, each containing a set of codes that represented if and how the concept was reported. The codes were created ad hoc in the first iteration of extraction and refined based on discussions and theoretical background in the second iteration.

The extent of the codes varied. The codes that represent how the concept entity is reported are, for example, explicit and implicit. An entity is either reported explicitly if its scope and form are clear. It is reported implicitly if the authors just report that the factor applies to a “requirement” without defining whether this is a single or multiple natural language sentence, whether the language is constrained or not, or whether it assumes a full sentence at all.

The codes of other concepts were more complex and grouped into distinct categories. For example, the codes of the concept Factor were split into two groups, representing both the explicitness when reporting a factor (i.e., whether the factor is explicitly reported or referenced from another publication) and the form in which the factor is reported (i.e., if the factor is represented with a textual description or defined using a logical or mathematical formula).

The first author extracted the appropriate code for each concept in the requirements quality theory from each publication. The extractions for each publication in the sample were recorded in a spreadsheet. For instrument validation, the second author of this manuscript independently performed the extraction task using the guideline on six (\(\approx 10\%\)) publications randomly sampled from the survey objects. The second author performed the extraction on two of these six publications as training, and the remaining four were used to calculate the inter-rater reliability between the first and second author.

The task overlap achieved an percentage agreement [64] of \(83.3\%\), whereas Cohen’s Kappa yields a moderate agreement of \(54.2\%\). As Cohen’s Kappa is unreliable for uneven marginal distributions [65], we calculated the more robust S-Score [66]—yielding a good agreement of \(76.8\%\)—which we deem sufficient for assessing the inter-rater reliability.

We used the codes in the data analysis phase to generate descriptive statistics on which we based our interpretation of the state of requirements quality. These form a quantified foundation for interpreting the state of requirements quality literature with respect to the research question. For final reporting, we adapted established reporting guidelines [63] and disclosed all material in a reusable replication package.

4.3 Study results

Figure 4 visualizes the distribution of the relevant codes among all concepts included in the requirements quality theory. Each concept is overlaid with a bar representing how many of the 57 publications contained the concept. The row below each concept represents its dimensions derived from the appropriate codes.

Fig. 4
figure 4

Survey results depicting the distribution of codes

Though both entities and factors are explicitly reported in all 57 publications of the sample, a large portion (\(24/57 = 42.1\%\)) of entities is reported implicitly—i.e., the entity’s scope is not clear. This occurs mostly because authors attach the reported quality factor to the entity requirement without specifying the scope or form of the entity. Montgomery et al. [6] have already noted this shortcoming in the requirements quality literature and it represents a terminological ambiguity in the research domain.

Seventeen out of 57 publications (\(29.8\%\)) do not report any impact on activities (code N/A) and hence neglect the practical relevance of the proposed quality factors. Agents are only reported in 14 (\(24.6\%\)) of all publications. Activities are—when reported—predominantly elicited ad hoc (\(37/40 = 92\%\)) and rarely systematically—i.e., when activities impacted by a quality factor are discussed, the identification of activities has no systematic approach. Attributes are also only rarely reported (\(8/57 = 14\%\)).

We grouped the codes classifying how impact is reported into four distinct dimensions, two of which are reported here. The evidence for the impact—when at all reported—is dominantly hypothesized (\(19/40 = 47.5\%\)) and rarely either inductive (\(11/40 = 27.5\%\)) or referenced (\(10/40 = 25\%\)), i.e., draws the evidence from another publication. Previous studies [6, 7] have also noted this dominance of anecdotal, non-empirical evidence. The modality of impact relationships is balanced between necessary and possible—i.e., the impact of quality factors is reported almost equally often to be certain or potential. The remaining two dimensions of impact (generality and frame of reference) yielded no additional insight into the surveyed objects and are hence not reported here but contained in the replication package.

Context factors are almost completely neglected and only reported to a degree varying between zero (no publication reports the influence of any tools) and \(24.6\%\) (14 out of 57 publications reporting product-related factors, e.g., the system’s size or type).

Both cost and resources are reported only rarely (\(9/57 = 15.8\%\) and \(5/57 = 8.8\%\), respectively) and, if so, only hypothesized or referenced, never determined empirically. Money and time are mentioned as the resources affected by activity impact, and the cost is only estimated in terms of expected change (e.g., “reduction of the time spent” [46]) or general magnitude (e.g., “significant amounts of money” [67]).

4.4 Interpretation

In this section, we interpret the results presented in Sect. 4.3 and answer the research question.

Publications in the requirements quality literature adhere to the RQT to a varying degree. While all publications in the sample mentioned both an entity and a quality factor, activity-related concepts, context factors, and the economic impact are often neglected. Failing to consider the context factors severely threatens the external validity of the proposed quality factors [55, 56] and neglecting the economic impact risks undermines their acceptance [56, 57].

Context factors and economic impact are arguably more challenging to investigate [68]; however, we emphasize that the lack of activity perspective when proposing quality factors is critical for several reasons. The complete negligence of a quality factor’s impact limits the factor to a normative, unmotivated prescription and challenges its practical relevance [52], which in turn promotes skepticism regarding requirements quality factors in industry [8,9,10,11].

The survey emphasized two additional shortcomings in the field of requirements quality research. First, the tendency to elicit activities ad hoc when discussing the impact of requirements quality factors bears the risk of missing other important impacts. Most publications discuss a hypothesized impact of a quality factor on a non-systematically selected activity or set of activities. This selection is usually justified by anecdotal or folkloric circumstances, like “[a]mbiguous requirements may bring about misinterpretations among stakeholders, and prompt a few issues” [69].

While these impact relationships are neither empirically proven nor falsified, the non-systematic selection of activities can disregard other impact relationships. Femmer et al. [52] demonstrated that a systematic elicitation of activities could reveal both positive and negative impacts by the same quality factor. For example, the factor free of UI design details, which states that an “artifact should describe the problem domain instead of the solution domain” [52], will positively affect maintainability, as UI details are volatile in the beginning and require a lot of change management if specified in a requirement. Conversely, the same factor negatively impacts understandability, as the presence of UI design makes requirements more comprehensible.

Second, while activities are not reported consistently, attributes of activities are reported even less. Attributes represent measurable characteristics of activities; for example, the activity understanding can be quantified by its attribute level of agreement [58, 70] or a readability index [71]. Neglecting the quantifiable attributes of activities impedes an empirical evaluation of a quality factor impact because it omits the measurement instrument for the dependent variable (i.e., the activity-fact) in the impact relationship [30].

We conclude that the requirements quality theory is implicitly embedded in the requirements quality literature. However, insufficient adherence to it results in several limitations when reporting new requirements quality factors. While the artifact-centric theory concepts are commonly covered, activity-centric concepts, context factors, and economic concepts receive less attention, which decreases these publications’ practical relevance. With this study, we empirically confirm the concerns voiced in previous investigations of the requirements quality literature [6, 7].

4.5 Threats to validity of this research

We discuss the threats to validity proposed by Wohlin et al. [72] and extended by Molléri et al. [63].

Internal validity

We acknowledge a threat to internal validity due to sampling of publications. The method of object selection [6, 7] is deemed sufficiently rigorous to derive an initial theory.

Construct validity

The constructs in this study—i.e., the elements of the theory—are established strictly following mature quality theories from the field of software quality. This ensures the alignment between the underlying theory and measurement constructs.

The lack of a theory to which the surveyed publications could have adhered when reporting quality factors resulted in the concepts of requirements quality often being embedded implicitly, complicating the extraction task. We minimized the resulting threat to internal validity through independent labeling and calculating appropriate inter-rater reliability metrics [65].

External validity

The selected sample of publications [7] is constrained to empirical contributions to requirements quality research [6]. This limits the conclusion validity of the type of evidence for the impact concept, as non-empirical work could contribute theoretical evidence for impact relationships. For example, the impact of quality factors like nominalization [73] can be derived deductively by referring to valency reduction caused by nominalization [74]. While publications utilizing linguistic theory are unknown to the authors, a valid conclusion regarding this type of evidence requires a more thorough extension of the sampling strategy.

5 Research roadmap

Femmer et al. proposed an initial research roadmap detailing how to advance the field of requirements quality research [53]. Based on concerns of previous studies [6, 7] and the survey of the state of research reported in this study, we assess and update the three suggested steps by Femmer et al. [53]:

  1. 1.

    Creation of “a reference artifact and a usage model” eliciting typical entities, activities, and agents.

  2. 2.

    Creation of “a taxonomy of quality factors” as a central, accessible repository of quality factors.

  3. 3.

    Creation of “a taxonomy of impacts” as a catalog of impacts from quality factors onto activities.

We reflect on these proposed research streams in Sects. 5.1 to 5.3 and add three further proposals in Sects. 5.4 to 5.6. Because these research streams are grounded in the experiences from the software quality research, we expect contributions to them to promote requirements quality research that is relevant to practice.

5.1 Artifact and usage model

Mendez et al. have contributed a reference artifact model for requirements engineering [5, 75] based on their fundamental positioning on artifact orientation [76, 77]. The AMDiRE approach constitutes a domain-agnostic reference for artifact types and serves the purpose requested by Femmer et al. [53] in that it can be tailored toward any industry context to model an artifact structure.

While the elicitation of human [78] and non-human, automatic agents [79] has been addressed, a reference model for activities requires explicit attention in the literature. More importantly, with the update of the requirements quality theory over the initial ABRE-QM [52], we argue that a reference model for requirements-affected activities needs to provide attributes to quantify each activity. Such attributes enable an empirical assessment of the impact of quality factors.

Additionally, a majority of publications reporting an impacted activity mention some variation of understanding or interpreting (\(32/40 = 80\%\)). We assume that every requirements-affected activity comprises an initial interpretation sub-activity. However, such composition is obscured by the lack of a proper reference model for requirements-affected activities accounting for their aggregated nature.

It is conceivable that the interpretation sub-activity is most prone to defects, which explains the research community’s focus on ambiguity [6], as ambiguity represents the non-determinism of an interpretation. We argue that a proper reference model for requirements-affected activities accounting for their aggregated nature can steer research toward identifying critical sub-activities—i.e., the ones most prone to impacting subsequent activities.

5.2 Taxonomy of quality factors

Requirements quality factors [7, 53] are the cornerstone of artifact-centric quality assurance. The requirements quality factor ontology [7] furthered this research stream. Although the ontology is in an early stage and requires additional iterations, quality factors and related objects—such as data sets and automation approaches—are now collected in a central repository.

5.3 Taxonomy of impacts

The taxonomy of impacts that Femmer et al. [53] deem the necessary final step of the roadmap has to be extended. Previous quality models—including the ABRE-QM [52]—consider only categorical or, at most, linear impact relationships. Therefore, a taxonomy seemed sufficient to record “a list of well-examined effects of quality factors on activities” [53]. We argue that the impact relationship can be more complex and requires a more general representation—i.e., rather than aiming for a taxonomy of impacts, we argue for developing an impact framework.

Given the evaluation of quality factors on requirements entities on one side and the evaluation of activity attributes on the other side, the impact relationship between these variables can be formulated as a regression problem. Instead of relying on experts to hypothesize the (categorical) type or (linear) extent of an impact, more complex relationships can be determined using, for example, Bayesian data analysis [80]. Consequently, this research stream aims to develop an impact framework capable of determining these impact relationships based on statistical instruments given sufficient data.

5.4 Context factors

Context factors must be considered in the impact relationship to operationalize the requirements quality theory [55]. Large-scale endeavors acknowledge the importance of context factors in regard to requirements quality [1], yet no unified collection of context factors relevant to requirements engineering exists. Established sets of software engineering context factors [61, 81] can be used as a starting point but require a dedicated investigation from the requirements engineering perspective.

A clear set of relevant context factors can support developing reporting guidelines for empirical studies on requirements quality and enable context-driven research [82]. While empirical software and requirements engineering publications typically strive for generalizability [81], scoping an empirical study according to the given context factors allows the data collected in that study to be integrated into the impact framework as outlined in Sect. 5.3. Conversely, reporting the limited scope of a study enables a general requirements quality theory that can be assembled from multiple studies in well-defined contexts.

5.5 Economic impact

With the addition of economic concepts in the requirements quality theory, a research stream should be dedicated to the economic impact of activity facts. The impact relationship between quality factors and activities already benefits the acceptance of those factors for quality assurance in practice [53]. Adding an economic perspective—i.e., what amount of which resource a change of a certain activity-fact entails—can further bridge the gap between the normative, artifact-centric quality factors on one side and an economic decision-making process on the other side [57]. Since the purpose of quality factors is to support quality assurance in industry, understanding this economic perspective is of high priority despite the complexity of the topic.

5.6 Tool support

Fig. 5
figure 5

Architectural overview of the proposed tool-support

We aim to make the RQT applicable to the industrial context through the development of tool support. The components necessary to realize this tool support are visualized in Fig. 5. The goal of the tool is to estimate the impact of requirements entities and their context on the attributes of requirements-affected activities.

To this end, the tool needs an interface to the requirements entities, context information about the involved agents, and context information about the organization. The former two are often available in a requirements tracking system like Jira, while the latter a company likely has to generate and provide manually [83].Footnote 2

Once provided with the necessary information, the tool characterizes both entities and context, i.e., quantifies the natural language requirements entities and the elusive factors determining the context. The quantified entities and context serve as input to the impact prediction model as described in Sect. 5.3, estimating the impact on the attributes of the requirements-affected activities, which in turn enables quantifying the economic impact as described in Sect. 5.5.

The realization of this tool depends on the previously described streams of research to identify valid quality factors (Sect. 5.2), context factors (Sect. 5.4), and activity attributes (Sect. 5.1). For the tool to provide an automated impact prediction the following automation modules must be realized:

  1. 1.

    Automatic entity characterization: a shared architecture to automatically evaluate the requirements quality factors collected in the quality factor ontology [7]

  2. 2.

    Automatic impact prediction: an accessible statistical model estimating the impact of quantified entities and context on affected activities, trained on historical data.

Developing this tool while adhering to open science principles will allow scholars to propose new quality and context factors, customize relevant activity attributes, and contribute historical data to improve the impact estimation of the prediction model. We invite contributions to the implementation and maintenance of the tool via its dedicated repository on Github.Footnote 3

6 Conclusion

In this manuscript, we investigated the software quality literature and the application of the activity-based quality perspective to the requirements engineering domain. We extend the work of Femmer et al. [52] by proposing an evolved and harmonized requirements quality theory and assess the adherence of the requirements quality literature to this theory. Our survey confirms the bias toward artifact-centric and the negligence of activity-centric concepts, which was noted in previous secondary studies [6, 7]. Finally, we update the requirements quality research roadmap initiated by Femmer et al. [53] to guide future contributions in the requirements quality research domain.

We are confident that the harmonized requirements quality theory provides the necessary guidance to propel requirements quality research and establish a common understanding of quality that is operationalizable in practice. We invite fellow researchers to contribute to the theory and the requirements quality research field in adherence to it.