Introduction

This paper addresses the problem of compliance checking and assessment (which we define carefully in the first subsection of this introduction). The technical literature offers several approaches to this problem. Many compliance assessment approaches share a theoretical nature: they are based on a systematic, rigorous, and formal theory of proof or evaluation (such as logics, formal languages, proof systems, reference models, and domain models). Although these theoretical approaches have high levels of internal integrity, they usually suffer from a lack of supporting tools and a limited ability to adapt to the diversity and complexity of real-world cases. In contrast, an aggregation of multiple simple tools is sometimes more successful than a single, rigid, unified, deeply designed, and sophisticated tool.

In some real-world compliance cases, more lightweight approaches to formal specification that support semantic modeling (e.g. generative grammars, production rules, set-theoretic notations, and rewriting logics) can play central roles in overcoming semantic diversity and complexity. These approaches support a kind of semantic compilation of diverse, domain-specific semantic models. For instance, generative rules can be set up to define the mapping and composition logic of different, independent semantic models.

In this paper, we propose the KARB solution system, i.e. Keeping away compliance Anomalies through Rule-based Benchmarking. Rule-based benchmarking means evaluating an under-compliance system against its symbolic specification using a set of symbolic rules (which stand for the semantic logic of evaluation).

In the remainder of this first section, we introduce the primitive concepts of the problem domain and review related work. In the second section, we introduce the KARB solution. Then, in the “Case study: software quality evaluation” section, a case study is presented on the application of this method to the issue of quality compliance in the field of software engineering. In the “DD-KARB” section, we introduce DD-KARB, which extends the KARB solution with a data-driven approach. In the last section, “Evaluation and discussion: IR-QUMA study”, we present an evaluation and discussion based on the IR-QUMA study using the DD-KARB method.

Compliance checking

Compliance solutions concern the assessment, evaluation, verification, validation, and checking of systems, services, processes, products, designs, organizations, or environments with regard to rules, regulations, laws, standards, specifications, policies, guidelines, protocols, methods, principles, and reference models [1, 2]. The application domains that need and use compliance solutions include organizations and corporations in the following areas: the software and IT industry [3], e-governance [2], finance and banking [4], legal sectors and professions [5], commerce and trade [6], highly regulated industries (e.g. food [7] and drug, medical services and devices [8], and the construction industry [9]), complex and interdisciplinary products and services [10], and emerging technology products and services (e.g. cyber-physical systems [11], self-driving cars [12], cognitive robotics and agents [13], and smart applications [14]).

There are many prominent compliance concerns that have been considered by numerous regulations, standards, laws, and acts. The most important concerns are as follows: security [15], safety [16,17,18], privacy [19, 20], data protection [21], accountability [22], responsibility [23], transparency [24], competency [25], anti-piracy [26], anti-corruption [27], antitrust [27], accessibility [28], HCI, quality management and assurance [29,30,31], environmental management [32], sustainability [33], usability [34], human comfort [35], ethics [36], conformance with the needs of people with disabilities [37], of children [38], and of the elderly [39], simplicity [40], and ease of use.

Modern paradigms have amplified the necessity of compliance requirements (paradigms such as standardization in business, automation in industries, artificial intelligence and ubiquitous computing in society, complex systems engineering, sociotechnical systems, ongoing growth in the economy, social complexity, and quality maturity of services/processes).

In [41], a formal definition (as a 4-tuple) was presented for a special kind of benchmarking. There are a few formally defined frameworks for compliance checking in legal applications, defined theoretically, e.g. as formal systems [5] or through conceptual modeling of legal texts [5]. Grammar-like and production-rule formalisms have been suggested for automated compliance checking in legal applications [42,43,44,45].

Rules and grammars for architecture conformance checking, especially for “software quality assurance” [46], form another application domain. Some rule-based approaches for architecture selection relate nonfunctional requirements, domain requirements, and quality characteristics to architectural styles [47], architectural models [48], architectural patterns [49], and architectural aspects [50].

Circuits and flows are considered recurrent modeling approaches in systems engineering. Some researchers regard circuits and flows as a basis for compliance modeling, checking, and benchmarking [51, 52]. There are numerous verification tools and solutions for flow-based models. These tools serve as a means of compliance checking. For instance, agent-coordination protocols for crisis situations could be modeled and checked by these tools and solutions [53].

In software engineering, there are some model-based approaches to compliance assurance [54]. These approaches employ a modeling notation or framework (e.g. UML, KAOS [55], and GSN [56]). A “model-based assurance case” is an approach to safety compliance management. It encompasses a compliance meta-model covering “claims” or “requirements”, “evidence”, “arguments”, and “contexts” [57, 58]. In Kokaly et al. [54], the need for a general model of compliance and compliance activity is addressed as an open-ended problem.

In Zhang and El-Gohary [9, 59], closely related approaches were introduced. The meta-model and system architecture of these approaches are comparable with the ones proposed in this paper. There are some other meta-models for compliance checking applications and frameworks (see [3, 60,61,62]).

Benchmarking

A benchmark is a common or standard infrastructure employed to analyze, evaluate, and compare the real behavior of solutions, tools, or systems through their execution (for a few definitions, see [63,64,65,66,67,68,69,70,71]; for some instances in other fields, see [30, 72,73,74,75,76]). Sometimes, a measure can simply be used as a benchmark [77]. Procedures, measures, and computers are the most common concepts in diverse definitions of benchmarking (Fig. 1).

Fig. 1

A segmented word cloud showing the emphasis on different concepts in definitions of benchmarking

For some early attempts in the history of benchmarking in IT and computing, see [78,79,80,81,82]. There is also a growing trend in benchmarking for quality assurance, management, and process improvement [83, 84]. It has been a progressive journey so far [85,86,87].

From a managerial standpoint, benchmarking requires a significant investment of time and perhaps money [84]. Hence, it should be considered a long-term profitable activity and a sort of infrastructure development for a field. In cloud computing, prior investments in performance measuring tools meant that tools were already available for the new field (for a case, see [88]). This was fortunate; nevertheless, dedicated attempts to define and develop benchmarks for cloud computing from scratch soon began [89]. Moreover, investment in benchmarking is important [90], for this decision can benefit all stakeholders [85].

Successful notions of benchmarking (in every field) are characterized by a community that creates, promotes, and uses benchmarks. Benchmarking can also be viewed as an applied manifestation and adoption of community knowledge and expertise [91].

As another instance of the application domains, there are informal guidelines for ensuring the quality of software (in terms of quality attributes such as security, integrity, and maintainability). Formal specification and automatic checking of these guidelines can contribute to the higher quality assurance of software (see [92] as an example for the formalization and automation of security guidelines).

Benchmarks can assess quality (rather than only functionality) [29, 69, 71]; therefore, they are suitable for formal or systematic qualitative analysis of systems. For instance, security and compliance benchmarks have been reported [66, 93]. Measuring the productivity of an organization is another case with qualitative dimensions (such as the level of customer satisfaction, the quality of products, or the extent to which an organization has the right group of staff [85]), which can be measured through systematic approaches [85].

Soft benchmarks

A body of knowledge can be represented once using some tools and techniques, and then be used many times; ontologies are a practical case of this manner of reusability (see [94]). Thus, a community can construct a knowledge representation and use it as a standard and reusable asset. If this asset helps the community share their expertise, analyze their systems and solutions, and evaluate the behavior and other characteristics of their systems, it can be considered a soft benchmark. The “soft” part of the title indicates its knowledge-related nature.

Knowledge representations are not limited to ontologies [95]. Formal specifications such as logical formulations, description logics, semantic networks, and rule-based approaches are considered alternatives [96]. Logical models have a share in compliance checking approaches. For instance, logical modeling of regulations is a method for rule representation and checking automation [97]. Rules can also be used as a paradigm for knowledge representation [98].

A logical theory for a piece of knowledge has the three essential characteristics of a benchmark. (1) It can be considered a common infrastructure due to the reusable and well-defined nature of a formal specification. (2) The results of reasoning indicate an examination and evaluation of the studied system and provide a basis for comparison between alternative and competing systems. (3) Executing reasoning on a logical theory of a piece of knowledge is an execution of the meanings and semantics behind that knowledge. Thinking and mental activities can play out such hypothetical situations. In KARB, the rule-based reasoning schema of semantic rules can be viewed as a mimic of these natural procedures (the simulation of human auditing by automated and intelligent compliance audit tools addresses a real need of the compliance industry [2, 66, 99]).

Object models can act as semantic models [100], especially for compliance checking purposes [101, 102]. For instance, Fornax objects capture specific rule semantics for the compliance checking of building designs [101]. These object models include contexts, domains, and sometimes system specifications [100]. In Fornax, for example, the objects for hospital design semantics differ from those for airport designs [101].

Software patterns are another representation form or media of technical knowledge. Pattern-based solutions to compliance checking have been addressed by some studies [103]. Compliance patterns are a kind of knowledge-capturing tool for compliance assessment.

In KARB, knowledge is represented through the intuitionistic formal semantics known as the “semantic logic”. The semantic logic was created to capture the semantics and meanings of text [40] and is used in KARB for knowledge representation. Any knowledge has its semantics and meaning [96, 104,105,106]; the knowledge itself is captured if its semantics and meanings are captured. Knowledge also has a specific structure [107]. Therefore, a meaning structure (or a semantic construct) is a proper candidate for the manifestation of knowledge. Based on and adapted from [40], semantics and meanings are considered as constructions, namely lattices (Footnote 1) or systems of realities (= intuitions). Any well-defined formulation of knowledge represents and refers to a combination of entities, things, objects, events, affairs, facts, physics, concepts, cognitions, affections, or any other sorts of basic realities and intuitions. Therefore, knowledge can aggregately and abstractly be considered a combination and construction of basic realities and intuitions (bound together by a glue of operators such as logical, structural, modal, and any other necessary operators). This manner of semantic definition is constructive and intuitionistic (Fig. 2).

Fig. 2

An instance of a semantic logic rule and the effects of its application on working memory. The rule specifies the existence of a Risk Study obligation and an Early Cancellation probability when a project has a risky beginning [40]

As an underlying philosophy in KARB, reality and its meanings are composed of statics-related and dynamics-related meanings. Symbolic constructs capture the statics-related part of knowledge meanings, whereas generative rules capture the dynamics-related part.

The KARB solution

In this section, we introduce our proposed solution. The results of rule-based benchmarking in KARB differ from those of other compliance assessment approaches. Other approaches yield results in the form of “yes or no”, “correct or incorrect”, etc.; in KARB, however, the result is a pool of quantities (i.e. derived and generated symbols). Therefore, the overall state of working memory at the end of each benchmarking process indicates the evaluation of the studied compliance case, yielding a rigorous and reasoned evaluation with diverse dimensions and values.

Every aspect of compliance concerns can be addressed with a separate rule-based benchmark. Every benchmark draws a new simple line onto the overall picture of the compliance assessment scene. A set of multiple, different, and diverse benchmarks can together form an applied and realistic compliance assessment of a complex system. Relying on rule-based benchmarks, this experimental and applied approach to compliance assessment opens a new space for innovative, creative, and diverse compliance assessment methods.

Figure 3 illustrates a brief meta-model of KARB. The assessment of every compliance requirement is reified by a compliance benchmark, which in turn consists of some concrete rule-based benchmarks. Therefore, every compliance requirement declares meanings and semantics for a compliance benchmark that assesses it.

Fig. 3

The meta-model of the KARB solution, depicted as a class diagram of elements (the UML class diagram is a popular tool for concept modeling and meta-modeling). There are three main dimensions of KARB elements: compliance-related (red), benchmarking-related (white), and semantics-related (green) concepts

A compliance symbol (CSYM) abstracts a compliance concept (CCON) in a sense similar to atom symbols in LISP, objects in object-oriented languages (e.g., Java), and JSON fragments in NoSQL DBs, all of which are units for compositional parts. Every compliance concern (CC) is associated with some compliance requirements (Footnote 2) (CR) which capture the notion and attitude of that concern. For instance, safety is a compliance concern which can be defined in a zoo as the following notion: the zoo animals must not be able to harm or threaten the visitors (see Example 1).

Some compliance rules (CRUL) aggregately define the operational realization of a compliance requirement. Every CRUL defines a more rigorous, concrete, and specific obligation than a CR. In KARB, rules are considered finer-grained than requirements: the overall shape of a requirement consists of the limiting lines of its constituting parts (= rules). Every CRUL has some CCONs in its definition. In the computational mechanisms of KARB, a CSYM abstracts a CCON. By using a glue of (logical, structural, modal, and any other necessary) operators, a formal definition of a CRUL can be constructed from the CSYMs of its CCONs (see Example 1).
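To make this meta-model concrete, the following minimal Java sketch shows one possible in-memory encoding of the KARB elements of Fig. 3; the class and field names are our own illustrative assumptions, not part of the KARB specification.

```java
import java.util.List;

// Hypothetical encoding of the KARB meta-model elements (cf. Fig. 3).
// A compliance symbol (CSYM) abstracts a compliance concept (CCON);
// a compliance rule (CRUL) is built from such symbols and realizes part of
// a compliance requirement (CR) belonging to a compliance concern (CC).
record ComplianceConcept(String name) {}                          // CCON
record ComplianceSymbol(String id, ComplianceConcept concept) {}  // CSYM
record ComplianceRule(String id, String formalSpec,
                      List<ComplianceSymbol> symbols) {}          // CRUL
record ComplianceRequirement(String id, String statement,
                             List<ComplianceRule> rules) {}       // CR
record ComplianceConcern(String name,
                         List<ComplianceRequirement> reqs) {}     // CC
```

Example 1 below instantiates exactly this chain for a zoo safety concern.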

Example 1

The system under compliance: A zoo

CC1: Safety

CR1: The zoo animals must not be able to harm or threaten the visitors

CRUL1: The cage fences must have proper specifications and conditions

CCONs: Cage, fence, proper specifications, proper conditions

CSYMs: CE, FE, PS, PC

Formal specification of CRUL1: X [IS-A] FE(CE) \(\Rightarrow\) O[PS(X)] [AND] O[PC(X)]

In KARB, the manner of formal specification of a CRUL is based on the intuitionistic logic called the “semantic logic”. Technically, it can be viewed as an axiomatic system on symbols with Brouwer–Heyting–Kolmogorov interpretations for semantics [108]. Symbols stand for basic intuitions (concepts, entities, objects, things, events, values, quantities, qualities, etc.), whereas the studied system is viewed as a complex construction of basic intuitions.

Formally, it is sufficient to consider the semantic logic as consisting of (1) a set of symbols (standing for basic intuitions) and (2) a set of rules over them. Every rule describes a symbol generation action: when its left-side symbolic structure is present in the working memory, the right-side symbolic structure is generated and pushed to the lattice of symbols in the working memory (see Fig. 2).
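The following Java sketch illustrates this generation mechanism under our own simplifying assumptions (flat string symbols, set-based working memory); it is not the authors' implementation. Rules are applied until no new symbols can be generated, using the zoo rule of Example 1 as input.

```java
import java.util.*;

// Minimal forward-chaining sketch of symbol-generation rules over a working
// memory of symbols (illustrative only; all names are hypothetical).
public class SemanticLogicSketch {

    // A rule: when all left-side symbols are present in working memory,
    // the right-side symbols are generated and added to it.
    record Rule(Set<String> leftSide, Set<String> rightSide) {}

    static Set<String> run(Set<String> initialMemory, List<Rule> rules) {
        Set<String> memory = new HashSet<>(initialMemory);
        boolean changed = true;
        while (changed) {                            // fire rules until a fixpoint
            changed = false;
            for (Rule rule : rules) {
                if (memory.containsAll(rule.leftSide())
                        && !memory.containsAll(rule.rightSide())) {
                    memory.addAll(rule.rightSide());
                    changed = true;
                }
            }
        }
        return memory;
    }

    public static void main(String[] args) {
        // Zoo rule of Example 1: X [IS-A] FE(CE) => O[PS(X)] [AND] O[PC(X)],
        // encoded here, purely for illustration, as flat symbol strings.
        List<Rule> rules = List.of(
                new Rule(Set.of("fence1 IS-A FE(CE)"),
                         Set.of("O[PS(fence1)]", "O[PC(fence1)]")));
        Set<String> finalMemory = run(Set.of("fence1 IS-A FE(CE)"), rules);
        System.out.println(finalMemory);             // obligations derived for fence1
    }
}
```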

Case study: software quality evaluation

Since the 1990s, there have been various approaches to defining measurable quality such as quality function deployment, goal question metrics, and software quality metrics [109]. These methods seek to shape a general and common framework for quality measurement concerns. However, some quality factors are contextual and user-dependent [110]. For instance, some studies have measured the quality attributes of messenger apps and services from the user perspective (see [111,112,113,114]) or based on user behavior (see [115,116,117]).

Quality definitions can be seen as a hierarchical formal system of interrelated concepts [118] or attributes [119, 120]. This view helps create an explicitly defined conceptual construct for qualities, i.e. a concept-quantized definition of qualities. Hence, a quantification of qualities (which is a well-known but poorly-achieved goal for rigorous software engineering [121]) helps measure and perceive the true level of qualities in each application.

After definition, it is time for operationalization. Every theoretical concept has its real instances on the ground. User feedback, comments, experiences, requests, requirements, desires, cognitions, and intuitions can help this grounding operationalization. Therefore, a good quality-definition theory needs a good quality-grounding theory.

There is a semantic gap between definition theories and grounding theories. The former has a neat nature, whereas the latter has a scruffy one. How is it possible to bridge the neat and the scruffy natures [96]? A glue model can resolve this challenge. Such a model must contain the main conceptual elements of both sides and relate them along a gradient conceptual spectrum. Since this is exactly the manner of the KARB solution, KARB can be used as a method for designing a software quality evaluation technique. This case study is also a kind of evaluation of KARB, for it demonstrates its usefulness for a real concern or problem in the software engineering community.

Example 2

Scenario: Using SMS-based dynamic passwords for e-banking transactions. A semantic logic is provided to model the semantics of this scenario. This logic contains certain rules and intuitions from four context theories, i.e. mobile apps, deontic predicate logic, security, and system (see Fig. 4). The stateless model-checking of the scenario semantics (by symbolic-value generations) yields a “false” value; hence, there is a contradiction in the scenario. Figure 5 presents the explainable results in the proof construction lattice (i.e. why and how the overall result was obtained). Some reasoning operations are omitted for the sake of simplicity in this preliminary example.

Fig. 4

The semantic theories involved and the semantic logic for Example 2

Fig. 5

The proof construction lattice for the false value derived from the scenario semantics

The KARB manifestation for this compliance scenario:

The system under compliance: Using SMS-based dynamic passwords for e-banking transactions

CC: User data security

CR: Hackers cannot gain access to users' banking information during SMS exchange

CRUL: It is not possible to breach the bank data of users

CCONs: Breach, data

CSYMs: BR, DT

Formal specification of CRUL: NOT(BR(DT))

DD-KARB

In order to boost the model pragmatics, a methodic extension of the principal KARB model is considered. Data-Driven KARB (DD-KARB) incorporates data-calculated weights (based on big data gathered from people) and values to parameterize the rules. An example of a parameterized rule is presented below:

Example 3

  • Rule:

    $${\text{alpha}}*\left( {{\text{A}} \Rightarrow {\text{B}}} \right) \Rightarrow {\text{beta}}*\left( {{\text{P(A)}} \Rightarrow {\text{P(B)}}} \right)$$
  • Interpretation:

    If the alpha instances of (A \(\Rightarrow\) B) are generated in the semantic solution to the system (or if the weight of (A \(\Rightarrow\) B) is equal to alpha), then the beta instances of (P(A) \(\Rightarrow\) P(B)) must be generated by applying this rule.

Based on expert scores, data examinations, data schemas, pre-trained AI models, and other sources of data-driven models, the semantics-based KARB models can be parameterized, annotated, and enriched with data-driven aspects. Full-fledged methods (combining the data and semantics aspects) can outperform solo methods. Each system context or domain of application has its own semantics and data. DD-KARB can be employed to record and adapt both the data and semantics aspects. This manner of description or declaration helps define a hybrid semantic core for compliance checking and solving. A hybrid semantics can help approach some hard-to-check compliance requirements through automatic compliance-checking solutions. The goal and the related dataset for evaluation are discussed in the next section.
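As a rough illustration of such parameterized rules (our own sketch with hypothetical names, not the authors' implementation), each rule can carry data-derived weights that determine when it fires and how much weight it contributes to the generated right-side structure, in the spirit of Example 3.

```java
import java.util.*;

// Illustrative sketch of a data-driven, weighted generation rule in the spirit
// of Example 3: alpha*(A => B) => beta*(P(A) => P(B)). Names are hypothetical.
public class WeightedRuleSketch {

    record WeightedRule(String leftStructure, double alpha,
                        String rightStructure, double beta) {}

    // Working memory maps a symbolic structure to its weight / instance count.
    static void apply(Map<String, Double> memory, WeightedRule rule) {
        Double leftWeight = memory.get(rule.leftStructure());
        if (leftWeight != null && leftWeight >= rule.alpha()) {
            // Generate beta instances (weight) of the right-side structure.
            memory.merge(rule.rightStructure(), rule.beta(), Double::sum);
        }
    }

    public static void main(String[] args) {
        Map<String, Double> memory = new HashMap<>();
        memory.put("A=>B", 2.0);                                  // weight learned from data
        apply(memory, new WeightedRule("A=>B", 2.0, "P(A)=>P(B)", 0.5));
        System.out.println(memory);                               // {A=>B=2.0, P(A)=>P(B)=0.5}
    }
}
```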

Evaluation and discussion: IR-QUMA study

The case study is a popular evaluation method in software engineering research. Case studies are frequently used in papers to demonstrate the capabilities of new techniques and methods [122]. A case study was conducted in order to demonstrate and analyze the manner of the KARB solution. The IR-QUMA study (Iranian Survey on Quality in Messenger Apps) was defined to evaluate the quality of some messenger applications. It consists of the following stages:

  1. Selecting messenger applications. The selected applications were Telegram, WhatsApp, Eita, Soroush, Bale, and some other popular mobile messengers in the Iranian cyberspace. They were selected for the IR-QUMA case study due to the access to a large community of their users.

  2. Collecting data. An online questionnaire was designed to collect the opinions of users and trace the specifications of user experiences. The seven main questions concerned “absolute quality”, “relative quality”, “user satisfaction”, “error-freeness”, “perceived UI complexity”, “rationality of routines”, and “accordance and usability”. The answer to each question ranged between 1 and 5, representing choices from “very weak” to “excellent”. Figure 6 shows the running-average series of user responses (for a portion of the dataset).

  3. Using the KARB solution

     a. Elicitation of the semantic theories involved.

     b. Specification of the semantic theories involved. The KARB-based specifications were developed for each of the semantic theories involved. Figure 7 presents a detailed map of these theories. The emphasis was given to:

        i. KARB-based specification of messenger apps

        ii. KARB-based specification of some quality terms

        iii. KARB-based specification of user behavior

        iv. KARB-based specification of some pieces of HCI knowledge

        v. KARB-based specification of risks and threats

        vi. KARB-based specification of software platform and mechanisms

        vii. KARB-based specification of cognitive aspects

        viii. KARB-based specification of social aspects

     c. Computation and model checking. The KARB solution was employed to compute some of the compliance anomalies.

  4. Evaluating the results. The results were compared in three aspects: expert judgments, IT reports, and user opinions.

Fig. 6

The running-average series of responses to 7 different questions in the questionnaire

Fig. 7

A detailed map of the semantic theories involved in the IR-QUMA study

The IR-QUMA study details will be published in a separate report. In this paper, however, we used the collected data and the semantic model to conduct some experiments on different quality benchmarks, especially the KARB solution.

IR-QUMA data collection

A questionnaire was designed to evaluate some quality-related measures, metrics, and features from the user experience perspective. The questionnaire was published in popular channels of Iranian mobile social networks on 10 different messengers (i.e. Telegram, WhatsApp, Instagram, Eita, Soroush, Bale, Gap, iGap, Shaad, and Rubika). More than 40 communities of users on these 10 messengers (forming more than 350 micro-communities based on visiting hours and spatial partitions) contributed to this research questionnaire. The collected data exceeded 7k completed online forms (from more than 7k distinct participants). In the research dataset [123], for data privacy and protection reasons, the names of these messengers were hashed randomly by assigning ID codes from M1 to M10.

Different sets of statistical analysis, time series analysis, frequency analysis, cluster analysis, classification analysis, geometric locus of data points, and topological data analysis were applied to the user opinion data to obtain useful insights. As an analysis example, the data were sorted in temporal order (which conserved the segregation of micro-communities). A running-average method was then adopted (with a window size of 20). Therefore, 7k data points were obtained from different segments of those 350 micro-communities. Every micro-community, with its segment average, had its own footprint in the total space of data points. Moreover, every messenger app had its own footprint in the total space of data points.
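As a minimal sketch of that analysis step (assuming the scores are already sorted in temporal order; this is not the published analysis code), a window-20 running average can be computed as follows:

```java
// Minimal running-average sketch (window size 20 in the IR-QUMA analysis),
// assuming a temporally ordered score series; not the authors' analysis code.
public class RunningAverage {
    static double[] compute(double[] scores, int window) {
        double[] averages = new double[scores.length - window + 1];
        double sum = 0;
        for (int i = 0; i < scores.length; i++) {
            sum += scores[i];
            if (i >= window) sum -= scores[i - window];            // slide the window
            if (i >= window - 1) averages[i - window + 1] = sum / window;
        }
        return averages;
    }
}
```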

Figure 8 depicts a correlation locus analysis for two of the quality measures for seven different messenger applications in 7k data segments of 350 micro-communities. Every point corresponds to the measure values obtained from one segment of a micro-community. The blue points refer to the mentioned messengers, whereas the red points indicate the entire data space (for all studied messengers). Every axis demonstrates a 5-level measure value (obtained by averaging user opinions in one data segment).

Fig. 8

The correlation between “absolute quality” and “relative quality” from the user point of view for some messenger applications based on the IR-QUMA data. Every point represents average values of one data segment

Figure 9 demonstrates the analysis for two other measures, i.e. correctness vs. quality. Correctness means the error-freeness and bug-freeness of the messenger application. The results indicate that there is a buffer between “correctness increase/decrease” and “overall quality increase/decrease”. This means that factors other than correctness can also play a key role in the overall quality of software.

Fig. 9

The correlation between “correctness” and “absolute quality” from the user point of view for some messenger applications based on the IR-QUMA Data. Every point represents average values of one data segment

Figure 10 illustrates the histogram of score instances of user quality judgments for the 10 applications. The topological analysis of these 10 curves indicates six different curve clusters based on change trends across the five levels.

Fig. 10

The histogram analysis of quality score levels (obtained from user judgments) for “absolute quality” for 10 different messenger applications. Every messenger application has five data points for the enumeration of the 1 (very poor), 2 (poor), 3 (moderate), 4 (good), and 5 (excellent) quality scores. These scores reflect the user experience point of view on the quality of the messengers

Evaluation of the method

The evaluation of the KARB solution is based on the semantic logic specified in the previous research steps and on its performance in correctly calculating (or mimicking) user opinions (measured as the error percentage between benchmark-computed quality scores and user opinions about quality scores). The following experimentation setting was considered: four different methods for quality benchmarking and five different experiments (for five different messengers).

Every user opinion record involves two sections: (1) the user opinion about the quality score, which is called the absolute quality score, and (2) the quality context. The quality context includes the factors that can affect or relate to the user opinion about quality scores. Age, gender, and other data were gathered from the users stating their experiences with the messengers (the scores also included bug-freeness and error-freeness, perceived UI complexity, rationality of routines, score of usability, etc.). The value options for all scores in the questionnaires were defined on a 5-point Likert scale [124] (the Likert scale has been used in various domains of software engineering, e.g. [125]). Figure 11 shows the structure of a user opinion record.

Fig. 11

The structure of a user opinion record
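Assuming the fields named in the questionnaire description above, a user opinion record could be encoded roughly as the following Java record; the field names are our own illustrative choices, not the actual schema of the IR-QUMA dataset.

```java
// Illustrative encoding of one user opinion record (cf. Fig. 11);
// field names are hypothetical, not the actual IR-QUMA schema.
// All score fields take values on the 5-point Likert scale (1..5).
record UserOpinionRecord(
        String messengerId,            // hashed app ID, e.g. "M3"
        int absoluteQuality,           // the score the benchmarks try to estimate
        int relativeQuality,           // context scores follow
        int userSatisfaction,
        int errorFreeness,
        int perceivedUiComplexity,
        int rationalityOfRoutines,
        int accordanceAndUsability,
        int age,
        String gender) {}
```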

Definition 1

\(N_{i}\) = the number of user opinion records about \(App_{i}\)

Definition 2

$$\begin{aligned} & Error\_Percentage\left( {App_{i} ,\;Method_{j} } \right) \\ & \quad = \frac{{\sum\nolimits_{k = 1}^{{N_{i} }} {\left| {Benchmark\_Computed\_Quality\left( {App_{i} ,\;Method_{j} ,\;Context_{k} } \right) - User\_Quality\_Opinion_{k} } \right|} }}{{N_{i} }} \\ \end{aligned}$$
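Read operationally, Definition 2 is a mean absolute deviation over the \(N_{i}\) records of one app. The sketch below expresses it in Java, reusing the illustrative UserOpinionRecord above and a hypothetical Benchmark interface standing in for \(Method_{j}\); it is an assumption about how the computation could look, not the code of Additional file 1.

```java
import java.util.List;

// Sketch of Definition 2: the mean absolute difference between the
// benchmark-computed quality and the user's quality opinion for App_i.
// The Benchmark interface is a hypothetical stand-in for Method_j.
public class ErrorPercentageSketch {

    interface Benchmark {
        double computeQuality(UserOpinionRecord context);    // Benchmark_Computed_Quality
    }

    static double errorPercentage(List<UserOpinionRecord> recordsOfApp, Benchmark method) {
        double totalError = 0;
        for (UserOpinionRecord record : recordsOfApp) {       // N_i records of App_i
            totalError += Math.abs(method.computeQuality(record) - record.absoluteQuality());
        }
        return totalError / recordsOfApp.size();
    }
}
```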

The KARB manifestation for this compliance situation:

The system under compliance: Messenger apps

CC: Quality from the users’ viewpoint

CR: Messenger apps must be at the proper quality level for their users

CRUL: Messenger apps must get the proper grade points from the user-quality-viewpoint benchmark

CCONs: App, Messenger, Proper user-quality-grade-point

CSYMs: APP, MSR, PUG

Formal specification of CRUL: X [IS-A] APP(MSR) \(\Rightarrow\) O[PUG(X)]

Table 1 reports the evaluation results (in the above-mentioned experimentation setting). Accordingly, the data-driven KARB method reduced the error percentage significantly. Figure 12 shows the error reduction curves (for the five experiments). The average of these five curves indicates a pseudo-sigmoid form. In other words, the hybrid DD-KARB method (combining semantics-awareness and data-drivenness) is more effective than the solo methods and can compute a good estimation of messenger application user quality scores. Therefore, DD-KARB can be considered a method for quality benchmarking in this technical context.

Table 1 Evaluation results (for Experimentation-Plan-ID-1)
Fig. 12

Error reduction curves for five different experiments and their average

Discussion and conclusion

The first benchmark uses the simple average of the IR-QUMA context data (Fig. 11) as an estimator for quality scores, whereas the second benchmark is based on expert judgments about the quality of the messengers. The third benchmark is based on initially unweighted DD-KARB rules whose weight values are obtained from a hill-climbing optimization algorithm that reaches a local minimum of error. These rules compute an estimate of the user-quality-grade-point of the messenger app from the context data of each opinion record (i.e. an estimate of the absolute-quality score from the other fields of Fig. 11; for further details, please review the associated Java code in Additional file 1). The last benchmark, but the first in precision, is based on DD-KARB rules that obtain their weight values from two sources: (1) the IR-QUMA data values and (2) a lightweight state-space-checking procedure that finds good fitting parameters for the IR-QUMA dataset and the DD-KARB ruleset. Figure 13 depicts some results of this fitting procedure.

Fig. 13

The optimization performance of the fitting-solving procedure in the execution of the DD-KARB benchmark in Experiment-1 (Telegram). Every point represents the error of one state in the total state space. The fitting-solving procedure outperformed the random baseline and found a state near the exhaustive-search minimum
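As an illustration of how such weight fitting could proceed (a generic hill-climbing sketch under our own assumptions, not the fitting procedure of Additional file 1), the weights of the DD-KARB rules can be perturbed one at a time, keeping a change only when it lowers the benchmark error of Definition 2:

```java
import java.util.Arrays;
import java.util.Random;
import java.util.function.ToDoubleFunction;

// Generic hill-climbing sketch for fitting rule weights by minimizing a
// benchmark error (e.g. the error percentage of Definition 2).
// Illustrative only; not the fitting procedure shipped with the paper.
public class WeightFittingSketch {

    static double[] hillClimb(double[] initialWeights, ToDoubleFunction<double[]> error,
                              int iterations, double step, long seed) {
        Random random = new Random(seed);
        double[] best = initialWeights.clone();
        double bestError = error.applyAsDouble(best);
        for (int i = 0; i < iterations; i++) {
            double[] candidate = best.clone();
            int k = random.nextInt(candidate.length);              // perturb one weight
            candidate[k] += random.nextBoolean() ? step : -step;
            double candidateError = error.applyAsDouble(candidate);
            if (candidateError < bestError) {                      // keep only improvements
                best = candidate;
                bestError = candidateError;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Toy error surface with its minimum at weights (1.0, 2.0).
        ToDoubleFunction<double[]> toyError =
                w -> Math.abs(w[0] - 1.0) + Math.abs(w[1] - 2.0);
        double[] fitted = hillClimb(new double[]{0.0, 0.0}, toyError, 2000, 0.05, 42L);
        System.out.println(Arrays.toString(fitted));               // approximately [1.0, 2.0]
    }
}
```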

Therefore, the last benchmark incorporates these method features in a hybrid manner: semantics-awareness (through KARB), data-drivenness (through the DD part of DD-KARB and the IR-QUMA data), and the fitting solution (through finding fitting parameters with a lightweight state-space-checking procedure).

The intended semantic landscape of this problem (i.e. quality measurement of messenger apps) involves more than 10 semantic theories. Without the semantic framework that the KARB solution provides, it would be impossible to focus on the most relevant parts of the wide semantic landscape of this complex problem. Without the KARB rules, which act as a kind of declarative dimension of this problem, data-driven solving procedures and verification and model-checking methods would be unable to escape the “state space explosion” [126] in this landscape.

With the help of KARB within DD-KARB, however, the procedures and methods did escape the “state space explosion”. A 3-min process on a conventional PC (Windows + Java + Intel Core i7 processor) was able to find a good fit of the roughly 10K-order IR-QUMA dataset against a roughly 10G-order value space of weight values in the DD-KARB rules of the experiment.

It is concluded that the hybrid nature of the DD-KARB method (in the KARB Solution) can help solve some complex compliance problems in a lightweight manner and yield good results (in terms of a low-error compliance-level quality estimator).