1 Introduction

Qualitative comparative analysis (QCA) is a well-known data analysis method based on set theory (e.g., Ragin 2000; Schneider and Wagemann 2012; Thomann and Maggetti 2020). In QCA, data points are transformed into either fuzzy or crisp sets representing the membership status of a case within a condition or an outcome. During QCA, set-conditions and set-outcomes are organized into truth tables and minimized to produce the solution formulas (e.g., Duşa 2018; Duşa and Thiem 2015). It is also important to highlight that recent developments in configurational comparative analysis employ bottom-up aggregative strategies to isolate solution formulas, as in Coincidence Analysis (CNA; e.g., Baumgartner and Ambühl 2020) or through the eQMC and CCubes algorithms (Duşa and Thiem 2015). While the underlying mechanisms differ, both the standard truth table minimization and the bottom-up algorithmic aggregation reach the same goal of identifying subset relationships. More technically, when the set-theoretic membership of a condition \(X_{i}\) (or conjunction of conditions \(X_{i \cap k}\)) is smaller than the set-theoretic value of an outcome \(Y_{i}\), we obtain an indication of a sufficient relationship between conditions and outcome. Whenever the set-theoretic value of a condition \(X_{i}\) (or a disjunction of conditions \(X_{i \cup k}\)) is larger than the outcome \(Y_{i}\), we obtain an indication of a necessary relationship between conditions and outcome.

Relationships of sufficiency become established only when simultaneous subset relationships (hereafter SSR) are ruled out. SSRs typically appear when a condition \(X_{i}\) (or conjunction of conditions \(X_{i \cap n}\)) is a subset of both an outcome \(Y_{i}\) and a non-outcome \(\sim Y_{i}\), while the solution also exhibits contradictory cases that are identical in \(X_{i}\) but differ in the outcome Y.

As has been widely pointed out in the configurational comparative methods literature, SSRs should be ruled out because they constitute causally nonsensical solution formulas (e.g., Mello 2021). As a heuristic example, it would be like saying that a given virus is the source of a given disease and simultaneously the cause of its absence. Concerned by the SSR issue, Schneider and Wagemann (2012) introduced the proportional reduction in inconsistency (PRI) parameter, which, in conjunction with the analysis of consistency, shows whether a solution X is a subset of both the outcome Y and ~ Y.

QCA literature commonly associates the concept of SSR with the presence of contradictory cases. However, from a strict set-theoretic perspective, SSR arises whenever a condition \(X_{i}\) is both a subset of an outcome \(Y_{i}\) and simultaneously a subset of a non-outcome \(\sim Y_{i}\), even in the absence of contradictory cases. This type of relationship, referred to here as an Ambivalent Subset Relationship (ASR) to distinguish it from the standard QCA definition of SSR, is usually treated as irrelevant sufficiency by the QCA literature. However, as further argued here, it can pose a potentially serious issue if these patterns arise from spurious relationships (e.g., Veri 2019; Braumoeller 2017). As discussed below, trivialness in ASR occurs when the regularity pattern isolated by fsQCA only marginally includes cases in which the solution formula \(X_{i}\) is solely a subset of the outcome \(Y_{i}\) and massively includes cases in which membership in the solution formula \(X_{i}\) is a subset of both \(Y_{i}\) and \(\sim Y_{i}\). This suggests a potential misspecification in the research design, such as the inclusion of narrowly defined conditions or of spurious conditions that lack causal relationships with the outcome of interest. It may also stem from miscalibration of conditions, wherein the understanding of membership or non-membership status is either too narrow or too broad.

While PRI can capture possible SSRs, its reliability in detecting ASR remains subject to significant criticism. In this respect, Veri (2019) observed that PRI values are de facto ineffective in detecting ASR, given that the cases in ambivalent subsethood are not accounted for in the PRI formula itself. As discussed in this manuscript, this criticism becomes relevant when fsQCA is applied to large-N samples in which ambivalent cases largely outnumber consistent cases. This is because, in large-N studies, a large concentration of ambivalent cases might indicate a spurious or incomplete relationship, as the regularities observed across conditions may be primarily attributable to the presence of ambivalent cases rather than typical cases. Consequently, there are circumstances in which PRI values are considered satisfactory despite the presence of ASRs.

As argued in this article, ASR produces specific data patterns recognizable by a regular distribution of the datapoints along the Y-axis of the XY plot. This pattern becomes clear mainly when fsQCA is applied to large-N samples in which data patterns can be detected. As with any data pattern, ASR can be detected, in this case through a test of data-pattern differences that analyses the empirical cumulative distribution function (eCDF) of the datapoints of a solution formula and compares it with the eCDF of a solution term expressly generated at random to produce ASR.

Considering this context, this article introduces the two-sample ASR test based on the so-called DTS statistic (Dowd 2020). The test captures the similarity between the eCDF of an observed solution term E and a simulated solution F which has been artificially generated to be asymptotically in ASR. This method allows us to compare the eCDFs of two independent samples, considering the weighted integral of their differences. In simpler terms, the test proposed here checks whether the distribution pattern of a solution term's data matches the characteristics of a randomly generated solution term designed to exhibit the data patterns of an ASR. As in any other statistical test, the null hypothesis states that the solution E is equal to the solution F, i.e., that there is no significant difference between an asymptotic ASR solution and an observed solution. If this is not the case, we have sufficient evidence supporting the credibility of the alternative hypothesis that the solution terms are not in ASR. In addition, as with other statistical tests, the power of the test increases with the number of observations. Therefore, it is most suitable for application to large-N samples, although the bootstrapping procedure already provides a certain level of robustness in medium-N samples. Such evidence should then be complemented with other existing analytical tools, such as robustness checks (e.g., Eliason and Stryker 2009; Veri 2019), to establish the robustness of the sufficiency relationship, or checked through set-theoretic multi-method research approaches, as suggested by Schneider and Rohlfing (2013), to rule out possible spurious subset relationships.

Considering these concerns, the paper is structured as follows. First, we describe the core concepts for defining ASR; then we provide an overview of the limitations of PRI in detecting ASR in large-N samples. This allows us to make the null hypothesis explicit and to introduce the test that checks whether an empirical solution term converges into ASR. An example follows the technical explanation of the proposed test, while the supplementary material provides the R code to run the test itself.

2 Core concepts

2.1 Sufficiency and necessity

QCA's core focus is to identify sufficient and necessary relationships. A causal factor is usually called a necessary condition when the outcome cannot occur without it; this also means that the presence of the condition does not guarantee the presence of the outcome. In contrast, when the presence of a condition always leads to the occurrence of an outcome, we observe a relationship of sufficiency. In this case, the outcome can also appear when the condition is absent (e.g., Blatter and Haverland 2012).

The presence and absence of conditions and outcomes are represented in QCA through set-theoretic logic. When a condition is a superset of the outcome (i.e., the condition always appears in the presence of the outcome), we observe a relationship of necessity. In contrast, when a condition is a subset of an outcome (i.e., the outcome always appears in the presence of the condition), we observe a relationship of sufficiency.

From a technical perspective, sufficiency and necessity can be defined as sets with a superset/subset membership function. Consider the membership function μA(x), which defines X's value in the real unit interval for any set. The relationship of sufficiency is indicated by the relationship of subsethood between a condition \(X_{i}\) (or conjunction of conditions \(X_{i \cap n}\)) and the set of an outcome \(Y_{i}\), as shown below:

$$X_{i} \subseteq Y_{i}$$
(1)
$$X_{{i \cap {\text{k}}}} \subseteq Y_{i}$$
(2)

The relationship (2) considers that the condition \(X_{i \cap k}\) is a conjunction of one or more conditions following \(X_{i \cap k}: X_{i} \cap X_{k}\). Similarly, a relationship of necessity is indicated by the relationship of supersethood between a condition \(X_{i}\) (or, following De Morgan's law, a disjunction of conditions \(X_{i \cup k}\)) and an outcome \(Y_{i}\), as follows:

$$X_{i} \supseteq Y_{i}$$
(3)
$$X_{{i \cup {\text{k}}}} \supseteq Y_{i}$$
(4)

Furthermore, in this case, the relationship (4) considers that the condition \({X}_{i\cup \mathrm{k}}\) is a disjunction of one or more conditions following \({X}_{i\cup \mathrm{k}}:{ X}_{i}\cup {X}_{k}\).
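
To make these subset relations concrete, the following minimal R sketch (with hypothetical data) computes Ragin's (2006) consistency measures for sufficiency and necessity, which quantify the degree to which relations (1)–(4) hold in fuzzy data:

```r
# Ragin's (2006) consistency of sufficiency (X subset of Y) and
# necessity (X superset of Y) for fuzzy membership vectors
suff_consistency <- function(X, Y) sum(pmin(X, Y)) / sum(X)
nec_consistency  <- function(X, Y) sum(pmin(X, Y)) / sum(Y)

set.seed(1)
X <- runif(100)        # hypothetical fuzzy scores in a condition
Y <- pmin(X + 0.2, 1)  # outcome built so that X never exceeds Y
suff_consistency(X, Y) # equals 1: X is a perfect subset of Y
```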

2.2 Case membership and definition

Schneider and Rohlfing (2013) introduced the idea of an enhanced plot for sufficiency in which each case is defined according to six zone memberships (Fig. 1).

Fig. 1 Enhanced plot for sufficiency

While typical cases (cell 1) confirm the relationship of sufficiency by displaying a solution formula that is in a subset relationship with the outcome, deviant cases consistency in kind (cell 3) disprove such a relationship, as their scores in the solution formula are larger than their scores in the outcome. Finally, while considered to be in a subset relationship with the outcome, deviant cases for coverage (cell 6) and individually irrelevant cases (cell 5) neither disprove nor prove a relationship of sufficiency: the former display the outcome but not the solution formula, while the latter display neither the outcome nor the solution formula and are therefore logically or empirically irrelevant to the relationship of sufficiency.

2.3 Ambivalent cases

According to the fsQCA literature, SSR occurs when a configuration qualifies as consistent with both an outcome and its complement because it harbours deviant cases consistency in kind. Schneider and Wagemann (2012) stated in this respect that, with SSR, at least one relation must contain at least one truly logically contradictory case, such that the consistency scores of a given solution formula in the outcome Y and in ~ Y are high and remarkably similar.

However, while this is empirically correct, it omits the purely set-theoretic aspect of a simultaneous subset relationship, which refers to the presence of what we define here as ambivalent cases.

In purely set-theoretic terms, ambivalent cases are cases in which the degree of membership in the solution formula is small enough to be in a subset relation with both the outcome and the non-outcome. As highlighted by Fig. 2, ambivalent cases fall into the area of individually irrelevant cases, partially cover the area of deviant cases for coverage, and are usually dismissed by the literature as irrelevant to the solution formula.

Fig. 2 Ambivalent cases within XY plot

In this respect, consider, for example, an outcome score of 0.2 (case B, Fig. 2): any condition score below 0.2 (e.g., 0.1) is in SSR, as it is smaller than both the outcome (0.2) and the non-outcome (0.8). A similar relationship may also appear when the outcome is high, as in case A of Fig. 2, where an X value of 0.1 is simultaneously a subset of an outcome of 0.8 and of a non-outcome of 0.2.

In more formal terms, ambivalent cases are cases in which the score in the condition X (or a conjunction of conditions \(X_{i \cap n}\)) is simultaneously a subset of the outcome \(Y_{i}\) and of the non-outcome \(\sim Y_{i}\), as follows:

$$X_{i \cap k} \subseteq Y_{i} \wedge X_{i \cap k} \subseteq \sim Y_{i}$$
(5)

Each data point (case) that has the characteristics of Eq. (5) is considered an ambivalent case. While a case with the exclusive relationship \(X_{i \cap k} \subseteq \sim Y_{i}\) is considered a counterexample (or a deviant case consistency in kind or degree), and cases exclusively defined as \(X_{i \cap k} \subseteq Y_{i}\) are considered non-ambivalent (or typical cases), ambivalent cases are cases whose score in X is always simultaneously a subset of Y and ~ Y.
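
As a minimal sketch, the case-level distinctions above can be operationalized in R as follows (the category labels are ours, and the residual "deviant" label covers cases that are a subset of neither set):

```r
# Classify each case by its case-level subset relations (Eq. 5)
classify_case <- function(X, Y) {
  in_Y  <- X <= Y        # X_i subset of Y_i
  in_nY <- X <= 1 - Y    # X_i subset of ~Y_i
  ifelse(in_Y & in_nY, "ambivalent",
         ifelse(in_Y, "typical",
                ifelse(in_nY, "counterexample", "deviant")))
}

classify_case(0.1, 0.2)  # case B of Fig. 2 -> "ambivalent"
classify_case(0.1, 0.8)  # case A of Fig. 2 -> "ambivalent"
classify_case(0.7, 0.9)  # subset of Y only -> "typical"
classify_case(0.9, 0.3)  # subset of neither -> "deviant"
```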

2.4 Implications of ambivalent subset relationship in QCA

Cases with ambivalent outcomes are often overlooked in the QCA literature because they are deemed empirically or logically irrelevant to the presence or absence of causal sufficiency. This perspective is coherent within a case-based QCA approach in which causal inferences are drawn on the foundation of cases (e.g., Thomann and Maggetti 2020), so that the researcher can isolate and further make sense of the meaning of the solution formula in light of a deep knowledge of each case in the sample, including the ambivalent ones. As a consequence, in QCA as originally conceived and applied to small- and medium-N samples, such instances are often dismissed as simple cases without a significant outcome.

Nevertheless, as discussed above, from a purely set-theoretic perspective, cases with ambivalent outcomes are the only true cases that are simultaneously a subset of the outcome Y and the non-outcome ~ Y, as their score in X is in a subset relationship with both Y and ~ Y.

The potential issue of the presence of ASR arises in two instances: (1) when research design misspecification results in an absolute majority of ambivalent cases covered by the solution formula; (2) when the researcher does not apply a case-based approach to infer causal relationships and therefore does not rely on substantive knowledge of cases.

1. Research design misspecification refers to situations where the design of a research study does not adequately account for all the relevant conditions that may influence the outcome. This can occur due to inadequate selection of conditions or measurement error (e.g., Schwab 2013). In the context of QCA, research design misspecification may occur if spurious or irrelevant conditions are included in the analysis, if conditions do not accurately reflect the true sufficient relationship, or if conditions are too narrowly defined, leading to an incomplete understanding of the causal relationship. Measurement error in QCA refers to the miscalibration of conditions, where membership and non-membership sets are either too narrowly or too broadly defined, which can lead to incorrect conclusions about the relationship between conditions and outcomes. From a technical perspective, regularities identified by QCA might be considered potentially spurious in light of a regular pattern that is mainly identifiable as an ambivalent case pattern, as displayed in Fig. 2. Pragmatically, this means that an entire solution formula is valid for only a handful of cases while the majority remain unexplained. In this case, there is a possibility of encountering model misspecification. To elucidate the spurious relationship, let us consider a hypothetical analysis aimed at identifying the conjunction of conditions under which a fire happens. While there are various factors to consider, such as the presence of a heat source and inflammable material, our hypothetical unfocused researcher also takes into account the colour of the material, because dark materials are characterized by high absorptivity, meaning they absorb more heat due to their lower reflectivity of light. Although the colour of the material is not determinant of the outcome of catching fire (F), the mis-specified research design leads to a solution formula in which dark colour (D) appears as an insufficient but necessary part of an unnecessary but sufficient condition for producing fire (F), in the presence of electric short circuits (S) and inflammable material (I).

    $$D*S*I \to F$$
    (6)

After Boolean aggregation, we also know that materials that are not dark (~D), inflammable (I), and located near a short circuit (S) will also catch fire and therefore display the outcome F. However, such cases will be scored as non-members of the solution formula (6), as their configuration is ~D*S*I. In the same fashion, a dark (D) or non-dark material (~D) that is not inflammable (~I) or is not near a short circuit (~S) will not result in a fire (~F).

Essentially, as displayed in Fig. 3, an INUS condition with the addition of a spurious part will result in a small number of cases being covered as typical cases and a larger number of cases in an ambivalent case position.

Fig. 3 Spurious solution formula

A second scenario, more likely to appear in empirical research, is when the researcher adopts a narrow understanding of conditions. Using the fire example above, a narrow condition would take, instead of a broad and general understanding of 'inflammable material', a specific material such as fabric, paper, and so forth. If, for example, the distracted researcher focuses his attention only on wallpaper, he will observe that short circuits and wallpaper are a sufficient INUS condition for a fire to happen. Within this example, we will observe the same pattern described in Fig. 3, where only a few observations fall into the typical case area and the majority are located in the ambivalent case area. Finally, the partial correctness of the solution formula may also stem from model misspecification due to the miscalibration of one or more conditions. For instance, when analysing the flammability of materials, if we set the 0.5 calibration threshold at materials that are 'extremely inflammable', we consequently narrow down the range of materials considered 'very inflammable' or 'quite inflammable' to a much smaller set, which produces a larger number of cases deviant in coverage, as they display the outcome without displaying membership in the condition 'inflammable'.

2. While QCA can be applied to large-N samples (e.g., Greckhamer et al. 2013), it was originally conceived as a case-based method applicable to small- or medium-N samples. While the risk of research design misspecification is lower in small- and medium-N samples due to the researcher's in-depth knowledge of each individual case, it cannot be ruled out that such misspecification may arise in large-N samples, where case knowledge is not a prerequisite for analysis. In this context, researchers mainly rely on regularities that can be isolated by the QCA algorithmic process and have little knowledge of each specific case. With a higher concentration of ambivalent cases along the Y-axis of the plot, the solution formula indicates the presence of a suspect underlying regularity in which the few cases in the typical cases area might be there more by chance than by a causal link.

In essence, if a pattern of aggregation of cases within the ambivalent case zone emerges, researchers should at least entertain some doubt about the possibility of having stumbled upon an incomplete or incorrect relationship, as the regularity pattern identified by the QCA minimization algorithm is linked mainly to ambivalent cases rather than to typical cases.

To provide a graphical representation of a spurious or mis-specified set-theoretic relationship, in Fig. 4 we simulated an ASR solution term, created from 100 randomly generated conditions and outcomes with values between 0 and 1 drawn from a uniform distribution. The dataset was then minimized following the standard QCA procedure based on truth table analysis, including only rows with PRI > 0.8 and consistency > 0.9 (well above the suggested values of 0.7; Greckhamer et al. 2018).
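
A minimal R sketch of this simulation is shown below. It assumes the QCA package (Duşa 2022) and, as an illustration, five conditions; truthTable()'s incl.cut and pri.cut arguments set the consistency and PRI thresholds mentioned above. Whether any truth table rows pass the cuts depends on the random draw:

```r
library(QCA)

set.seed(123)
n <- 100
d <- data.frame(A = runif(n), B = runif(n), C = runif(n),
                D = runif(n), E = runif(n), Y = runif(n))

# Truth table with consistency > 0.9 and PRI > 0.8, as in the text
tt  <- truthTable(d, outcome = "Y", incl.cut = 0.9, pri.cut = 0.8)
sol <- minimize(tt, details = TRUE)  # solution terms isolated from pure noise
sol
```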

Fig. 4 Random solution 1 XY plot and respective kernel density plot

While the naked eye may mistakenly conclude that the left XY plot in Fig. 4 is not in ASR, the kernel density plot (right of Fig. 4) visually supports the presence of an ASR relationship by highlighting strong clustering patterns along the Y-axis, where the ambivalent cases are located.

3 Existing parameters to determine simultaneous subset relationships

Despite the rich literature on assessing the causal relationship of QCA solution formulas (e.g., Braumoeller 2015, 2017; Eliason and Stryker 2009; Veri 2018, 2019), most of this literature is silent on the idea of ASR itself.

Braumoeller (2015, 2017), for example, introduces two tests: one focuses only on the presence of counterexamples, and the second on levels of consistency. Using Ragin's consistency values (Ragin 2006) and considering the presence of counterexamples, the tests calculate test statistic values by rearranging the observed data points in all possible ways. While these tests may potentially uncover SSR relationships, they do not guarantee the detection of ASR relationships: as discussed previously, ASR is not contingent on the presence of counterexamples and is not the primary focus of Ragin's consistency parameter.

Similarly, Eliason and Stryker (2009) focus on the concept of fuzzy distance between datapoints and the entire diagonal line of the XY plot. However, their attention is specifically drawn to the values in the solution formula that closely approach the outcome score, either slightly above or below it. This parameter is used as a goodness-of-fit test, with an emphasis on detecting potential inconsistencies in the values, rather than verifying SSR or ASR relationships.

Veri’s (2018, 2019) parameters of fits are designed to uncover robust solutions that might ultimately be subject to false-negative outcomes. However, while the new parameter of consistency can reveal cases that do not adhere to ASR, it does not necessarily indicate the presence of ASR.

As can be observed, none of these methodological propositions focuses on ASR distributional assumptions or accounts for the unbalanced presence of ambivalent cases. This is also the case for the PRI formula, displayed below, which, in combination with the parameter of consistency, is specifically designed to detect SSR rather than ASR:

$$PRI = \frac{\sum \min (X_{i \cap k}, Y_{i}) - \sum \min (X_{i \cap k}, Y_{i}, \sim Y_{i})}{\sum X_{i \cap k} - \sum \min (X_{i \cap k}, Y_{i}, \sim Y_{i})}$$
(7)

As already noted by Veri (2019) and Schwellnus (2013), in the PRI formula only counterexamples weigh negatively on the final PRI score, by deflating the numerator through the term \(\sum \min (X_{i \cap k}, Y_{i}, \sim Y_{i})\). In contrast, ambivalent cases do not contribute to the size of the numerator at all, and they do not appear in any part of the PRI formula. As a consequence, it is possible to have solution terms with a high PRI that are nonetheless in ASR, even when the PRI of the non-outcome is very low.
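
This cancellation is easy to verify numerically. The base-R sketch below (with hypothetical data) implements Eq. (7) and shows that a solution term can reach a perfect PRI even when ambivalent cases vastly outnumber typical ones, because for any ambivalent case min(X, Y) = min(X, Y, ~Y) = X, so its contribution drops out of both numerator and denominator:

```r
# PRI (Eq. 7) for fuzzy vectors X and Y
pri <- function(X, Y) {
  amb <- sum(pmin(X, Y, 1 - Y))               # the min(X, Y, ~Y) term
  (sum(pmin(X, Y)) - amb) / (sum(X) - amb)
}

set.seed(42)
Y_amb <- runif(50)
X_amb <- pmin(Y_amb, 1 - Y_amb) * 0.9  # 50 purely ambivalent cases
X <- c(X_amb, 0.6, 0.7)                # ...plus only two typical cases
Y <- c(Y_amb, 0.8, 0.9)
pri(X, Y)  # = 1, driven entirely by the two typical cases
```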

The different weights that PRI gives to different data points can be observed visually in Fig. 5, where the PRI formula has been applied to evaluate the relative weight of each data point in the PRI numerator.

Fig. 5 PRI calculation

It is thus possible to observe that ambivalent cases count for 0, while the relative negative weight of counterexamples depends on the distance of each counterexample from the diagonal line. Hence, despite the massive presence of ambivalent cases, the PRI value drops dramatically only in the presence of counterexamples (left plot of Fig. 5) and rises when such counterexamples are absent (right plot of Fig. 5).

As already pointed out above, while PRI is empirically sound for its purpose, it can be employed to identify SSR but not ASR.

4 Ambivalent subset relationship in QCA: the null hypothesis

QCA is a method that is either associated with thick interpretive stances or employed in a more empiricist fashion under the regularity theory of causation. What differentiates these two methodological stances is a lack of agreement on how to treat the cross-case regularities revealed by consistent set-theoretic relationships. Interpretive scholars, such as Rutten (2022), suggest that truth tables mainly model possibilistic uncertainty, which should be subject to substantive interpretation. This vision is challenged by other scholars (e.g., Baumgartner 2015) who point out that QCA can track causal regularities by considering causes as Boolean difference-makers of their effects (Veri and Barrowman 2022). Essentially, while interpretive scholars consider regularities only as an indicative feature and not as evidence of a causal relationship (e.g., Rutten 2022), empiricist scholars believe that QCA is a method that allows one to scrutinize causal dependency (Baumgartner 2015).

Without formally taking a side in this debate, it is possible to identify at least two points of certainty concerning QCA that should bring together interpretive and empiricist scholars.

The first point of certainty is that QCA is a method that identifies solution formulas through regularities. The researcher can then interpret solution formulas either as a manifestation of causation or as actual causation; in either case, it is an algorithmic process that isolates cross-case patterns in a dataset.

The second point of certainty is that some of the cross-case patterns isolated during truth table minimization can potentially be in either SSR or ASR. While from a purely formal algebraic perspective SSRs constitute set relationships, from a causal perspective they are neither empirically nor critically interpretable, as they constitute logical paradoxes. Similarly, while ASRs can be interpreted empirically, they can hide insidious traps, as they might be the product of spurious or incomplete INUS conditions, as discussed above. Consequently, truth tables, through either SSR or ASR, can produce cross-case patterns that are spurious, given that they are linked to an algorithmic minimization process and to a specific skewed data shape resulting from research design misspecification.

These two points of agreement are also a source of at least two important implications.

a. First, the possibility of the occurrence of ASR is real.

b. Second, we can acknowledge that ASR is a specific recognizable data pattern and, like any other data pattern, it can also be algorithmically simulated and isolated.

In simpler words, given that truth-table analysis is agnostic about causality and finds cross-case regularities even in a random dataset (e.g., Rutten 2022), it is crucial to identify the ASR pattern that is produced by the minimization algorithm alone. This is particularly true when a purely inductive research design is implemented and empirical instances, rather than theoretical considerations, are used to isolate data regularities, which can result in a narrowly, empirically defined calibration that is poorly suited to the large-N dataset being analyzed.

In more technical terms, ASR identification can be carried out through a comparative test between a simulated ASR dataset and the empirical results of QCA. The test relates and compares a simulated ASR solution F with the empirical results E. Given that ambivalent cases are distributed along the Y-axis of the XY plot, the simulated ASR solution term F can be generated using a uniform distribution function (based on expression 8), considering an asymptotic ASR (with thousands of data points). Subsequently, the asymptotic ASR is used to test whether the empirical solution term E exhibits the same uniform distributional characteristics as the simulated ASR solution F. Through this procedure, we can be certain that we are comparing an ASR solution formula whose data are massively skewed toward the ambivalent set-theoretic zone with an empirical solution formula whose level of skewness is uncertain due to a relatively smaller number of datapoints.

It is essentially a matter of comparing the empirical study's data pattern with a known spurious data pattern. More straightforwardly, ambivalent subsethood is an issue of the data clustering pattern within the XY plot, which can be detected through a data cluster shape dependency analysis.

More technically, data cluster shape dependency analysis is possible through the analysis of CDF equality (e.g., Irizarry 2019). The CDF equality approach allows us to artificially simulate asymptotic spurious ASR patterns derived from the minimization of randomly generated uniform conditions and outcomes. The simulated ASR has the same INUS characteristics (same number of conjunctions of conditions and outcome) as the observed solution formula. This becomes the referent against which the eCDF of the empirical solution formula is checked. In this respect, we can test whether the eCDF of the empirical solution is equal to the CDF of the simulated asymptotic ASR dataset (Rémillard and Scaillet 2009).

Considering the CDF equality approach, we can express the null hypothesis as follows:

H0

There are no differences between the data shape of a randomly generated asymptotic ASR solution formula and an empirically observed solution formula.

As in any test of difference, the null hypothesis aims to test whether QCA solution terms converge into an ASR data shape.

More formally, the cardinal size of ambivalent cases can be isolated in the solution formula through the α-cut value of a solution formula. The α-cut value is the cardinal value that defines the size of a solution formula below and above the value of the outcome Y (Stoklasa et al. 2017; Veri 2019). This value can be defined as follows:

$$SUPP^{\alpha} \left( X_{i \cap n} \subseteq Y_{i} \right) = \frac{\left| X_{i \cap n} [Y_{i}]_{\alpha}^{F} \right|}{\left| X_{i \cap n} \right|}$$
(8)
$$DISP^{\alpha} \left( X_{i \cap n} \subseteq Y_{i} \right) = \frac{\left| X_{i \cap n} [\sim Y_{i}]_{\alpha}^{F} \right|}{\left| X_{i \cap n} \right|}$$
(9)

The α-cut level allows one to identify the size of a specific α-level of a solution \(X_{i \cap n}\) within the outcome \(Y_{i}\) and the non-outcome \(\sim Y_{i}\). \(SUPP^{\alpha}\) and \(DISP^{\alpha}\) are defined by Stoklasa et al. (2017) as, respectively, the degrees of support and disproof of the solution formula \(X_{i \cap n}\) as a subset of \(Y_{i}\). Thus, an α-cut level corresponds to the size of a fuzzy set below the imaginary α-cut. For example, \(SUPP^{0.5-1}\) expresses the size of X that is above the fuzzy score of 0.5 in Y. Similarly, \(DISP^{0.5-1}\) expresses the size of X that is below the fuzzy score of 0.5 in Y.

In a context where ambivalent cases prevail, we would expect to see a pattern of equivalence between the cardinal values of the α-cut levels \(SUPP^{0.5-1}\) and \(DISP^{0.5-1}\). However, to evaluate the distributional equivalence of cases, it is essential to consider a parameter beyond the α-cut level, as the α-cut level provides no information regarding the distributional properties of the values Xi in Yi, nor whether such values are in the ambivalent case area.
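
The expected equivalence can be illustrated with a short R sketch. The functions below implement one plausible reading of Eqs. (8)–(9) (our interpretation, not Stoklasa et al.'s exact formulation): the share of the solution term's total membership falling within the α-cut of Y (respectively ~Y). For a solution term forced into the ambivalent zone, the two values coincide at α = 0.5:

```r
# Share of the solution's membership within the alpha-cut of Y and of ~Y
supp_alpha <- function(X, Y, alpha) sum(X[Y >= alpha]) / sum(X)
disp_alpha <- function(X, Y, alpha) sum(X[(1 - Y) >= alpha]) / sum(X)

set.seed(7)
Y <- runif(5000)
X <- pmin(Y, 1 - Y) * runif(5000)  # asymptotically ambivalent solution scores
supp_alpha(X, Y, 0.5)  # ~0.5
disp_alpha(X, Y, 0.5)  # ~0.5: the equivalence pattern expected under ASR
```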

Mathematically speaking, this is possible by analysing the CDF of an ambivalent solution formula, wherein the distributional pattern is identifiable as a uniform distribution along the different values of the α-cut levels.

$$F_{{X_{R} }} \left( x \right) = \Pr \left( {X_{R} \le x} \right) = \left\{ {\begin{array}{*{20}l} {0 \quad if\; x \le 0} \\ {x \quad if\; 0 < x \le 1} \\ {1 \quad if x > 1} \\ \end{array} } \right.$$
(10)

As pointed out by the literature on fuzzy logic, the boundary curves of the α-cuts of a fuzzy number X can be related to its eCDF (e.g., Hesamian and Chachi 2015). This principle has also been confirmed through Monte Carlo simulation by Sadeghi and Fayek (2010). Then, for a solution \(X_{i \cap n}\), the α-cut function refers to its eCDF \(\hat{F}_{X_{i \cap n}}(x)\), as follows:

$$\hat{F}_{{X_{i \cap n} }} \left( x \right) = SUPP^{\alpha } \left( {X_{i \cap n} \subseteq Y_{i} } \right)$$
(11)

In this respect, a random solution formula \(X_{R}\) will have a specific eCDF as follows:

$$\hat{F}_{{X_{R} }} \left( x \right) = SUPP^{\alpha } \left( {X_{{R_{i} \cap R_{n} }} \subseteq Y_{Ri} } \right)$$
(12)

\(\hat{F}_{X_{R}}(x)\) and \(\hat{F}_{X_{i \cap n}}(x)\) are the eCDFs of the simulated and observed distributions, which under the Glivenko–Cantelli theorem converge to their respective CDFs \(F_{X}(x)\) and \(F_{X_{i \cap n}}(x)\) (Glivenko 1933; Cantelli 1933):

$$\mathop {\lim }\limits_{n \to \infty } | \hat{F}_{X} \left( x \right) - F_{X} \left( x \right)| = 0$$
(13)

Consequently, the eCDF converges to the CDF as follows:

$$\hat{F}_{{X_{R} }} \left( x \right) \to F_{{X_{R} }} \left( x \right)$$
(14)

and

$$\hat{F}_{{X_{i \cap n} }} \left( x \right) \to F_{{X_{i \cap n} }} \left( x \right)$$
(15)
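
A quick numerical check of this convergence can be run in R: the maximum gap between the eCDF of a uniform sample and the uniform CDF F(x) = x shrinks as the sample size grows, which is why thousands of simulated data points suffice to approximate the asymptotic ASR eCDF:

```r
set.seed(5)
grid <- seq(0, 1, by = 0.001)
for (n in c(100, 1000, 10000)) {
  Fn <- ecdf(runif(n))  # empirical CDF of a uniform sample
  cat("n =", n, " sup|F_n - F| =", max(abs(Fn(grid) - grid)), "\n")
}
```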

The null hypothesis thus considers that there are no differences between the random eCDF of \(X_{R}\) and the eCDF of the empirical solution formula \(X_{i \cap n}\), as follows:

$${ }\hat{F}_{{X_{R} }} \left( x \right) = \hat{F}_{{X_{i \cap n} }} \left( x \right)$$
(16)

Under the null hypothesis, the CDF of \(F_{X_{i \cap n}}(x)\) then has the same characteristics as the CDF of \(F_{X_{R}}(x)\):

$$F_{{X_{R} }} \left( x \right) = F_{{X_{{i \cap {\text{n}}}} }} \left( x \right)$$
(17)

Moreover, as observed above, given that we aim to check for simultaneous subset relationships, the null hypothesis also implies that the eCDF of \(X_{i \cap n}\) converges to a uniform distribution as follows:

$$\hat{F}_{{X_{i \cap n} }} \left( x \right):\Pr \left( {X_{i \cap n} \le x} \right) = \left\{ {\begin{array}{*{20}l} {0 \quad if x \le 0} \\ {x \quad if 0 < x \le 1 } \\ {1 \quad if x > 1} \\ \end{array} } \right.$$
(18)

If \(\hat{F}_{X_{i \cap n}}(x)\) converges into \(\hat{F}_{X_{R}}(x)\), we can infer that it is the product of algorithmic bias and a true simultaneous subset solution formula.

5 The two-sample ASR test

The test introduced in this manuscript is conceptually straightforward. Given that ASR solution formulas have a specific pattern, recognizable by a skewed distribution of the outcome along the Y-axis of the plot and a score of the solution term \(X_{i \cap n}\) smaller than the outcome Yi, we can test whether such a pattern is present in empirical solution terms. In this respect, we employ a two-sample test, which is used to determine whether two underlying cumulative distributions are the same or not. This is a well-known technique in fuzzy logic that has been widely employed to assess whether a fuzzy variable is a random variable (e.g., Hesamian and Chachi 2015) or, for example, to create unidimensional fuzzy sets from different fuzzy samples (e.g., Nikolova et al. 2015).

The two-sample test determines whether the empirically observed eCDF of a solution formula differs in a statistically significant way from the eCDF of a simulated solution formula with ASR characteristics. If the test does not show any statistically significant differences, the two eCDFs can be considered equal; consequently, there are no discernible differences between the simulated asymptotic ASR eCDF and the observed empirical solution eCDF. Essentially, the test allows us to check whether an empirical study's results converge into the distribution patterns of a spurious simulated asymptotic ASR eCDF or, less technically, whether the shape patterns of an empirical solution are similar to the shape patterns of an asymptotic ASR solution term.

The simulated dataset F is set to have the same number of conjunctions of conditions and outcome as the empirical observation E and is composed of randomly generated uniform fuzzy sets. This guarantees the ASR distribution of the simulated solution F while considering the same number of conditions involved in the empirical solution E. Besides, we employ a large number of cases (i.e., 10,000), which allows us to identify the asymptotic ASR eCDF (Chung and Romano 2013; Sadeghi and Fayek 2010). Our H0 states that the eCDF of the observed solution \(\hat{E}\) corresponds to the eCDF of its respective simulated solution \(\hat{F}\).

$$H_{0} :\hat{E} = \hat{F}$$
(19)

The test statistic is based on the DTS statistic, a powerful measure for isolating similarities between two-sample cumulative distribution functions (Dowd 2020).

The DTS statistic merges Vaserstein's (1969) concept of distance between two eCDFs:

$$wass = \mathop \int \limits_{-\infty}^{\infty} |\hat{E}_{(x)} - \hat{F}_{(x)}| \, dx$$
(20)

And Anderson Darling’s (1954) idea that the variance between eCDFs distances is not stable:

$$AD = \mathop \sum \limits_{x} \frac{|\hat{E}_{(x)} - \hat{F}_{(x)}|}{\hat{D}_{(x)} (1 - \hat{D}_{(x)})}$$
(21)

where \(\hat{E}\) is the eCDF of the empirical solution formula \(X_{i \cap n}\), \(\hat{F}\) the eCDF of the simulated solution formula \(X_{R}\), and \(\hat{D}\) the eCDF of the combined samples.

The DTS two-sample test statistic compares two samples through a reweighted integral of the distance between the two eCDFs, as follows:

$$DTS = \mathop \int \limits_{-\infty}^{\infty} \frac{|\hat{E}_{(x)} - \hat{F}_{(x)}|}{\hat{D}_{(x)} (1 - \hat{D}_{(x)})} \, dx$$
(22)

Dowd’s (2020) simulations show that DTS results are more stable than other existing tests (such as the Kuiper, the Mises von Cramer, the Wasserstein, or the Anderson–Darling tests) considering mean and variance shifts or inflation.

Essentially, as depicted in Fig. 6, the DTS statistic calculates an integral between the asymptotic ASR eCDF curve and the empirical solution formula eCDF. The integral is then weighted to account for the points of maximum and minimum distance between the two eCDFs (Fig. 6).

Fig. 6 DTS test

The analysis is run with the R twosamples package (Dowd 2020), using a bootstrap algorithm to compute an estimate p̂ of p and to account for small/medium sample sizes. The test's main idea is that if H0 is rejected, then the eCDF of the observed study solution differs from the eCDF of the respective simulated sample.
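
The sketch below illustrates the workflow under stated assumptions: dts_test() from the twosamples package bootstraps a p-value for the DTS statistic in Eq. (22), the asymptotic ASR reference is forced into the ambivalent zone by direct construction here (rather than obtained by minimizing random data as described in Sect. 4), and the observed scores are hypothetical placeholders:

```r
library(twosamples)

set.seed(99)
Y_sim <- runif(10000)
F_sim <- pmin(Y_sim, 1 - Y_sim) * runif(10000)  # simulated asymptotic ASR scores

E_obs <- runif(100, 0.5, 1)  # hypothetical observed solution-term scores

dts_test(E_obs, F_sim, nboots = 2000)
# A small p-value rejects H0 (E = F): the observed term does not
# converge into the asymptotic ASR shape.
```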

To help researchers implement this test, the annexe provides the R code for the example reported in this paper. A dedicated R package is also a work in progress.

6 Illustrative example

The DTS test is applied to two simulated illustrative examples. The first example is based on a simulated dataset of 100 cases built with five random conditions and one outcome, generated with uniformly distributed fuzzy set scores to guarantee a highly skewed ASR distribution as a result of QCA. The five conditions and the outcome are then subjected to a QCA. Rather than minimizing the truth table by including only rows in which PRI and consistency equal 1, as in standard QCA, we set the PRI cut-off above 0.8 and the consistency cut-off above 0.9 to check the performance of the test when PRI and consistency levels are, respectively, high in the outcome Y and low in the non-outcome ~ Y.

The second example is built to produce a non-ASR solution formula. We generate a dataset of 100 random cases with one outcome and three conditions drawn from normal distributions with equal means. Conditions and outcome are then transformed using the logistic calibration proposed by the QCA package (Duşa 2022). To keep the outcome tendentially larger than the conditions, the outcome is generated with a smaller standard deviation than the conditions, while its calibration threshold for the fuzzy score of 0.5 is set to a lower value than the one used for the conditions, so that the majority of outcome cases have membership above 0.5. This allows us to artificially create non-ASR solution terms, given that all condition fuzzy values are tendentially lower than the outcome fuzzy values. The minimization process is run using QCA with conservative PRI and consistency levels.
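
A minimal sketch of this data-generating setup follows; the means, standard deviations, and calibration thresholds are our assumptions, while calibrate() is the QCA package's direct logistic calibration with thresholds given as (exclusion, crossover, inclusion) on the raw scale:

```r
library(QCA)

set.seed(2023)
n   <- 100
raw <- data.frame(A = rnorm(n, 0, 1), B = rnorm(n, 0, 1), C = rnorm(n, 0, 1),
                  Y = rnorm(n, 0, 0.5))  # outcome: same mean, smaller sd

cal <- function(x, th) calibrate(x, type = "fuzzy", thresholds = th, logistic = TRUE)
fz <- data.frame(A = cal(raw$A, c(-1, 0, 1)),
                 B = cal(raw$B, c(-1, 0, 1)),
                 C = cal(raw$C, c(-1, 0, 1)),
                 # lower crossover for Y so most outcome memberships exceed 0.5
                 Y = cal(raw$Y, c(-1.5, -0.5, 0.5)))
```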

As can be observed from Fig. 7, the data shape of example 1 reflects ASR patterns, with data gathered along the Y-axis. In contrast, the data of example 2 are horizontally clustered along the top end of the X-axis, which excludes the existence of ASR, given that all the data points are in a non-ambivalent area of the plot.

Fig. 7 Plot for random example 1 and random example 2

The solution terms found in example 1 and example 2 are then tested against an artificial ASR solution built with the same INUS characteristics as each solution term but asymptotically defined as ASR through the configuration of 10,000 data points. This asymptotic ASR presents perfect ASR characteristics.

As displayed in Fig. 8, the solution term eCDF (represented by the step line) is compared with the asymptotic eCDF (represented by the curve).

Fig. 8 eCDF asymptotic versus solution terms

The test reveals whether the solution terms converge into the asymptotic ASR curve, as in the case of example 1 (right plot of Fig. 8), or do not converge, as in the case of example 2 (left plot of Fig. 8).

As displayed in Table 1, the two-sample test recognizes ASR in the randomly generated dataset (i.e., example 1). Indeed, none of the p-value scores in this hypothetical example is significant (p > 0.05). In contrast, while PRI and consistency values show the absence of SSR, they are not indicative of the presence of an ASR, despite the random nature of the dataset and the uniform distribution of the data.

Table 1 Consistency, PRI, DTS-test: examples 1 and 2

The proposed two-sample test can also recognize solution terms whose eCDF is distributed non-uniformly and that are therefore not in ASR, as in the case of example 2. Here, the two-sample test shows that the solution does not converge into a uniform distribution (p-value < 0.05) and therefore does not present a simultaneous subset relationship. It is also important to highlight that, in example 2, PRI also reveals non-ASR, with PRI for the outcome Y = 1 and PRI for the non-outcome ~ Y = 0.

In light of these results, we can observe the complementarity of PRI, consistency and the proposed DTS test. While PRI and consistency allow the identification of standard SSR, the proposed DTS test provides additional strength to the results by identifying possible ASR and therefore spurious or incomplete elements of a solution formula.

7 Conclusion

This paper introduces a new test for detecting ambivalent subset relationships based on the DTS statistic. The test identifies solutions that are in ASR and non-ASR by detecting possible convergence between the eCDF of empirically observed solution terms and the eCDF of simulated solution terms.

This approach complies with QCA's ontological commitment to being a causes-of-effects method by prompting the researcher to scrutinize the solution terms automatically produced by QCA's top-down or bottom-up algorithms.

The DTS test is complementary to, and not a substitute for, PRI and consistency values. Indeed, while PRI and consistency values can indicate possible SSR, the DTS test allows one to further explore the results in light of possible ASR.