
This chapter provides an overview of the laboratory experiments in this study and outlines the numerous methodological considerations for the application of fsQCA, a modification of the QCA method. A description of the in-basket simulations and decision aids used in the laboratory experiments is provided, followed by a step-by-step description of the research procedure.

4.1 Design of the Laboratory Experiments

The study was originally designed as a laboratory experiment involving 96 participants in 12 sessions (see numbers 1–12 in the columns labelled “Cell #” in Table 4.1). This would have resulted in a total of 384 decisions, since each respondent would have completed decisions for each of the four in-basket simulations.

Table 4.1 Initial research design: 12 configurations of conditions (384 units/96 participants)

A total of 153 MBA students responded to the invitation and attended the decision-making laboratories, but three completed in-basket simulation cases were rejected due to incomplete responses, resulting in 150 cases in the study (see Table 4.2). The experiments consisted of either groups with four members per group making interactive decisions or groups comprising four individuals making individual decisions. The study was executed 10 times to allow the opportunity to test and retest, replicate and adjust. The number of participants in each group and the number of individual participants are shown in Table 4.2 below.

Table 4.2 Research design: configurations of conditions & number of units

The study utilised a series of four in-basket simulations and role-plays simulating decision-making scenarios. Three decision categories (Human Resource Management, Marketing, and General Management) were tested in four in-basket simulations, combining simulated interactions (SIs) as well as independent thought. Each participant received four in-basket problems to investigate, analyse and resolve.

Participants were asked for decisions on business issues such as the selection of marketing media exposure, pricing, key account management, key talent development, and event venue selection. The problems ranged from low cognitive difficulty to high cognitive difficulty as per Bloom’s (1956) taxonomy of learning objectives. All four simulations were pre-tested with two groups, involving six senior faculty in the marketing and management disciplines and three to four senior business executives in private enterprise. A post-test only design was used to confirm or contradict the asymmetrical relationships between the antecedents of competencies and incompetencies in executive decision-making.

As Table 4.1 displays, the original plan for the study was to test the impact of four conditions, resulting in 2^k = 2^4 = 16 (k = number of conditions) configurations. Only 12 configurations could logically be considered, since treatments of individual participants would not practically allow for the inclusion of a devil’s advocate (DA) role-player in the decision-making process. Each of the 12 configurations of conditions investigates the impact on a minimum of 8 participating MBA students or units. The Boolean algorithms and numbers are displayed in Table 4.2 below.

The implemented laboratory experiment involved 150 MBA alumni and current MBA students at four universities in New Zealand. Each participant completed the two-hour simulation in the laboratory. Each of the participants received an information sheet and was briefed about the procedures and prepared for the group, or individual, decision process. Every participant completed a post-test questionnaire to collect demographic and attitudinal data and was debriefed after completion. In alignment with ethics requirements, all participants were given the opportunity to opt out and to attend a further debriefing meeting after all experiments were completed. Not a single participant took up the invitation to attend the second debriefing meeting, but all participants indicated the wish to receive the research results. The briefing sheet, information sheet and debriefing sheet can be found in Appendix B.

4.1.1 Administration of the Experimental Treatments

Potential participants were invited to participate in different laboratories at the different campuses at different points in time, exposing between 8 and 64 participants to the treatments at any one point in time. The laboratories were held at 10 different times between January 2012 and April 2012, starting at Auckland University of Technology in Auckland, and ending with Victoria University MBA students in Wellington, New Zealand. The researcher took meticulous care to ensure that the instructor, support material, instruments and physical context remained almost exactly the same during the course of the experiments. Conditions were meticulously recorded before, during and after each of the experimental laboratories.

As participants arrived for the experiment, they received a set of materials (in-basket simulations and decision aids and support materials, collated into pre-packaged sets) encoded by treatment code (see Table 4.2 above). Note that the cells in this study alternate between individual (~group/I) and group treatments, where a cell is a group of people who received exactly the same set of materials, with the same configuration of treatment conditions, and is represented by a treatment code, e.g. INF1. (The tilde ~ sign indicates “not” in Boolean algebra and is explained in more depth in paragraph 4.2.6 on page 131.) In not-group (~group) cells, participants worked on their own, without assistance from or interaction with other participants. All participants received printed (competency or incompetency) training matter and four in-basket simulations (and additional support material) for consideration (see Appendix C for examples of the decision aids and written training materials). All decision sheets and demographic sheets were coded with the treatment code, but participants were not made aware of the meaning or position of these codes (this code/terminology is not used in any of the instructions for the participants).

Every participant received a set of the same four in-basket simulations with the same four business scenarios and problems to solve. The problems under consideration ranged from low cognitive difficulty to high cognitive difficulty (Bloom, 1956). Decision-makers were provided with printed (competency or incompetency) training matter as decision aids for the four in-basket simulations (and additional support material) for consideration. During each 2-h laboratory experiment, four training methods were probed: goal-based scenario (GBS) including simulated interaction (SI); group interactive decision-making (G) (Schank, Berhman, & Macpherson, 1999); devil’s advocate (DA) black-hat thinking (De Bono, 1976); and decision-matrix training through the Boston Consulting Group (BCG) matrix and knowledge-based teaching aids. Each in-basket had one main case-based decision to be made. Participants received a finite range of possible answers from which they could select their preferred choice—the one they would recommend to their prospective clients.

In the groups (indicated with the code “G” in Tables 4.1 and 4.2 above), problem solving was done via group interaction (instructions provided in Appendix B). Where SI was part of the treatment, four role-players were identified and participants’ roles were pre-allocated (for detailed descriptions of the roles, see Appendix C for instructions and descriptions of the roles of Vice President (VP): Marketing, VP: Sales & Advertising, VP: Operations and VP: Talent & Development). The pre-allocated roles were initially hidden from all prospective participants when they entered the laboratory and only became known once they opened the packs and found the props (i.e. a sash and a button indicating their role). For those groups where DA dissent was indicated (coded “D” in Table 4.2), all participants were provided with an instruction sheet (see Appendix B) and one member of the group received a black hat, a coat button, and a red sash to wear as visible reminders of his/her role to provide caution and highlight potential issues and difficulties with group suggestions. Participants exposed to the GBS treatment condition (indicated with B; those participants who were not exposed to GBS are coded with N) received instructions (see Appendix B) based on the work of Schank, Fano, Jona, and Bell (1993). The research propositions were tested in SI, a form of role-playing (Armstrong, 2006), for all groups where GBS (coded B in Tables 4.1 and 4.2) was indicated. Green (2002, 2005) reports 57 % fewer forecast errors relative to expert judgement forecasts when participants use SI. According to Armstrong, “simulated interaction is particularly useful in conflicts such as … buyer/seller negotiations, union management relations [and] legal cases” (Armstrong, 2006, p. 9). Since the focus of this study is the development of soft skill competencies such as reasoning and other sense-making heuristics, this forecasting method will be a useful teaching method and decision aid from which it is reasonable to expect a high level of accuracy (see the detailed discussion of internal and external validity in Sect. 4.3 below).

Since configurations of the conditions are investigated, not all participants were exposed to the same four training methods. Some learners/participants were only exposed to KBT materials. The KBT competency and incompetency training aids deserve special attention and are discussed in Sect. 4.1.2 below.

Where simulated interaction was part of the treatment, four role-players (Vice President (VP): Marketing & Sales, VP: Accounting, VP: Talent & Development, and VP: Operations) were identified and participants were pre-allocated (at random) to the roles. In some cases the role of Operations Manager was replaced by an alternative fourth role, i.e. the DA. Clear briefs were provided to prepare participants for these roles (see Appendix B). Problem solving was done in isolation for cells with individual participants (~group). In this case, individual participants were instructed to “wear different hats” when considering alternative decisions. Physical props (such as hats and buttons) were provided to identify the role-players. Groups resolved problems employing SI or role-play, but where no GBS or SI was in the configuration of conditions, groups were left to their own devices and natural instincts for interaction. All groups received brief instructions to facilitate group interaction, whether they were exposed to GBS and SI or not (see Appendix B).

4.1.2 Competency and Incompetency Teaching Aids

Decision-makers are often unconsciously incompetent and use ineffective heuristics. Teaching tools or developmental aids that help decision-makers overcome conscious and unconscious debilitating habits are of keen interest in the present study. In a bid to overcome decision-makers’ unconscious incompetence; unconscious childhood biases; implicit cultural training; “leaps to conclusion”; and other competency-reducing or debilitating factors, KBT competency training aids were provided to some participants. The laboratory experiments included competency aids which highlighted the context and relevant information and advised against groupthink, consensus and unnecessary complexity (i.e., suggested “dropping tools”) but did not provide additional facts or improved information to support the decision-makers’ decision processes or procedures (the competency and incompetency teaching aids can be found in Appendix C, and differ substantially for each of the in-basket simulations). Some participants (unbeknownst to them) received deliberate incompetency training and decision aids, to act as a placebo. Incompetency aids covered content traditionally taught in business school courses such as the BCG matrix, priority weighting matrices, market share, and customer and profit orientation. For further discussion of the use of incompetency training in organisations and in formal instruction see Woodside (2012).

4.2 Application of Qualitative Comparative Analysis (QCA) as Method

QCA identifies and studies a specific outcome, along with the combinations of causally relevant antecedents affecting that outcome (Ragin, 2008c; Rihoux, 2006; Woodside & Zhang, 2012). Defining the outcome(s) of interest to a study is the most important aspect, more important than either selecting cases or configuring the conditions (variables) that distinguish one case from another (Jordan, Gross, Javernick-Will, & Garvin, 2011). The application of QCA as a research methodology involves numerous procedures which are addressed in this section. Figure 4.1 outlines the terms and abbreviations used in the following discussion.

Fig. 4.1
figure 1

QCA nomenclature (Adapted from Gross, 2010)

For a detailed guide to the 15 dialogues and concrete steps the researcher follows in the QCA approach, see Rihoux and Lobe (2008, pp. 221–242), as Fig. 4.2 illustrates.

Fig. 4.2
figure 2

QCA and the concrete steps and feedback loops (Adapted from Rihoux & Lobe, 2008, p. 238)

4.2.1 Definition of the Outcome of Interest

The first step, culminating from the literature review during which likely variables are identified, is the definition of the outcome. This critical first step assists in the identification of cases with sufficient representation of each of the sought outcome(s). The characterisation of outcome is specifically limited to decision- and sense-making competencies and decision confidence. Decision competence for the four in-basket simulations was theoretically grounded, as set out in Chapter 2. In addition, the validity of this selection was reviewed by senior management executives and senior scholars with extensive experience and theoretical knowledge in the disciplines of general, human resource (HR), key account, and events management. They concurred that the simulations had verisimilitude and that the outcomes accurately reflected decision competency, noting that decision competency is complex and challenging and is likely to differ substantially by age, education level, managerial experience level, and decision strategy and/or the exposure to a range of andragogical treatments. Since participants were all MBA students or recent MBA alumni (who had graduated less than 3 years prior to the study), careful deliberation by the experts and deliberate analysis of participants’ age, education level and experience resulted in unanimous agreement that the conditions of age, education level and managerial experience could be combined into a single condition. Further, scholars involved in the pre-tests questioned the ability of any instrument to be sensitive enough to “detect the impact of a single learning experience such as a simulation” on a student’s ability, given a lifetime spent as learner (Anderson & Lawton, 2009, p. 206). Since QCA does not study net effects but the impact of several causal conditions on a well-defined outcome within a specific context, this concern is realistic but not relevant to the nature and intent of this particular study.

4.2.2 Selecting Cases

The definition of outcome(s) is followed by an iterative process of selecting cases and conditions to ensure that the selected set of cases exposed to the configuration of causal conditions exhibit the range of outcomes (Fig. 4.3).

Fig. 4.3
figure 3

Research design and process (Adapted from Gross, 2010; Jordan et al., 2011)

4.2.2.1 A. Type of Cases

A case is effectively the unit of analysis of this research and according to Kent (2009, p. 194), “each case [can be seen] as a particular combination of characteristics—as a configuration. Only cases with identical configurations can be seen as the ‘same type of case’”. For the purposes of this research, the proposed case unit of analysis is an MBA student with a specific level of managerial experience who is exposed (in controlled laboratory studies) to a specific combination of andragogical conditions. Each case is selected to represent a variety of ages, genders, educational levels and experience levels. In addition each case had, due to their participation in the laboratory, been recently exposed to a finite selection of decision support aids, including theoretical frameworks and extracts from peer-reviewed journal articles. The “Truth Table” (see Table 4.14) shows the number of cases (frequency) that possess each logically possible combination of “causal” conditions likely to affect the outcome of interest, in this case the participants’ decision competency or incompetency.

According to Byrne and Ragin (2009), it is desirable in selecting cases for inclusion to achieve sufficient variety in both conditions and outcome in order to ensure robust analysis. Although this may appear to be improper manipulation of the data set, the resulting heterogeneity of condition and outcome is appropriate for QCA methods, since the method’s logic is not probabilistic. QCA considers causality—it does not consider whether more or fewer cases exhibit certain characteristics—which “contributes to the richest possible explanation of relationships among the widest array of data” (Gross, 2010, p. 40). The real interest of this study is in the existence of a specific combination of implicants and the resulting outcomes within the context, hence the pursuit of maximum heterogeneity in the types of cases selected, where implicants are those conditions which remain after all superfluous conditions are removed and only the most parsimonious solution, which leads to the outcome, remains (Rihoux & Lobe, 2008).

4.2.2.2 B. Number of Cases

Different variants of QCA are more suited to certain data set sizes (see Fig. 3.1). QCA literature avoids rigid data set size requirements, since data set size is closely linked to the studied outcome and the number of conditions considered likely to affect the outcome (see the next section). A further important consideration when determining data set size is the researcher’s ability to gain sufficiently rich and empirically intimate knowledge about each individual case (Berg-Schlosser & De Meur, 2009). In a workshop on practical considerations for QCA, Fiss (2011) offers valuable advice regarding the ratio of cases to variables to ensure that “real data” can be distinguished from “random data” and warns against situations where the ratio of cases drops below tested thresholds. Fiss’s (2009) suggested ratios are shown in Table 4.3.

Table 4.3 Ratio of causal conditions to cases

Set size and resulting data space grow exponentially with each additional independent condition and thus the number of possible combinations of conditions quickly exceeds the number of empirically observed combinations (Ragin, 1987; Rihoux, 2006). In addition, authors point out that cases that display all logically possible combinations “might be unlikely to occur in practice or be hard to detect or measure” because “size decreases the chance that every logically possible combination will have an empirical referent” (Fielding & Warnes, 2009, p. 281). Berg-Schlosser and De Meur (2009, p. 27) point out that the QCA algorithm can produce robust results “even with large amounts of empty data space”; thus non-observed cases, called “logical remainders”, are not objectionable and have been justified (Ragin & Rihoux, 2004; Rihoux, 2006). Authors suggest small-N data sets comprise between 1 and 4 cases, intermediate-N sets lie in the range of 5–10 or 6–100, and large-N sets exceed 100 cases (Ragin & Rihoux, 2004; Rihoux & De Meur, 2009). When applying csQCA—where variables can only assume binary values (0 or 1)—a total of 2^n (n = number of conditions) logically possible configurations must be considered in the analysis. For the mvQCA method, the number of possible configurations is calculated by multiplying together the number of possible values of each condition (Ragin & Rihoux, 2004). This study involves ten conditions: one 7-value condition (age_c), one 6-value condition (educ_c), three 4-value conditions (man_exp, conf_c and chng_c) and five 2-value conditions (gender, group, devil, gbs and comp), resulting in 7 × 6 × 4 × 4 × 4 × 2 × 2 × 2 × 2 × 2 = 86,016 possible configurations of conditions. For this study, five of the ten conditions have binary values, thus assisting in keeping the data space manageable and the number of cases well within the range for either mvQCA or fsQCA, and within the case size suggested by Fiss. (2-Value conditions are also called crisp sets; for a more detailed explanation of 4-value and 6-value conditions see Table 4.4. For the calibrated values of conditions in this study, see Tables 4.5, 4.6 and 4.7.)
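
As a minimal illustration of the configuration count described above (a sketch of the arithmetic only, not of any fsQCA software routine), the mvQCA property space is simply the product of the number of possible values of each condition:

```python
from math import prod

# Number of possible values for each of the ten conditions listed above:
# one 7-value, one 6-value, three 4-value and five 2-value (crisp) conditions.
value_counts = [7, 6, 4, 4, 4, 2, 2, 2, 2, 2]

# mvQCA property space: multiply the number of possible values across all conditions.
n_configurations = prod(value_counts)
print(n_configurations)  # 86016
```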

Table 4.4 Crisp set and fuzzy set variables
Table 4.5 Crisp set scoring (values) for dichotomous conditions
Table 4.6 Statistics: Calibration of fuzzy sets for antecedents (demographics and experimental treatments)
Table 4.7 Fuzzy set scoring (values) for the measured antecedents: age, education and experience

4.2.3 Selecting Causal Conditions

“The key philosophy of QCA as a technique is to [start] by assuming causal complexity and then [mount] an assault on that complexity” (Ragin, 1987, p. x). As a third step in the research design process, the researcher populates the raw data table, in which each case displays a combination of conditions and an outcome or outcomes.

4.2.3.1 A. Identifying Conditions

“Conditions are the variables that distinguish one case from another … and may influence the outcome under analysis” (Jordan et al., 2011, p. 1162). The selection process is an important part of the QCA methodology; it is generally grounded in theory and is likely to be an iterative process. To select initial causal conditions for consideration and analysis, Amenta and Poulsen (1994) and Yamasaki and Rihoux (2009) recommend five alternative strategies. (1) The comprehensive approach where the full array of possible factors is considered in an iterative process. (2) The perspective approach, where a set of conditions representing two theories are tested in the same model. (3) The significance approach, where the conditions are selected on the basis of statistical significance criteria. (4) The second look approach, where the researcher adds one or several conditions that are considered as important although dismissed or overlooked in a previous analysis. (5) The conjunctural approach, where conditions are selected based on joint interactions among theories which predict multiple causal combinations for a certain outcome. This study applied the second strategy, where theories are tested in the same experimental model. The preliminary list of conditions posited at the outset of the study was:

  • Age (age)

  • Gender (gender)

  • Education level (educ)

  • Experience in management (man_exp)

  • Confidence (conf)

  • Group interaction (group)

  • Simulated interaction in goal-based scenarios (gbs or GBS)

  • Inclusion or absence of the devil’s advocate (devil)

  • Competency training materials (comp)

  • Incompetency decision aids (incmp or ~ comp).

All conditions have been previously identified by scholars and tested with practitioners as significant influences on competency or incompetency. The first three conditions (age, educ, man_exp) as well as the inclusion of a DA role-player (devil) merit further explanation (see Sect. 4.2.3.3).

4.2.3.2 B. Number of Conditions

Researchers advise against too large a number of conditions, as it adds complexity to the logic space, thus making it difficult to interpret the results. Berg-Schlosser and De Meur (2009, p. 28) recommend keeping the ratio between number of conditions and number of cases balanced and offer the following guidance: “The ideal balance is not a purely numerical one and will most of the time be found by trial and error. A common practice in an intermediate-N analysis (say 10–40 cases) would be to select from 4 to 6–7 conditions”. Given the moderate to large number of cases (N = 150 for this study), having ten conditions specific to each in-basket simulation (group, gbs, devil, comp, age, gender, man-exp, baski, confi and chngi) is considered acceptable.

Berg-Schlosser and De Meur (2009) suggest various procedures such as discriminant analysis to identify strong bivariate relationships, and factor analysis to create composite conditions where multiple conditions contribute to the same dimension. This study implements QCA procedures and Boolean algebra to determine the least number of factors that account for the common variance of the three variables of age, education level and level of managerial experience. These composite calibrated factors are indicated with the labels age_c, educ_c and man_exp_c. The set theoretic methods on which the fsQCA procedures are based enable researchers to investigate configurations of causal conditions with causal paths represented in Boolean algebraic form, thus enabling redundant variables to be identified and deleted (Ragin, 1987, 1994, 2000), resulting in parsimonious equations.
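
To illustrate the Boolean minimisation logic referred to above, the standard QCA reduction rule (shown here with two of this study’s condition labels purely as a hypothetical example) collapses two configurations that differ in only one condition but lead to the same outcome, deleting the redundant condition:

$$ group\bullet gbs\bullet comp+group\bullet gbs\bullet {\sim} comp=group\bullet gbs\bullet \left( comp+{\sim} comp\right)=group\bullet gbs. $$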

4.2.3.3 C. Alternative Conditions for Future Consideration

As with the list of outcomes, the QCA conditions were reviewed by experienced educationalists and management practitioners. The experts suggested additional or alternative conditions to expand the study: unconscious deliberation (and/or delayed decision-making); providing learners with checklists composed by experts; and the impact of a decision-coach providing situational feedback and additional training as a complement to the heuristics (e.g. take-the-best [Gigerenzer & Goldstein, 1996]). Such investigations would require additional data fields and more detailed case data to accommodate all possible configurations of conditions, and should be repeated with pre- and post-test results (temporal data sets are required to detect the influence of time lapsed on the deliberation and decision-making outcomes); they are clearly reasonable and worthwhile directions for future research but were beyond the scope of this investigation. Also, since this study is interested in a selection of causal paths, and QCA investigates causal conditions on a pre-defined outcome—in contrast to net effect investigation by statistical methods—investigation of the suggested causal conditions can be taken up by further studies at a later stage.

An obvious variable for consideration in management decision competency is ethnicity. Although ethnicity data has been gathered for each case, this study is purely interested in the efficacy of particular andragogical methods on decision competency or incompetency for MBA students in general. The possible effect of cultural conditions on decision (competency or incompetency) outcomes as well as their impact on decision confidence could be analysed in future research projects.

4.2.4 Scoring Cases: Conditions and Outcomes

Once the outcomes, conditions and cases are determined, the researcher collects raw data and assigns values for each QCA variable (see Appendix D for the raw data). The allocated scores designate the degree of membership to a predetermined set, in contrast to a variable approach which attempts to place each case on a continuum of relative values. A score of 1 indicates full membership of the set, and a score of 0 indicates non-membership or exclusion from the set. If only 0 and 1 are indicated (as in the presence or absence of a treatment condition such as the DA), this set of values is called a crisp set. FsQCA and mvQCA permit both binary values (0,1) and multiple threshold values (see Table 4.4). The researcher must be able to clearly and transparently justify all threshold values on theoretical or empirical grounds to ensure reliability of the study and its results (Rihoux & De Meur, 2009).

Table 4.4 captures two aspects of diversity: difference in condition and difference in degree to which the condition is present or not present, and illustrates the general idea behind fuzzy sets. In the three-value fuzzy set an extra value is added to the crisp set, namely 0.5. This value indicates membership of cases that are neither fully in nor fully out of the set in question (e.g. payment of an invoice may be neither quick—less than 30 days, nor long—more than 60 days, so in this example 45 days may be given the mid-level value of 0.5). The table sets out different levels (four-, six-, and continuous) of fuzzy sets, each respectively more finely tuned to the level of membership than the one before. All fuzzy sets of three values or more utilise levels above and below the “crossover point” of 0.5 and the two qualitative states of “fully in” and “fully out”. The researcher calibrates data using substantive knowledge of each case, as well as theoretical knowledge (Ragin, 2009) to determine the number of values in the fuzzy set. The researcher purposefully calibrates each condition to indicate “the degree of membership to a well-defined and specified set” (Ragin, 2008a, b, c, d, p. 30).

For this study some conditions are clearly dichotomised, such as group (participants were either in a group or not), gbs, devil and comp (incmp = ~comp). Participants either received the competency decision support aids or received the incompetency training aids; no participant received neither, and thus a simple crisp set membership of 1 = full inclusion and 0 = full exclusion (Ragin, 2007a) suffices. Crisp scores for the four treatment antecedents (used interchangeably with the term conditions) are set out in Table 4.5. Note that according to fsQCA methods the absence of a condition is labelled with a tilde (~) and its value is 1 − (value of the present condition). Thus ~group = 1 − group. So if the score for a particular case is (say) 0.99 for its group condition, then the ~group value for that case is 1 − 0.99 = 0.01. Note that for this study ~comp = incmp; cases that did not receive competency training decision aids in all cases received incompetency decision aids. Thus 1 − comp = incmp = ~comp. For the condition gender, males received the crisp score of 1, whilst female participants scored 0 (~male = 1 − male = 1 − 1 = 0 = female).
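
The crisp scoring and negation rules above can be expressed in a few lines of code; the following is a minimal sketch (the condition labels mirror those used in this study, but the example case and its values are invented for illustration):

```python
# Hypothetical case with crisp treatment and gender conditions
# (1 = full membership of the set, 0 = full exclusion).
case = {"group": 1, "gbs": 0, "devil": 1, "comp": 0, "male": 1}

def negate(score):
    """Negation of set membership: ~A = 1 - A (works for crisp and fuzzy scores)."""
    return 1 - score

not_group = negate(case["group"])  # ~group = 1 - 1 = 0 (worked in a group)
incmp = negate(case["comp"])       # incmp = ~comp = 1 - 0 = 1 (received incompetency aids)
female = negate(case["male"])      # female = ~male = 1 - 1 = 0 (participant is male)
print(not_group, incmp, female)    # 0 1 0
```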

In contrast to the crisp sets above, the antecedent conditions age (age), education (educ) and managerial experience (man-exp) can be characterised in terms of differences in degree. It is important to note that calibration of fuzzy sets is not merely a matter of the position of each case relative to another; it is a calibration relative to a standard. The standard is either a generally agreed upon or conventional standard (e.g. poverty standards set by the United Nations), or a standard based on “accumulated substantive knowledge … that resonates appropriately with existing theory” and is thus set by the researcher (Ragin, 2007a, p. 7). According to Ragin (2007a, p. 17), these groupings can be “preliminary and open to revision” based on increased understanding and dialogue between the cases and the findings. In this case the target set is defined as students with a postgraduate qualification (note that some participants were still in the process of acquiring an MBA degree) with more than 5 years’ management experience.

Each of the variables in the raw data is calibrated using the fsQCA programme and the sub-routine of the “indirect method of variable calibration” (Ragin, 2008a, b, c, d, p. 84). The researcher specifies three values for calibrating the scale: the raw values at which memberships of 95 %, 50 % and 5 % are anchored. Table 4.6 provides the original statistics and the calibrated values of the treatment and measured antecedents of this study. Table 4.7 provides an overview of the calibrated values as performed by the fsQCA software. Full details for each case can be found in the Truth Table in Appendix D.
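
The following is a minimal sketch of a three-anchor calibration of the kind the fsQCA software performs, using the log-odds transformation commonly described for Ragin’s calibration procedure; it is intended only to illustrate the logic (the anchor values shown are hypothetical, not the study’s actual thresholds), not to reproduce the software’s exact routine:

```python
import math

def calibrate(x, full_non, crossover, full_member):
    """Map a raw value to a fuzzy membership score in [0, 1] using three anchors:
    full_non    -- raw value anchored near full non-membership (0.05)
    crossover   -- raw value anchored at maximum ambiguity (0.50)
    full_member -- raw value anchored near full membership (0.95)
    """
    # Scale deviations from the crossover so the outer anchors map to log-odds of -3 and +3.
    if x >= crossover:
        log_odds = 3.0 * (x - crossover) / (full_member - crossover)
    else:
        log_odds = 3.0 * (x - crossover) / (crossover - full_non)
    return math.exp(log_odds) / (1 + math.exp(log_odds))

# Hypothetical anchors for years of managerial experience.
print(round(calibrate(12, full_non=1, crossover=5, full_member=15), 2))  # 0.89
```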

4.2.5 Calibrating the Outcome: Decision Competency or Incompetency

The central focus of this study is that decision-making competencies improve substantially when participants receive support by using SI to extract directive feedback from peers in groups; overcome deference when prompted to dissent by peer-enacted role-playing (e.g. DA); and place themselves mentally within the context either in action learning-by-doing through experiential learning, through role-play, or by envisaging the context of the enactment of the decision. The study investigates previous research findings (e.g. Armstrong & Brodie, 1994; Spanier, 2011) suggesting that incompetency training is effective in increasing incompetency in executive decision-making and outcomes, and attempts to confirm and extend these prior findings through the analysis of empirical data.

The definition and understanding of decision competency or incompetency (broadly termed decision success and coded as success_c in the data and truth tables) has been vastly aided by scholars such as Gigerenzer, Boyatzis and Mintzberg. The standard educational measure of success and commonly acceptable level of pedagogical success is a pass mark—a student needs to achieve above 50 % in a test or examination to be seen as “having successfully completed the assessment event.” Unfortunately real-life business decisions are not so easily assessed as “right” or “wrong.” Therefore, decision competency/incompetency as an outcome for cases in this study is remarkably fuzzy and not merely dichotomous as in “yes, successful” or “no, not successful.” Tables 4.8 and 4.9 illustrate the fuzzy set scores for two different calibrations of overall decision success. Reflecting the traditional view of educators that a pass mark is at least 50 % of the total marks possible, this study ascribes success according to the degree to which participants have supplied “best/correct” answers for each of the questions in the four in-basket simulations, as identified by the experts.

Table 4.8 Fuzzy scoring for outcome condition: Overall decision competence (success-tot)
Table 4.9 Fuzzy scoring for outcome condition: overall decision competence (bool_success)

In the in-basket simulations, therefore, participants had to have selected the best/correct answer for at least two of the four simulations. The first outcome (success_tot) is aggregated over all four simulations using the median, and the scale is calibrated using the QCA sub-routine to calibrate fuzzy scores. Overall decision success is calculated in the second outcome (bool_success) by applying Boolean algebra, which delivers the minimum value over the four decision outcomes, i.e. minimum(Xi), where Xi is the crisp score for in-basket i and i indexes each of the four in-basket answers.
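
A minimal sketch of the two aggregation rules described above, using hypothetical crisp scores for one participant’s four in-basket decisions:

```python
from statistics import median

# Hypothetical crisp scores (1 = best/correct answer selected) for the four in-baskets.
in_basket_scores = [1, 0, 1, 1]

# success_tot: aggregated over all four simulations using the median
# (the result is then calibrated to a fuzzy score with the fsQCA calibration sub-routine).
success_tot_raw = median(in_basket_scores)  # 1.0

# bool_success: Boolean (logical AND) aggregation, i.e. the minimum over the four outcomes.
bool_success = min(in_basket_scores)        # 0 -- a single incorrect answer pulls it down
print(success_tot_raw, bool_success)
```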

Two additional implicants are considered for decision success/failure, namely (1) the participants’ confidence in their decisions and (2) their likelihood to change their decision “should you be asked to review them in two weeks’ time”. Participants are asked to indicate their confidence in the recorded decision on a Likert scale of 1–4, with 1 = “not very confident” and 4 = “very confident”, and the likelihood of changing their mind on another Likert scale of 1–4, with 1 = “very likely to change” and 4 = “I will not change my decision at all. I will stick to my current decision”. These confidence (confi) and likelihood to change (chngi) outcomes were recorded separately by each participant for each of the in-basket simulations (Tables 4.10, 4.11, 4.12, 4.13).

Table 4.10 Fuzzy scoring for outcome antecedents for in-basket simulation 1
Table 4.11 Fuzzy scoring for outcome antecedents for in-basket simulation 2
Table 4.12 Fuzzy scoring for outcome antecedents for in-basket simulation 3
Table 4.13 Fuzzy scoring for outcome antecedents for in-basket simulation 4

4.2.6 Constructing the Truth Table

The next step after calibrating the conditions and outcome(s) is to construct a “truth table” (Ragin, 2007b). In a truth table (see Table 4.14) variables are no longer isolated or distinct aspects of cases, but are treated as components of configurations that still allow for the retention of the uniqueness of each case.

Table 4.14 Extract of calibrated data in the Truth Table

Each row in the truth table represents a unique configuration of conditions with a single threshold value for each condition and each outcome for that case. The truth table lists all logically possible combinations of conditions and the outcomes displayed by each case (in this case each participating MBA student). It sorts the cases by the combinations of causal conditions they exhibit, using reasonable subsets of these conditions, starting from “recipes that seem especially promising” (Ragin, 2008a). As described earlier, all possible logical combinations of causal conditions are considered, even when no empirical instances are present in the study (Ragin, 2008a). The number of configurations is 2^k, where k is the number of causal conditions; k = 10 for this study, resulting in 1024 configurations. Configurations for which no observed empirical case is present are termed logical remainders. There are three basic operations the software performs: negation, logical AND and logical OR (Ragin, 2009, p. 94).
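
To make the idea of the property space and logical remainders concrete, the sketch below enumerates all 2^k configurations for a reduced set of three crisp conditions (the condition names are this study’s labels, but the observed configurations are invented); with the study’s ten conditions the same enumeration yields the 1024 rows mentioned above:

```python
from itertools import product

conditions = ["group", "devil", "comp"]        # reduced example; the study uses k = 10
all_configs = set(product([0, 1], repeat=len(conditions)))  # 2**3 = 8 possible rows

# Hypothetical configurations actually exhibited by observed cases.
observed = {(1, 1, 1), (1, 0, 1), (0, 0, 0)}

# Truth-table rows with no empirical instance are the "logical remainders".
logical_remainders = all_configs - observed
print(len(all_configs), len(logical_remainders))  # 8 5
```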

Negation

The tilde sign (~) indicates negation and is calculated as follows:

$$ \left(\mathrm{membership\ in\ set\ } not\text{-}A\right)=1-\left(\mathrm{membership\ in\ set\ } A\right),\quad \mathrm{also:}\ {\sim}\mathbf{A}=\mathbf{1}-\mathbf{A}. $$

In this study, for example, negating the set of participants with high age transforms the set to not-high age (i.e. younger participants). For crisp set membership the scores thus change from 1 to 0 and from 0 to 1. For fuzzy set membership, full membership of 0.99 will be negated to 0.01. The only score that does not change is that of maximum ambiguity, 0.5. The tilde (~) indicates either the absence of the treatment (for group, comp, devil and gbs) or, for measured antecedents (age, experience and education), lower levels.

Logical AND

The intersection of two or more sets is calculated by logical AND (Ragin, 2009, p. 96). The QCA software determines the minimum membership score for each case in the sets that are combined. Logical AND statements of all possible combinations determine a new fuzzy score by finding the lowest value of the antecedents in the model (statement) when a statement combines two or more antecedent conditions. For case 14 in Table 4.14, for example, the score for group AND devil AND comp AND gbs is equal to min{0.99; 0.01; 0.01; 0.33} = 0.01. In Boolean algebra, the mid-level dot (•) indicates logical AND. The model group • gbs • comp • devil → success would thus indicate the presence of all four treatment conditions. It would read: the treatment conditions group AND gbs AND competency training AND devil’s advocate dissent lead to success.

Logical OR

The union of two or more sets is calculated by logical OR, which is determined by calculating the maximum score in each of the component sets and reflects the degree of membership of each case in the union of sets. For case 14 in Table 4.14, for example, the score for group OR devil OR chg1_c is 0.99.
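
The three operations can be summarised in a few lines of code; the sketch below reproduces the worked example for case 14 quoted above (the chg1_c score is not given in the text and is therefore a hypothetical placeholder):

```python
# Fuzzy-set scores for case 14 (group, devil, comp and gbs as quoted above;
# chg1_c is a hypothetical placeholder, not the actual value).
case14 = {"group": 0.99, "devil": 0.01, "comp": 0.01, "gbs": 0.33, "chg1_c": 0.50}

# Negation: ~group = 1 - group.
not_group = round(1 - case14["group"], 2)                              # 0.01

# Logical AND (intersection): minimum membership across the combined sets.
and_score = min(case14[c] for c in ("group", "devil", "comp", "gbs"))  # 0.01

# Logical OR (union): maximum membership across the combined sets.
or_score = max(case14[c] for c in ("group", "devil", "chg1_c"))        # 0.99

print(not_group, and_score, or_score)
```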

4.3 Validity of the Method, Procedures and Treatments

Scientific researchers demand rigor and verisimilitude in experimental methods (Campbell & Stanley, 1963; Salmon, 2003). Validity tests for research methods give an indication of how well the experiment and the instruments used in the experiment measure a given characteristic, given a certain set of circumstances and a certain set of research participants. From this definition, one measurement or “assessment technique may have many types of validity and a unique validity for each circumstance and group or items assessed” (Burns & Burns, 2008, p. 425). Cook and Campbell (1979) observe that two main types of validity are taken into account for research studies: internal and external. “Internal validity refers to the approximate validity with which I infer that a relationship between two variables is causal or that the absence of a relationship implies the absence of cause. External validity refers to the approximate validity with which I can infer that the presumed causal relationship can be generalized to and across alternate measures of the cause and effect and across different types of persons, settings and times” (p. 37).

The focus of this study is on experimental educational simulation in the form of GBS, role-plays or SI and in-basket simulations and a plethora of literature covers the validity of these techniques. A comprehensive list of 21 validation concepts can be found in the work of Feinstein and Cannon (2002), ranging from algorithmic validity to plausibility, representational validity, and verification. The authors conclude that the lexicon of simulation validation research “can be roughly understood in terms of two basic dimensions: game development versus application and internal versus external validity” (p. 430). They then define the following terms: “the developmental system represents issues regarding the actual development of a simulation game, drawing on principles of representational validity. The educational system represents issues involving the learning process, as the game is actually applied in a teaching environment, drawing on principles of educational validity. Internal validity, roughly speaking, addresses the extent to which a simulation functions in the intended manner. External validity asks whether the internal functioning corresponds to relevant phenomena outside the simulation” (p. 430).

4.3.1 Internal Validity

4.3.1.1 A. Conceptual Validity and Fidelity

According to Feinstein and Cannon (2002), representational validity relates to the level of realism presented to the learner, or fidelity. Hays and Singer (1989, p. 50) define fidelity as: “the degree of similarity between the training situation and the operational situation which is simulated. It is a two-dimensional measurement of this similarity in terms of (1) the physical characteristics, for example visual, spatial, kinaesthetic, etc.; and (2) the functional characteristics, for example the informational, stimulus, and response options of the training situation”.

There are opposing views in the literature on the need for a high level of fidelity. Some earlier studies found that higher levels of fidelity ensure effective training or enhanced learning (Feinstein & Cannon, 2002; Kibbee, 1961), whilst others found that higher levels of fidelity hinder learning in novice trainees due to overstimulation, and that lower levels of fidelity assist in focusing on the generalisable principles of the training (Alessi, 1988; Cannon, 1995).

Feinstein and Cannon (2002) argue for construct validity rather than fidelity, empirical validity or realism. They maintain that conceptual validity is essentially a level of theoretical accuracy between the simulation and the system it models, and is commensurate with a set of objectives: “Construct validity implies that the relationship between variables is correct, but they can be more subjective and modelled by any number of heuristic devices” (p. 433). This incorporates face validity, plausibility or verisimilitude—the degree to which the evaluator or user perceives the simulation to “ring true”.

“The second form of internal validity addresses the degree to which game participants understand the game and play it with insight … referred to as educational validity” (Feinstein & Cannon, 2002, p. 435). Parasuraman (1981) questions the extent to which student decisions are influenced in the intended manner by the simulation design. To be internally valid, the educational simulation needs to provide students with a simulation modelling the real business phenomenon in order to develop managerial insights and decision-making skills. According to Norris (1986, p. 447), the internal validity of simulation modelling represents “the educational value of simulations in teaching specific material to participants”. Many other researchers equate internal validity with the educational effectiveness of the simulation (Bredemeier & Greenblat, 1981; Norris, 1986; Pierfy, 1977; Wolfe, 1985). Cannon and Burns (1999) suggest using the three taxonomies of educational objectives, namely cognitive (thinking), affective (feeling) and psychomotor (acting) patterns, to evaluate the design and performance of the simulation for testing conceptual validity. The extent to which the three educational taxonomies can be observed determines the conceptual validity of the simulation. According to Feinstein and Cannon (2002, p. 435), “to achieve internal educational validity, game participants would have to discern the phenomena being modeled”.

But, as Feinstein and Cannon (2002) point out, internal validity does not necessarily equate to educational effectiveness. They provide an example where students are taught via a simulation with high verisimilitude. The game simulates a set of desirable responses, but the overall principle derived by the students is not educationally sound: “For instance, in the interest of teaching the effect of advertising in consumer markets, a game might emphasize the advertising function and end up teaching students that advertising is always the primary key to marketing success. The game would be internally valid but externally disastrous!” (Feinstein & Cannon, 2002, p. 426) (Fig. 4.4).

Fig. 4.4
figure 4

The faces of simulation game validation (Adapted from Anderson, Cannon, Malik, & Thavikulwat, 1988)

Several steps were taken during the development of the four experimental treatments to test and confirm that participants would perceive the in-basket simulations as (a) realistic and (b) likely to be encountered during real workplace experiences by real-world executives. A series of steps was followed: (1) an extensive literature review was conducted to find validated cases used in prior studies and the extant literature; (2) in-basket simulations were designed based on the researcher’s and supervisors’ personal experiences as practitioners in marketing and as managers; (3) experts reviewed the simulations and confirmed both their realism and the likelihood of encountering such decision scenarios in real-life situations; (4) the simulations were pre-tested with MBA students and experienced practitioners to ensure verisimilitude and that instructions were read and interpreted as intended; (5) all highlighted procedural issues were addressed; and (6) the training support materials were revised. Further details of these six steps are set out in paragraph B on page 138. Figure 4.5 shows a model for the research process of this study, adapted from the “Degrees of Freedom Analysis” (DFA) model described by Woodside (2011a, b, p. 245) for considering group decision-making in organisational behavior (OB).

Fig. 4.5
figure 5

Step-by-step research process for group decision-making in organisational behavior (OB) (Adapted from Woodside, 2010, p. 245)

4.3.1.2 B. Procedures to Ensure Realism, Fidelity and Construct Validity

Procedures to ensure validity for the laboratory experiments consisted of three distinct and consecutive phases: (a) development and design; (b) pre-testing and pilot; and (c) main field test.

Development and Design

An extensive literature review delivered useful guidance in terms of the design of games, simulations, and GBS creation. In addition, the researcher pursued “dialogic validity” (Anderson & Herr, 1999, p. 16; Newton & Burgess, 2008, p. 26) by supplementing theoretical guidelines with informal conversations and open-ended interviews with scholars and practising management development consultants. These practitioners are actively using role-plays, simulations and in-basket simulations as training and development tools in their own business practices as well as in their own action research within their training institutions. Dialogic validity was thus achieved.

Construct Validity

Construct validity refers to the vertical inter-relationship between an unobservable construct (conceptual combination of concepts) and an ostensible measure of it, which is at an operational level. Peter (1981, p. 133) refers to the development of constructs in marketing research and states: “Although marketing has little in the way of fully developed, formally stated scientific theories, such theories cannot develop unless there is a high degree of correspondence between abstract constructs and the procedures used to operationalize them. Because construct validity pertains to the degree of correspondence between constructs and their measures, construct validity is a necessary condition for theory development and testing.”

This study pursues construct validity by building pre-determined and pre-validated constructs from the seminal and conceptual work of scholars such as Simon, Armstrong, Gigerenzer, Schank and Schwenk to explain the behavior of students and practitioners involved in the managerial decision activities. It is common practice by marketing scholars to seek constructs and nomenclature from other disciplines and to borrow “constructs and theoretical propositions relating to them” (Peter, 1981, p. 133).

In addition, constructs have two recognised types of meaning. The first type, namely systemic meaning (Kaplan, 1967), refers to the fact that interpretation of the construct is determined by the theory in which it is grounded. Thus, to understand incompetency training as a construct, readers will have to understand training theory (andragogy and pedagogy), in which the concept is embedded. Construct validity and systemic meaning were tested with marketing scholars, management practitioners and educationalists, and validity was established to the satisfaction of the researcher and the main beneficiaries of the study. The second type of meaning, namely observational meaning, refers to the ability of the construct to be operationalised. Again this validity was tested with three members of each of the beneficiary groups, that is, MBA teachers, MBA graduates and MBA students. Once again, expert scholars and the researcher were satisfied that operational meaning was achieved to a very high degree.

Pre-test

To enhance ecological validity and verisimilitude, the in-basket simulations were pre-tested with MBA students and marketing and management practitioners currently employed in the roles and functions portrayed in the in-basket simulations. (Note: these MBA students did not participate in the laboratory experiments.) Two types of pre-tests were done: (1) a time-controlled pre-test with current MBA students and (2) an off-site, self-timed, uncontrolled, self-administered test completed by practitioners. After the time-controlled pre-test, participating MBA students completed the demographic section of the survey and the participants were debriefed. The debriefing focused particularly on: (1) the simplicity and comprehensibility of the instructions; (2) realistic time allowance (to complete the reading, study the decision aids, consider an opinion and complete the decision forms); (3) verisimilitude or realism of the simulations; (4) complexity and relative comprehensiveness of the provided information; (5) the presence of escalating decisions from lower order to higher order decision-making activities; (6) motivation and enthusiasm to complete all sections of the written questions; and (7) practicality of procedural issues.

To deliberately avoid favouring one of the contending alternative theories (Woodside, 2011a, b) contained in the multiple choice answers, all data collection forms, in particular the sections with alternative answers, were designed and tested with research experts. In line with the suggestion by Woodside (2011a, b, p. 247) “to achieve bias reduction of questioning”, independent experts checked the decision alternatives (multiple-choice answers) as well as the sequence of answers in the questionnaire. In addition, “to allow for objectivity and verifiability in the data collection and analysis, the actual survey forms used to collect data is available for independent examination” (Woodside, 2011a, b, p. 247).

The initial in-basket simulations were subjected to a series of pre-tests with practitioners and scholars in the field and revised. The pre-tests revealed that changes were required to word choice in order to clarify instructions. The question sequence was changed, and formatting issues such as the structure and layout of multiple choice answers and the 4-point Likert scales were resolved (Cox, 1980; Likert, 1932). A few minor changes were made to the actual simulation descriptions. The time allocated for self-study and case reading (both the competency and incompetency training materials), analysis, group discussions, and recording of decisions was tested and adapted. For example, the time allowed for self-study was lengthened from 15 min to 20 min; the time allowed to record decisions was reduced from 7 min to 5 min. Pre-tests established that individuals responding to the four in-basket simulations took less than an hour, and thus half the time required for configurations of conditions where group interactive decisions were required. It was determined that all participants in the pre-test interventions could quite comfortably complete the full experiment within the allotted time of 2 h.

4.3.1.3 Conducting Fieldwork: Main Test

Instructors

An experienced administrator is necessary to manage the implementation phase of the experiments. “One of the largest potentially confounding factors is the instructor” (Anderson & Lawton, 2009, p. 206). The literature suggests two ways to control for the impact of the instructor (Anderson & Lawton, 2009; Gosen & Washbush, 2004). One way is to keep the instructor constant throughout the study; an alternative method is to use a large group of instructors and to randomly allocate them to the test and control groups. Since instructors were to be selected from faculty members with already heavy service and teaching responsibilities, and the selection was further complicated by our inability to offer enticing rewards, the second option was discarded in favour of having a single instructor.

The administrator committed substantial amounts of time to preparing for and running the 10 laboratory experiments, dealing with the 150 participants and managing the complications related to the 12 different configurations of conditions. In addition to the requirement of a substantial amount of time, Anderson and Lawton (2009) identified two further considerations when nominating the instructor: (1) bias and (2) the competence of the instructor. The researcher ultimately selected a single professional consultant, well-versed in role-play and in-basket simulations and well-regarded as a facilitator by past students and current colleagues. This selection ensured time commitment and competence. The study relied on the professional calibre of the nominated instructor, and thorough briefings and debriefings were implemented to monitor and control for bias. As additional preparation, the facilitator was involved in all of the pre-tests. She followed carefully written and pre-tested instructions (see Appendix B) to the letter for every one of the ten laboratories to ensure consistency across all three phases: Introduction, Experiment, and Debriefing (also see the AUTEC-approved forms in Appendices A and B). The researcher’s supervisor acted as observer with the special responsibility to monitor behavioral and attitudinal biases in the instructor.

A single administrator implemented the study and briefed and debriefed all participants. Ten separate experiment laboratories were held to accommodate the demanding schedules of the MBA students and alumni and to administer the experiment at business schools further afield. Participants could self-select which of the experimental laboratories to attend within their local university or they could travel to a nearby campus, if the particular date of the laboratory suited them better.

In order to minimise instructor bias, the instructor read the brief and debrief from prepared documents. All instructions to the participants were in writing and all competency and incompetency training support materials were provided only in printed document format. The facilitator had clear instructions not to interact with the participants or provide feedback, additional training or insights, other than to indicate the elapsed time. Time was kept with the aid of an alarm clock which was used in every lab. The experiment was highly structured into five clear sections. The first was a self-study period of 20 min during which participants had the opportunity to study the full set of four in-basket simulations as well as the decision aids. Thereafter the facilitator structured the remaining time into four sections of 25 min, allowing a 15-min group interaction phase, a 5-min decision recording phase, and an additional 5-min phase to prepare for the next simulation. These phases were announced verbally as well as by ringing a small bell. In all but one laboratory, individuals and groups worked in the same room and individuals were briefed to follow their own time-frame. In all cases individuals completed the full experiment well before the groups. In approximately 15 % of the group cases, the groups completed their discussions and decision recording before the chiming of the bell. All individual participants completed the full experiment well within the two hours allocated.

Additional Considerations Regarding Internal Validity

With regard to factors of internal invalidity, the researcher considers the degree to which the experimental treatment causes the change(s) observed in the specific experimental setting. Prior research (Campbell, 1957; Dimitrov & Rumrill, 2003) identifies eight categories of variables that need to be controlled, namely: history, maturation, mortality, instrument decay, testing and pretesting effects, statistical regression towards the mean, selection of participants, and interactions of factors (e.g. selection and maturation). This study followed a post-test only design with control groups, as set out in Figs. 4.1 and 4.2 above. In basic post-test only experimental designs, one or more experimental groups are exposed to a treatment or research intervention and the resulting change is compared to one or more control groups that did not receive the treatment.

Woodside (1990, p. 230) highlights two requirements to control sources of invalidity in true experiments: (1) two or more comparisons of subjects (individuals or groups) either exposed or not exposed to the interventions; and (2) “randomized assignment of participants to treatment exposure and to no treatment exposure (i.e. control) groups.” Woodside expands on the amount and allocation of participants by pointing out that enough subjects must be randomly assigned to ensure that treatment and control groups are very similar in all respects (including demographics and psychographics) before the treatment conditions are administered. To respond to requirement (1), the experiment was repeated 10 times with more than eight participants in each of the cells. Further, each of the treatments was contrasted with a group or individuals who did not receive the treatment, also consisting of eight or more participants. To respond to requirement (2), randomisation was carefully managed so that participants self-assigned to the treatments without prior knowledge of which treatment they were about to receive. Further randomisation was achieved by ensuring that each laboratory at the different campuses covered a random selection of the treatments, thus enabling randomisation across the different university campuses (see the section below for additional clarification of random sampling for this study).

Sampling and Randomised Allocation

Subject pools of MBA students, MBA alumni, advanced postgraduate management students and executives-in-training (on executive management or HR short courses) studying in the Faculty of Business and Law at Auckland University of Technology (AUT), the University of Waikato in Hamilton, Victoria University in Wellington and Massey University in Palmerston North served as participants in these experiments. A total of 153 learners participated in the study. Participants assigned themselves randomly to the alternative treatments. Since our interest is in the efficacy of education methodologies for managerial decision-making competencies, the choice of sample group was based on two factors. The most important factor was the likelihood that participants would exhibit a need for, and therefore an interest in, managerial decision-making competencies, ensuring commitment, a good level of interest, and active, enthusiastic (even dedicated) participation. A second sampling consideration was that learners needed a comparable, basic level of understanding of and experience in managerial decision-making through prior training. (Self-assessed levels of experience were recorded prior to the experiment as part of the demographic data collected, and the selection criteria of MBA programmes presuppose a certain level of business knowledge and experience.) A concerted effort was made to select MBA students who had completed the compulsory papers, but random allocation to all 12 treatment cells negated the need to be overly concerned with the participants’ prior level of knowledge. In addition, prior knowledge was captured as two measured antecedents (i.e. educ and man_exp) and was thus given full consideration during the analysis and interpretation of the findings.

Random Allocation to Cells

The following procedures were followed. Encoded, sealed envelopes containing the instructions, in-basket simulations, decision aids, and simulation props such as buttons, badges and sashes were placed on round tables, with four sets per table. As students entered the laboratory, they self-selected which table to sit at. At this point of the experiment there were no visible signs of which treatment participants would be exposed to. Students participating on an individual basis, although seated in groups of four, worked on their own with no interaction with the other students at the table. Students whose self-selection allotted them to the group treatment all received the same treatment at the same table (one group). In cases where participants were exposed to the GBS treatment, each participant received a unique briefing document and set of props over and above the general instructions, in-basket simulations and decision aids. Each pack in every envelope, for both groups and individuals, was encoded with a unique identifier code to ensure that the data capturer and data analyst could accurately determine which configuration of conditions the participants were exposed to. At no point were the codes disclosed to or discussed with any of the participants. Codes remained hidden throughout the experiment, and only the data capturer linked case codes with the unique code of each participant.
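
To illustrate the logic of the coded-envelope procedure, the following is a minimal sketch (in Python) of how packs could be pre-coded and mapped to configurations before a laboratory session. The configuration labels, code format and table counts are illustrative assumptions, not the instruments actually used in this study.

```python
import random

# Illustrative placeholder labels for the 12 configurations of conditions.
CONFIGURATIONS = [f"CELL_{n:02d}" for n in range(1, 13)]

def make_pack_codes(n_tables, seats_per_table=4, seed=42):
    """Return a mapping of (table, seat) -> (pack code, configuration label)."""
    rng = random.Random(seed)
    packs = {}
    for table in range(1, n_tables + 1):
        # All four packs on a table share one configuration, mirroring the
        # rule that a group-treatment table receives a single treatment.
        config = rng.choice(CONFIGURATIONS)
        for seat in range(1, seats_per_table + 1):
            # The table-and-seat suffix keeps every pack code unique; the
            # random prefix carries no visible treatment cue for participants.
            code = f"PK{rng.randrange(10_000):04d}-{table:02d}{seat}"
            packs[(table, seat)] = (code, config)
    return packs

# Example: code the packs for a laboratory session with ten tables.
for key, value in list(make_pack_codes(10).items())[:4]:
    print(key, "->", value)
```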

To assist with generalisability and comparative groups, subjects were randomly allocated to one of 12 different cells. In line with fsQCA, the four dichotomies (i.e. groups • ~ groups; competency training • ~ competency training; DA • ~ DA; GBS cases • ~ GBS cases) presented 81 groupings or initial configurations.

Fit Validity

An important test for the validity of a research instrument or theoretical model is “fit validity or performance validity” (Wright, 1999). This study of causal complexities as they relate to decision competence and decision confidence relies on QCA modelling, which is based on set-theoretic relations and subset relations. Two quantitative measures to assess the level of correspondence between the theoretically assigned conditions and the anticipated outcomes, as posited by Ragin (2006a, b, c), are consistency and coverage. These metrics rate the “goodness of fit”.

Cases are precisely assessed by their degree of consistency with the subset relation. This allows the researcher to “establish and assess individual case’s degree of consistency with the outcome” (Ragin, 2009, p. 120). The following formula determines the degree of consistency (Ragin, 2008c, p. 99): Consistency (Xi ≤ Yi) = ∑[min (Xi, Yi)]/∑(Xi), where Xi is the degree of membership in set X; Yi is the degree of membership in outcome set Y; (Xi ≤ Yi) is the subset relation under consideration; and min (Xi, Yi) indicates the lower of the two values. If all the values of condition Xi are equal to or less than the corresponding values of the outcome Yi, the consistency is 1, signifying full consistency. A further measure of consistency comes from the work of Rihoux and De Meur (2009):

$$ \mathrm{Consistency} = \frac{\text{Number of cases for which both a given condition and outcome are present}}{\text{Number of cases for which only the outcome is present}} $$

Ragin (2004, 2006c) suggests that consistency scores below 0.70 provide limited substantive grounds for interpretation. Values for consistency should ideally be at least 0.75 (Ragin, 2006c; Wagemann & Schneider, 2007) to indicate useful models (also called paths or solutions). In contrast, coverage is a gauge of the empirical relevance or importance of configurations of conditions (Ragin, 2006c, p. 301; Woodside & Zhang, 2012) and is expressed as:

$$ \begin{aligned} \mathrm{Coverage}\left(X_i \le Y_i\right) &= \sum \left[\min\left(X_i, Y_i\right)\right] \Big/ \sum Y_i \quad \mathrm{or} \\[4pt] \mathrm{Coverage} &= \frac{\text{For a given outcome, number of cases containing a given solution term}}{\text{Total number of cases with the given outcome}} \end{aligned} $$

When coverage is too small (below 0.2), there are numerous ways to achieve the outcome, and the studied configuration of conditions does not do a useful (“good”) job of explaining the link between high membership in the configuration of conditions (Xi) and high membership in the outcome (high Yi) (Ragin, 2006c).
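
To make the two metrics concrete, the sketch below computes Ragin’s fuzzy-set consistency and coverage for a small set of hypothetical membership scores (not data from this study) and then applies the thresholds just discussed.

```python
def consistency(x, y):
    """Ragin's fuzzy-set consistency: sum of min(Xi, Yi) divided by sum of Xi."""
    return sum(min(xi, yi) for xi, yi in zip(x, y)) / sum(x)

def coverage(x, y):
    """Ragin's fuzzy-set coverage: sum of min(Xi, Yi) divided by sum of Yi."""
    return sum(min(xi, yi) for xi, yi in zip(x, y)) / sum(y)

# Hypothetical membership scores for eight cases:
# x = membership in a configuration of conditions, y = membership in the outcome.
x = [0.8, 0.2, 0.9, 0.1, 0.3, 0.2, 0.1, 0.2]
y = [0.9, 0.8, 0.9, 0.7, 0.9, 0.6, 0.8, 0.7]

cons, cov = consistency(x, y), coverage(x, y)
print(f"consistency = {cons:.2f}, coverage = {cov:.2f}")  # 1.00 and 0.44
# Every Xi here is less than or equal to its Yi, so consistency is 1 (full consistency).

# Screening against the thresholds discussed above: consistency above 0.70
# and coverage between 0.2 and 0.6.
print("retained:", cons > 0.70 and 0.2 <= cov <= 0.6)  # True
```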

A “good fit” in QCA is indicated by the coverage and consistency of the multiple configuration models. Only models that are useful (those where high configuration set membership is associated with high outcome membership, consistency is above 0.70, and coverage scores fall between 0.2 and 0.6) are covered in the findings of this research. Fit validity can thus be accurately assessed and achieved. In some cases the fit may be limited and the models thus only marginally useful. Coverage metrics indicate the relative explanatory strength of each configural model (Wagemann & Schneider, 2007) and are thus useful for comparing the relative explanatory ability of paths or models. Woodside, Hsu, and Marshall (2010, p. 794) note that “fsQCA coverage values are analogous to effect size estimates in statistical hypothesis testing.” Coverage and consistency for each configuration of conditions and suggested predictive model are assessed and recorded in Chaps. 5, 6, and 7. Woodside (2010, 2013) prompts marketing scholars not to consider fit validity in isolation; it needs to be considered alongside the predictive validity of tested models, which is covered in the next section.

4.3.2 External Validity: Equifinality and Predictive Validity

A basic goal of scientific study is to provide credible, reliable and generalisable theoretical explanations for real-life behaviour. In contrast to internal validity, external validity is the extent to which the treatment effect is generalisable across populations or can be transferred to other populations and other contexts beyond the specific research settings, treatment variables and measurement instruments (Burns & Burns, 2008). Research studies list several threats to external validity: selection biases and their interaction with treatment effects; the effect of pretesting on participants’ reactions; the reactive effect of experimental procedures; and multiple-treatment interference (for a thorough discussion and examples of threats to internal and external validity, see Campbell, 1957; Campbell & Stanley, 1966).

In the early literature on simulations, external validity was related to realism (Kibbee, 1961). Later, the concept of verisimilitude (the perception of reality by evaluators and participants) was heralded as more important. But since verisimilitude differs from participant to participant, and in order to move away from the perceptions of individuals, researchers looked for a more testable basis for external validity. Some authors offer suggestions and prescriptions for designing and implementing valid simulation research. Cannon and Burns (1999) propose linking career success or performance measurement to the simulation experience. The key question for external educational validation, according to these authors, is: “how well does the educational process actually work in teaching real-world skills?” (p. 43). Wolfe (1976, p. 412) refers to external validity as the transferability of “academic insights into useful and effective real-world orientations, perceptions and business career practices”.

Gosen and Washbush (2004, p. 273) term the ability to generalise the learning effects to students’ careers as “transfer-internalization validity”. However, Norris (1986) argues that career success is individual-based and that the success measures in the simulation and in real business will be differentially affected. Using career success as validation is further compromised by the variables associated with career success such as personal motivation, career opportunities, praise, job satisfaction, and other subjective criteria identified by Wolfe and Roberts (1986). The authors highlight the difficulty in testing for significant variations in success when these subjective criteria are employed. According to Wolfe and Roberts (1986) salary increases and promotions—although complicated by inter- and intra-company transfers, organisational differences, external economic and political factors, confidentiality of information, and other industry factors—are considerably better indices. The validity of the research is further complicated by the need to rely on self-reports, with the concomitant risk of bias.

Feinstein and Cannon (2002) return to the importance of the perception of reality: verisimilitude, believability and plausibility. Although these terms represent only the perception of scientific validity rather than validity itself, they “tend to increase the level of external validity” (p. 437) as indicators of motivation and insight, which are directly related both to internal validity and to stimulating students to learn. This in turn increases productive learning of managerial and decision-making skills and therefore increases external validity.

This study employs fsQCA, using Boolean algebra, as its research method and analysis technique. Techniques in fsQCA deal with cases in a configurational, comparative way, where the integrity of each case is retained and cases are considered complex combinations of properties. QCA conveys a particular conception of causality, using Boolean algebra as well as visual tools in the form of Venn diagrams for a “dialogue between the theory and the data” (Ragin, 1987) in order to understand and interpret results. “Multiple conjunctural causation rejects any form of permanent causality and stresses equifinality (many paths can lead to the same outcome AB → Y; AB + CD → Y)” (Rihoux & Ragin, 2009, p. 8). FsQCA recognises asymmetrical relationships, where low values for X may associate with both low and high values for Y (Woodside, 2011a, b). In addition, this study considers a combination of antecedents and causal conditions, where no one factor is likely to be sufficient for the ideal outcome. For fsQCA as a set of techniques, “modest generalization” can be achieved but “permanent causality is not assumed” (Rihoux, 2006, p. 9). In order for the models resulting from QCA to be valid, they need to go beyond description, predict additional cases and achieve modest generalisation (Armstrong, 1991; Berg-Schlosser & De Meur, 2009; McClelland, 1998). As tools of scientific inquiry, theory and constructs are deemed adequate when they can be used to make observable predictions of untested cases or events.
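
As a small illustration of how such recipes are evaluated under the standard fuzzy-set operators (intersection as the minimum, union as the maximum, negation as one minus membership), the following sketch computes a case’s membership in the recipe AB + CD; the membership scores are hypothetical.

```python
def AND(*memberships):
    """Fuzzy-set intersection (logical 'and'): the minimum membership score."""
    return min(memberships)

def OR(*memberships):
    """Fuzzy-set union (logical 'or', written '+'): the maximum membership score."""
    return max(memberships)

def NOT(membership):
    """Fuzzy-set negation (written '~'): one minus the membership score."""
    return 1.0 - membership

# Hypothetical membership scores for one case in conditions A, B, C and D.
a, b, c, d = 0.9, 0.7, 0.3, 0.8

# Membership of the case in the recipe AB + CD: either conjunction is a
# sufficient path to the outcome (equifinality).
recipe = OR(AND(a, b), AND(c, d))
print(recipe)  # max(min(0.9, 0.7), min(0.3, 0.8)) = max(0.7, 0.3) = 0.7
```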

McClelland’s (1998) advice to researchers is to consider the critical question: does a model predict an outcome or dependent variable in additional samples, that is, samples not included in the original data sets used to test the theory or models? In other words, does a model have “predictive validity”? Gigerenzer and Brighton’s (2009b) study finds multiple regression analysis (MRA) models to be of extremely good fit, but these models perform relatively poorly when predictive validity is considered. In other words, when models resulting from MRA and traditional methods are tested for accuracy on a separate set of data not analysed as part of the original data, the models generally perform less well. The dominant practice in the management and marketing literature is to present only best-fit models, “but doing so is bad practice” (Woodside, 2010, p. 9). “Testing for predictive validity with hold out samples is always possible and doing so substantially increases the added value for both empirical positivistic and interpretative case studies” (p. 9). Although Ragin (2008c) does not consider predictive validity, it is considered critical by Armstrong (1991) and Gigerenzer and Gaissmaier (2011). This study recognises the importance of predictive validity but, due to its exploratory nature, includes only fit validity. To the researcher’s knowledge, this study is the first application of Boolean algebra to a laboratory experiment testing various ways to achieve high decision competence and high decision confidence.
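
Although no hold-out test was performed here, the following sketch indicates how such a check could look in future work: a solution term’s consistency is computed separately for a modelling subsample and a hold-out subsample. The membership scores are hypothetical and do not come from this study.

```python
import random

def consistency(x, y):
    """Fuzzy-set consistency of solution-term memberships X against outcome Y."""
    return sum(min(xi, yi) for xi, yi in zip(x, y)) / sum(x)

# Hypothetical case records: (solution-term membership, outcome membership).
cases = [(0.8, 0.9), (0.7, 0.8), (0.2, 0.6), (0.9, 0.95),
         (0.3, 0.4), (0.6, 0.7), (0.85, 0.9), (0.4, 0.7)]

rng = random.Random(7)
rng.shuffle(cases)
half = len(cases) // 2
modelling, holdout = cases[:half], cases[half:]

for label, subsample in (("modelling", modelling), ("holdout", holdout)):
    xs, ys = zip(*subsample)
    print(f"{label} consistency = {consistency(xs, ys):.2f}")

# A solution that is highly consistent in the modelling subsample but much
# less consistent in the holdout subsample would signal weak predictive validity.
```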

4.4 Constructing Conjunctive Recipes

Now that the fsQCA method and the logical procedures have been outlined, closer links between the research propositions (as set out in Sect. 3.2.1) and the possible models are drawn in Table 4.15. Refer to Sect. 4.1 for the interpretation of the Boolean algorithms.

Table 4.15 Propositions and related configural causation models

4.5 Ethical Considerations

4.5.1 Principles of Partnership, Participation and Protection

According to Cohen, Manion, and Morrison (2000), it is critical to protect the identity of all participants. To achieve this principle of anonymity, certain protocols were followed throughout the research process.

All prospects’ and participants’ rights were respected by adhering to four key principles: competence, voluntarism, comprehension, and full information (Cohen & Manion, 1994). To adhere to the principle of competence, information was provided to assist participants in making informed decisions at all stages (before, during and after committing to participate; see the advertisement, information sheets and final step sheet in Appendices B and C). Students who agreed to participate completed AUT Ethics Committee-approved consent forms. Participants’ private and confidential information was, and will remain, secure and will not be made available to any third party.

4.6 Summary

The remainder of this book is structured as follows. Chapter 5 presents the analysis of the data and the configural models for overall decision competence and decision confidence. Chapter 6 presents the QCA procedures, data analysis and interpretation of the findings for the four separate in-basket simulations. Chapter 7 then investigates decision incompetence and doubt, and Chap. 8 covers implications for practitioners and scholars, limitations of this study, and suggestions for future research.