An attempt to conduct an experiment with 330 US-American MTurks failed because the server, an Amazon AWS “t2.micro 1 GiB”, ran out of memory. After upgrading the server to 32 GiB of working memory with the Amazon AWS “t2.2xlarge 32 GiB”, an experiment with 180 US-American MTurks was conducted, from which the data of 87 participants was used. As estimated, more than 50 % of participant data was lost due to connection errors, incorrect raw data, participants leaving the experiment or participants playing in a game group with one or more bots. CPU utilization reached 55 % during the experiment, and it is not advised to try larger numbers of participants with the settings mentioned.

Data from 29 female and 58 male participants, aged 33.16 years on average, were analyzed. Of the 87 participants, 9 reported having conducted the experiment before, 63 reported not having conducted it before, and 15 did not answer this question. A comparison of MTurk ID tables showed that all 9 participants who reported having conducted the experiment before had been part of the 330-participant experiment, which crashed before the ToE stages were reached. Therefore, all participants were included. Participants from the example experiment mentioned before were not included.

Chapter 6 analyzes all hypotheses in the corresponding sub-chapters, beginning with testing the variables for parametric or nonparametric distribution.

6.1 Testing For Nonparametric Distribution

Variables were tested for nonparametric distribution using the One-Sample Kolmogorov-Smirnov Test. The null hypothesis that the variable was normally distributed was rejected with high significance for each of 11 variables, which are listed in figure 6.1. The null hypothesis that the categories of individual expertise occur with equal probability was rejected at the 0.01 level of significance. The list includes variables for individual expertise, routine consistency, all logic proportions from the well-defined stages, all logic proportions from the ill-defined stages, all expected states from the ill-defined stages and the logic marker index. Using Shapiro-Wilk testing, all 12 null hypotheses stating a normal (parametric) distribution of the same variables were rejected with very high significance (p < 0.001). For this reason, the distributions are considered to be nonparametric, and therefore, with the exception of Hypothesis 2, nonparametric analyses are used.

Figure 6.1 Test results for nonparametric distribution of variables. Source own source

6.2 Expertise Rank and Logic Proportion

Hypothesis 1: The higher the individual expertise rank, the higher the logic proportion “ToH total” is.

Individual expertise rank was categorized as either “low”, “medium” or “high”. Agents who failed more than one ToH game due to the timer running out were always part of the “low” expertise category. Agents who completed four or more ToH games in 7 steps were part of the “high” expertise category. Agents who completed two or three ToH games in 7 steps were part of the “medium” category. Agents who completed one or no ToH game in 7 steps were part of the “low” category.
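The categorization rules above can be sketched as a small function. This is only an illustration of the stated rules, not the code used in the experiment; the function name and both parameter names are hypothetical.

```python
def expertise_rank(ideal_games: int, timeouts: int) -> str:
    """Classify one agent by the rules stated in the text.

    ideal_games: number of ToH games completed in 7 steps (hypothetical name)
    timeouts:    number of ToH games failed because the timer ran out (hypothetical name)
    """
    if timeouts > 1:
        # More than one timed-out game always means "low", whatever else happened.
        return "low"
    if ideal_games >= 4:
        return "high"
    if ideal_games >= 2:
        # Two or three games solved in 7 steps.
        return "medium"
    # One or no game solved in 7 steps.
    return "low"


print(expertise_rank(ideal_games=4, timeouts=0))
print(expertise_rank(ideal_games=5, timeouts=2))
```

The timeout rule is checked first because the text states that such agents were "always" part of the low category, which is read here as overriding the other criteria.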

33 agents were part of the “low” expertise group, 16 agents were part of the “medium” expertise group and 38 agents were part of the “high” expertise group.

“ToH total” is the proportion of ideal routine strategy steps used in all ToH games, with the exception of the first game. Spearman’s rho showed a correlation significant at the 0.01 level (2-tailed), as shown in figure 6.2.

Figure 6.2 Correlation results of expertise and ideal routine strategy in “well-defined” stages. Source own source

Agents with low ToH expertise (0) had a mean index of 0.4463 ToH total (std. error 0.0338, std. deviation 0.1943). Agents with medium expertise (1) had a mean index of 0.7231 ToH total (std. error 0.03646, std. deviation 0.1458). Agents with high expertise (2) had a mean index of 0.9080 ToH total (std. error 0.0172, std. deviation 0.1063). Figure 6.3 shows specifics as a box-plot diagram.
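The reported standard errors can be cross-checked against the standard deviations and the group sizes given above (33, 16 and 38 agents), since the standard error of a mean is the standard deviation divided by the square root of the group size. The snippet below is a minimal sketch of that check; the variable names are illustrative.

```python
import math

# (group size, reported standard deviation) per expertise group, taken from the text
groups = {
    "low": (33, 0.1943),
    "medium": (16, 0.1458),
    "high": (38, 0.1063),
}


def standard_error(sd: float, n: int) -> float:
    """Standard error of the mean: sd / sqrt(n)."""
    return sd / math.sqrt(n)


for name, (n, sd) in groups.items():
    print(name, standard_error(sd, n))
```

The computed values agree with the reported standard errors up to rounding.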

Kruskal-Wallis H shows group differences in ToH total index by ToH expertise to be highly significant (H(33, 16, 38), H = 60.604, p < 0.001).
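For readers who want to retrace the statistic, the following is a minimal sketch of the Kruskal-Wallis H computation with tie correction in plain Python. It is not the software output reported above, and all function names are illustrative. The closed-form p-value only holds for exactly three groups, where H is compared against a chi-square distribution with two degrees of freedom.

```python
import math
from collections import Counter


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(*groups):
    """Tie-corrected Kruskal-Wallis H for two or more groups of observations."""
    pooled = [x for group in groups for x in group]
    n = len(pooled)
    ranks = average_ranks(pooled)
    weighted = 0.0
    start = 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        weighted += rank_sum ** 2 / len(group)
        start += len(group)
    h = 12.0 / (n * (n + 1)) * weighted - 3 * (n + 1)
    tie_counts = Counter(pooled).values()
    correction = 1 - sum(t ** 3 - t for t in tie_counts) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical")
    return h / correction


def p_value_three_groups(h):
    """Chi-square survival function for 2 degrees of freedom: exp(-h / 2)."""
    return math.exp(-h / 2)


# The reported H of 60.604 for three groups gives a p-value far below 0.001.
print(p_value_three_groups(60.604))
```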

Hypothesis 1 is therefore confirmed. Individual expertise rank correlates significantly with the ToH total index, and the differences between the expertise groups are significant. Means of high expertise participants and low/medium expertise participants vary significantly in terms of logic proportion “ToH_total”.

Figure 6.3 Boxplot results of expertise levels and logic proportion during “well-defined” stages. Source own source

6.3 Environmental Change and Human Error

Hypothesis 2: Change of goal rod during ToH and ToE games in the 4th level leads to the first actions in the same level deviating from the ideal path.

To test this hypothesis, the first action in each of the six ToH games was analyzed as to whether or not it was an “ideal” move by F-L. This analysis excludes NB-L, as not a single ToE game was started by any of the 87 participants via an ideal NB-L move. The hypothesis was not analyzed for ToE games, as too many factors aside from the goal rod change influenced individual behavior, making a statistical analysis questionable. The hypothesis was therefore modified to:

Hypothesis 2: Change of goal rod during ToH games in the 4th level leads to the first actions in the same level deviating from the ideal path.

As shown in table 6.1 (own source), the proportion of not ideal first moves fell from 45.98 % (n = 87) to 30.59 % (n = 85) over ToH games one to three. With the introduction of the goal rod change in ToH game 4, the proportion of not ideal first moves rose to 51.16 % (n = 86), which is even higher than the initial “mistake” proportion.

The mean proportion of not ideal first moves of 0.4180 (std. deviation 0.0685, std. error 0.0278) differs significantly from 0.5116 (51.16 %) with p = 0.020. The mean proportion of not ideal first moves does not differ significantly from the second highest value 0.4598 (45.98 %) with p = 0.195.
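The test behind these p-values is not named in the text. The reported figures appear consistent with a one-sample t-test of the six per-game proportions (5 degrees of freedom) against a fixed reference value, and the sketch below works under that assumption. It uses only the standard library and integrates the t density numerically, so the p-values are approximations; all function names are illustrative.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t: float, df: int, steps: int = 2000) -> float:
    """Two-tailed p-value via Simpson's rule on [0, |t|]; steps must be even."""
    upper = abs(t)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3
    return 1 - 2 * central_area


def one_sample_t(mean: float, sd: float, n: int, reference: float) -> float:
    """t statistic of a sample mean against a fixed reference value."""
    return (mean - reference) / (sd / math.sqrt(n))


# Assumption: n = 6 games, mean and standard deviation as reported in the text.
t_game4 = one_sample_t(0.4180, 0.0685, 6, 0.5116)
t_game1 = one_sample_t(0.4180, 0.0685, 6, 0.4598)
print(t_game4, two_tailed_p(t_game4, df=5))
print(t_game1, two_tailed_p(t_game1, df=5))
```

Under this assumption, the comparison with game 4 yields a p-value close to the reported 0.020 and the comparison with game 1 a p-value close to the reported 0.195.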

Modified Hypothesis 2 is therefore confirmed. Mistake rates on the first action in game 4, where the goal rod was changed, differed significantly from mean average mistake proportion.

Table 6.1 Impact of “macrostructure shift” on decision-making performance. Source own source

6.4 Information Conditions and Logic Deviation

Hypothesis 3: Participants in the N-IC condition show the highest logic proportions in ToE levels one to three, expressed by “ToE parts 1”, followed by proportions of D-IC participants, then C-IC, G-IC and R-IC.

Logic proportion is an index representing the proportion of actions being routine logic actions. The lower the index is, the more the agent deviated from its routine strategy. The index “ToE parts 1” refers to the first three ToE games, which could be solved in 7 steps by sticking to the framed logic. The anticipated order by hypothesis 3 was: N-IC > D-IC > C-IC > G-IC > R-IC.

18 agents were part of the N-IC condition (6 groups), 24 agents were part of the G-IC condition (8 groups), 15 agents were part of the D-IC condition (5 groups), 15 agents were part of the R-IC condition (5 groups) and 15 agents were part of the C-IC condition (5 groups). These group sizes apply to all hypotheses.

The mean ToE parts 1 index was 0.7113 in the N-IC (std. error 0.0677, std. deviation 0.2873, range 0.8), 0.7596 in the G-IC (std. error 0.0580, std. deviation 0.2841, range 0.75), 0.6429 in the D-IC (std. error 0.0689, std. deviation 0.2666, range 0.8), 0.9179 in the R-IC (std. error 0.0508, std. deviation 0.1966, range 0.36) and 0.7685 in the C-IC (std. error 0.0508, std. deviation 0.1966, range 0.3636). Figure 6.4 shows the box-plot data.

Figure 6.4 Boxplot results of logic proportion during “metastable” conditions over all information conditions: 0 = N-IC, 1 = G-IC, 2 = D-IC, 3 = R-IC, 4 = C-IC. Source own source

Kruskal-Wallis H shows significant differences between conditions regarding the ToE parts 1 index, with (H(18, 24, 15, 15, 15), H = 10.119, p = 0.038).

Hypothesis 3 cannot be confirmed. The observed order of ToE parts 1 by information condition is R-IC > C-IC > G-IC > N-IC > D-IC, while the differences between conditions by this index were measured to be significant. The “routine information condition” shows the lowest routine logic deviation, while the “dissolution information condition” shows the highest routine logic deviation during the first three ToE games.
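The observed order follows directly from sorting the condition means reported above. The following sketch derives it and compares it with the anticipated order; the variable names are illustrative.

```python
# Mean ToE parts 1 index per information condition, as reported in the text
toe_parts1_means = {
    "N-IC": 0.7113,
    "G-IC": 0.7596,
    "D-IC": 0.6429,
    "R-IC": 0.9179,
    "C-IC": 0.7685,
}

# Order anticipated by hypothesis 3, from highest to lowest logic proportion
anticipated = ["N-IC", "D-IC", "C-IC", "G-IC", "R-IC"]

# Sort conditions from highest to lowest mean
observed = sorted(toe_parts1_means, key=toe_parts1_means.get, reverse=True)

print(" > ".join(observed))   # R-IC > C-IC > G-IC > N-IC > D-IC
print(observed == anticipated)  # False
```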

6.5 Complete Logic Proportions Over Information Conditions

Hypothesis 4: Participants in the N-IC condition show the highest total ToE logic proportion values, followed by proportions of D-IC participants, then C-IC, G-IC and R-IC.

The index “ToE parts total” refers to all six ToE games. The anticipated order by hypothesis 4 was: N-IC > D-IC > C-IC > G-IC > R-IC.

The mean ToE total index was 0.7000 in the N-IC (std. error 0.0551, std. deviation 0.2339, range 0.7218), 0.7409 in the G-IC (std. error 0.0580, std. deviation 0.2841, range 0.6923), 0.7148 in the D-IC (std. error 0.0611, std. deviation 0.2366, range 0.6768), 0.7970 in the R-IC (std. error 0.0475, std. deviation 0.1839, range 0.5584) and 0.7609 in the C-IC (std. error 0.0546, std. deviation 0.2114, range 0.6205). Figure 6.5 shows the box-plot data.

Kruskal-Wallis H shows no significant differences between conditions regarding the ToE total index, with (H(18, 24, 15, 15, 15), H = 2.408, p = 0.661).

Hypothesis 4 cannot be confirmed. By the means reported above, the observed order of ToE total by information condition is R-IC > C-IC > G-IC > D-IC > N-IC. The “routine information condition” shows the lowest routine logic deviation, while the N-IC shows the highest routine logic deviation over all six ToE games. However, the differences by this index were not significant.

Figure 6.5 Boxplot results of logic proportion during “ill-defined” conditions over all information conditions: 0 = N-IC, 1 = G-IC, 2 = D-IC, 3 = R-IC, 4 = C-IC. Source own source

6.6 Expected States and Logic Proportion

Hypothesis 5: The higher expected states proportion values with respect to routine strategy during all ToE conditions, the higher logic proportion values are.

Expected states proportion is an index referring to the proportion of actions that were followed by the expected outcome, with respect to the actions’ routine logic. The higher the expected state proportion the lower the expected state deviation. The lower the expected state proportion the higher the expected state deviation. Hypothesis 5 therefore assumed low expected states proportion to correlate with low logic proportion values, and high expected state proportion to correlate with high logic proportion values.

As with the logic proportion indexes, there are three expected states proportion indexes: “ToE_X_tot” refers to the expected states in all six ToE games. “ToE_X_parts1” refers to the expected states in the first three ToE games. “ToE_X_parts2” refers to the expected states in the last three ToE games. All three expected states indexes were compared to all three logic proportion indexes, being ToE total, ToE parts 1 and ToE parts 2.

Spearman’s rho correlation was significant at the 0.001 level (2-tailed) between all expected states and logic proportion indexes. Figure 6.6 sums up all mentioned data.
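As a reminder of what the coefficient measures, Spearman's rho is the Pearson correlation of the rank-transformed values, with tied values sharing their mean rank. The sketch below implements that definition in plain Python; it is illustrative only, uses made-up toy data rather than the experiment's data, and does not compute the significance level.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy data (not from the experiment): a mostly increasing relationship
print(spearman_rho([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]))
```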

Figure 6.6 Correlation results between expected states and logic proportion. Source own source

Hypothesis 5 was confirmed; expected states correlations with logic proportion were found to be highly significant.

6.7 Expected States and Logic Marker Proportion

Hypothesis 6: The higher expected states proportion values with respect to routine strategy during all ToE conditions, the lower the logic marker proportion values are.

Logic marker is an index representing the proportion of an agent’s ToE actions which were not “captured” by any logic index. From the perspective of this thesis’ model, such actions can be regarded as “random”. It was expected that agents who experience many actions being followed by their expected outcome would stick to some logic framed by the model. In other words, it was expected that agents who experience seemingly “random” outcomes would also behave randomly. The higher the logic marker index, the more “randomly” the agents behaved. The lower the logic marker index, the more this thesis’ model can make sense of their behavior. Therefore, high expected states proportion was anticipated to lead to low logic marker values and therefore “less random behavior from the model’s perspective” (Figure 6.7).

Figure 6.7 Correlation results between expected states and logic marker. Source own source

ToE_X_tot correlation with logic marker values was significant at the 0.01 level (2-tailed). ToE_X_parts1 correlation with the logic marker values was significant at the 0.05 level (2-tailed). ToE_X_parts2 correlation with the logic marker values was significant at the 0.01 level (2-tailed). Figure 6.7 sums up the results.

Hypothesis 6 was confirmed. All correlations of the expected states indexes with the logic marker index were either significant (p = 0.024) or highly significant (p < 0.001).

6.8 Complete Expected States Over Information Conditions

Hypothesis 7: Expected states proportion values during ToE parts 1, ToE parts 2 and ToE total in R-IC are the highest, followed by G-IC, C-IC, D-IC and lastly N-IC.

The anticipated order of expected states proportion values was: R-IC > G-IC > C-IC > D-IC > N-IC.

The mean ToE_X_total index was 0.4435 in the N-IC (std. error 0.0573, std. deviation 0.2431, range 0.7358), 0.5322 in the G-IC (std. error 0.0490, std. deviation 0.2401, range 0.7407), 0.5322 in the D-IC (std. error 0.0490, std. deviation 0.2401, range 0.7407), 0.5171 in the R-IC (std. error 0.0486, std. deviation 0.1882, range 0.68) and 0.4076 in the C-IC (std. error 0.0620, std. deviation 0.2401, range 0.6552). Figure 6.8 shows the box-plot data.

Differences by ToE_X_total in all five conditions were not significant according to Kruskal-Wallis-H: (H(18, 24, 15, 15, 15) = 4.766, p = 0.312). Nevertheless, the observed order by this index was G-IC > R-IC > D-IC > N-IC > C-IC.

Figure 6.8 Boxplot results of expected states during “ill-defined” stages over information conditions: 0 = N-IC, 1 = G-IC, 2 = D-IC, 3 = R-IC, 4 = C-IC. Source own source

The mean ToE_X_parts1 index was 0.5374 in the N-IC (std. error 0.0672, std. deviation 0.2853, range 0.8571), 0.6620 in the G-IC (std. error 0.0537, std. deviation 0.2633, range 0.8667), 0.5730 in the D-IC (std. error 0.0536, std. deviation 0.2075, range 0.5826), 0.6983 in the R-IC (std. error 0.0464, std. deviation 0.1797, range 0.6750) and 0.5119 in the C-IC (std. error 0.0666, std. deviation 0.2579, range 0.6971).

Differences by ToE_X_parts1 in all five conditions were found to be significant at the 0.1 level according to Kruskal-Wallis-H: (H(18, 24, 15, 15, 15) = 8.944, p = 0.063). The observed order by this index was R-IC > G-IC > D-IC > N-IC > C-IC.

The mean ToE_X_parts2 index was 0.3920 in the N-IC (std. error 0.0525, std. deviation 0.2227, range 0.6774), 0.4667 in the G-IC (std. error 0.0455, std. deviation 0.2228, range 0.7), 0.4655 in the D-IC (std. error 0.0537, std. deviation 0.2078, range 0.7), 0.4210 in the R-IC (std. error 0.0566, std. deviation 0.2192, range 0.7) and 0.3585 in the C-IC (std. error 0.0653, std. deviation 0.2528, range 0.6389).

Differences by ToE_X_parts2 in all five conditions were found to be not significant according to Kruskal-Wallis-H: (H(18, 24, 15, 15, 15) = 3.874, p = 0.423). The observed order by this index was G-IC > D-IC > R-IC > N-IC > C-IC.

Hypothesis 7 was not confirmed. The observed order by expected state proportion differed between ToE_X_total, ToE_X_parts1 and ToE_X_parts2, and only ToE_X_parts1 differed between conditions, with low significance (p = 0.063).

6.9 Routine Consistency

Hypothesis 8: Routine consistency index is the lowest in R-IC, followed by G-IC, C-IC, D-IC and N-IC.

Routine consistency is the proportion of all actions during the ill-defined stages that fall into the routine logic category (either F-L or NB-L), where dir/nodir/ideal are not distinguished. Actions that do not fall into any category are counted in the total number of actions. The higher the routine consistency, the more actions by an agent fall into the routine strategy category. The lower the routine consistency, the higher an agent’s routine volatility. Since it was anticipated that agents would switch their strategy the most in the R-IC, this condition was anticipated to show the lowest routine consistency. The anticipated routine consistency order was N-IC > D-IC > C-IC > G-IC > R-IC.
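The definition above can be illustrated with a short sketch. The label format is an assumption made for this example only: each action carries a string label that starts with its logic category (for instance "F-L_dir_ideal"), and uncategorized actions are represented by None. The raw data of the experiment may be encoded differently.

```python
# Routine logic categories named in the text; sub-types (dir/nodir/ideal) are ignored.
ROUTINE_PREFIXES = ("F-L", "NB-L")


def routine_consistency(action_labels):
    """Share of actions whose label falls into a routine logic category.

    action_labels: one label per action; None marks an action that fits no
    category (hypothetical encoding). Such actions still count in the total.
    """
    if not action_labels:
        raise ValueError("at least one action is required")
    routine = sum(
        1
        for label in action_labels
        if label is not None and label.startswith(ROUTINE_PREFIXES)
    )
    return routine / len(action_labels)


# Hypothetical agent: two routine actions, one other logic, one uncategorized
print(routine_consistency(["F-L_dir_ideal", "NB-L_nodir", "X-L", None]))  # 0.5
```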

The mean routine consistency was 0.6511 in the N-IC (std. error 0.0491, std. deviation 0.2081, range 0.72), 0.7250 in the G-IC (std. error 0.0490, std. deviation 0.2400, range 0.69), 0.7140 in the D-IC (std. error 0.0615, std. deviation 0.2382, range 0.68), 0.7853 in the R-IC (std. error 0.0533, std. deviation 0.2066, range 0.63) and 0.7607 in the C-IC (std. error 0.0544, std. deviation 0.2108, range 0.62).

Differences by routine consistency in all five conditions were found to be not significant according to Kruskal-Wallis-H: (H(18, 24, 15, 15, 15) = 5.018, p = 0.285). The observed order by this index was R-IC > C-IC > G-IC > D-IC > N-IC.

Hypothesis 8 was not confirmed. The routine consistency did not differ significantly over all information conditions, and the observed order by routine consistency differed from what was anticipated.

6.10 Fundamental Strategy and Group Performance

Hypothesis 9: The lower the fundamental index the lower game group performance.

The fundamental index shows the proportion of group decisions in which all agents agreed on which disk to move. The lower the proportion, the higher the number of steps was expected to be, represented by the variable “performance_toe”. Again, “performance_toe” is the number of steps saved by a group solving all ToE games. However, if a game group failed to solve a ToE stage in time (3 minutes), the number of steps saved does not represent the number of steps required to solve a ToE stage.
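The fundamental index as defined above can be sketched as follows. The data layout is hypothetical: each group decision is given as a tuple holding the disk chosen by every agent in the game group.

```python
def fundamental_index(group_decisions):
    """Share of group decisions in which every agent chose the same disk.

    group_decisions: sequence of decisions, each a sequence with one chosen
    disk per agent (hypothetical layout for this illustration).
    """
    if not group_decisions:
        raise ValueError("at least one group decision is required")
    unanimous = sum(1 for choices in group_decisions if len(set(choices)) == 1)
    return unanimous / len(group_decisions)


# Hypothetical group of three agents over four decisions; two are unanimous
decisions = [(1, 1, 1), (1, 2, 1), (3, 3, 3), (2, 2, 3)]
print(fundamental_index(decisions))  # 0.5
```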

If this was ignored, Spearman’s rho showed the correlation between the fundamental index and the number of steps saved for a group attempting to solve all ToE games to be significant at the 0.01 level (2-tailed), with p = 0.002. Therefore, the lower the fundamental index was, the higher the variable “performance_toe” was.

However, the number of steps required to solve all ToE games is not represented by “performance_toe”. For this reason, the “solved” variable was included, which marks group games that were solved. However, the variable “solved” was unreliable, as it also marked group games that were not solved by action but ended because time ran out.

Therefore, hypothesis 9 was not confirmed. The lower the proportion of group actions in which all agents agreed on which disk to move, the more steps it took to solve all ToE games; however, the number of steps required did not represent group performance.

6.11 Group Expertise and Logic Proportions

Hypothesis 10: Lower inter-condition group expertise rankings lead to lower logic deviations proportions.

Group expertise is calculated from the individual expertise levels within one game group (see table 5.4). It was assumed that group expertise correlates with group behavior and therefore impacts logic deviation. When information conditions are disregarded, group expertise seems to correlate highly and positively with the proportion of routine strategy actions over all information conditions (N-IC, G-IC, D-IC, R-IC, C-IC) and all ill-defined system states (metastable, instable). The higher the deviation proportion index, the less an agent deviated from the routine it had used in the well-defined stages. Group expertise correlated significantly and positively at the 0.01 level with the deviation proportion index of all ill-defined stages (ToE tot, p = 0.001) and of the metastable ill-defined stages (ToE 1, p < 0.001), and correlated significantly and positively at the 0.05 level with the deviation proportion index of the instable ill-defined stages (ToE 2, p = 0.028). To avoid confusion, it should be noted again that, at first sight, this analysis can be interpreted as follows: the higher the group expertise, the less the group deviates from its routine strategy, which was learned during the well-defined stages.

However, these results consider 87 individuals, each surrounded by the corresponding group expertise. It is debatable whether or not these results are valid, as group expertise has to be considered the result of an entire group, which is facing different information conditions. Therefore, the following analysis is more precise, considering groups as a whole and the corresponding information conditions.

In the N-IC condition, which held 18 participants amongst 6 game-groups, 9 agents were part of a game-group with a group expertise of “3”. Three agents were part of a game-group with a group expertise of 5, of 7 and of 9 respectively. Kruskal-Wallis H (7.066) showed the difference of ToE total indexes amongst the game group expertise in N-IC to be of low significance, with p = 0.070. Spearman’s rho measured the correlation between N-IC group expertise surrounding an agent, and the agent’s ToE total index to be significant at the 0.05 level (2-tailed), with p = 0.015.

In the G-IC condition, which held 24 participants amongst 8 game-groups, six agents each were part of game-groups with a group expertise of “1” and of “9”. Three agents each were part of a game-group with a group expertise of “2”, of “3”, of “8” and of “10”, respectively. Kruskal-Wallis H (12.951) showed the difference of ToE total indexes amongst the game group expertise in G-IC to be significant, with p = 0.024. Spearman’s rho measured the correlation between G-IC group expertise surrounding an agent, and the agent’s ToE total index to be significant at the 0.01 level (2-tailed), with p = 0.009.

In the D-IC condition, which held 15 participants amongst 5 game-groups, three agents were part of a game-group with group expertise of “1”, of “2”, of “5”, of “7” and of “8” respectively. Kruskal-Wallis H (11.387) showed the difference of ToE total indexes amongst the game group expertise in D-IC to be significant, with p = 0.023. Spearman’s rho measured the correlation between D-IC group expertise surrounding an agent, and the agent’s ToE total index to be not significant, with p = 0.113.

In the R-IC condition, which held 15 participants amongst 5 game-groups, three agents each were part of a game-group with a group expertise of “2”, of “5” and of “9”, respectively. Six agents were part of game-groups with a group expertise of “10”. Kruskal-Wallis H (8.221) showed the difference of ToE total indexes amongst the game group expertise in R-IC to be significant, with p = 0.042. Spearman’s rho measured the correlation between R-IC group expertise surrounding an agent, and the agent’s ToE total index to be not significant, with p = 0.209.

In the C-IC, which held 15 participants amongst 5 game-groups, three agents were part of a game-group with group expertise of “8”. 6 agents were part of a game-group with group expertise of “3” and “9” respectively. Kruskal-Wallis H (2.663) showed the difference of ToE total indexes amongst the game group expertise in C-IC to not be significant, with p = 0.264. Spearman’s rho measured the correlation between C-IC group expertise surrounding an agent, and the agent’s ToE total index to be not significant, with p = 0.758.

Results for hypothesis 10 were mixed, as N-IC and G-IC showed very significant relations between group expertise and logic deviation proportions, as well as solid differences regarding overall logic deviations. D-IC narrowly missed significance at the 0.1 level for the correlation between group expertise and logic deviations (p = 0.113), but showed a significant difference regarding overall logic deviation. R-IC and C-IC results showed no significant correlation between group expertise and logic deviation, but groups in R-IC differed significantly regarding overall logic deviation. The latter supports the hypothesis and shows the high context dependency, which is regarded as natural, due to the high complexity of this analysis.

Hypothesis 10 cannot be clearly confirmed considering all details and can only be confirmed partially. However, results are regarded as promising enough that the correlation between group expertise and logic deviation can be drawn. After thorough consideration hypothesis 10 is therefore confirmed, and will be discussed in more detail in chapter 7.

6.12 Gender Effects

While no significant differences regarding performance between female and male agents in NPS were measured (Chlupsa & Strunz, 2019; Strunz & Chlupsa, 2019), which even held true for all countries of origin (Strunz, 2019), adaption efficiency to more effective strategies has shown gender effects in behavioral experiments (Casal et al., 2017).

Hypotheses that potentially relate to strategy adaption efficiency are analyzed for gender effects. It is hypothesized that no significant gender effects will be found at all, as NPS performance, free of gender effects, is regarded as most fundamental for all forms of strategy adaption.

The 87 participants consisted of 29 self-reported female and 58 self-reported male participants.

The boxplot in figure 6.9 suggests that no gender effect is visible for hypothesis 1.

Strategy adaption efficiency during well-defined stages is implicitly expressed by ToH expertise, as agents who fail to adapt their strategy to the new goal rod position during the well-defined stages have a lower chance of falling into the high or medium expertise category.

Spearman’s rho shows significant correlation at the 0.01 level between expertise and well-defined logic proportion (ToH total) for all 29 female participants. Spearman’s rho shows significant correlation at the 0.01 level between expertise and well-defined logic proportion for all 58 male participants. Therefore, no gender effect was found for hypothesis 1.

Figure 6.9 Boxplot graph showing no gender effect between expertise and well-defined logic proportion: 1 = female, 2 = male. Source own source

Analyzing hypothesis 2 for gender effects, the proportion of not ideal first moves by female participants during stage 1 was identical to that during stage 4 (44.83 %). This proportion does not differ significantly from the mean (sum of rel. not ideal divided by 6) of overall not ideal first moves (41.38 %), with p = 0.174. The results are summarized in table 6.2.

Not ideal first moves by male participants reached their maximum during stage 4 (54.39 %), which differed from the mean of not ideal first moves (42.00 %) at the 0.05 level with p = 0.013. The results are summarized in table 6.3. Female participants outperformed male participants regarding strategy adaption with goal rod changes during well-defined stages. Not ideal first move proportions are marked bold at game stage 4, where the goal rod change takes place and the former strategy has to be adapted efficiently.

Table 6.2 Impact of “macrostructure shift” on female decision-making performance. Source own source

Whether or not a gender effect was found for hypothesis 2 is debatable, as sample sizes differ greatly and are limited in their statistical validity. For both sexes, a global or local maximum of not ideal first moves was reached during stage 4. However, numbers have shown that female participants outperformed male participants regarding adaption to a “sudden” goal rod change, which required immediate, effective and efficient change of strategy.

These results suggest that, contrary to the findings of Casal et al. (2017), there can be particular cases where female participants are more likely to adapt their strategy efficiently, although this result must be considered cautiously given the small sample size of the female group in this experiment. Whether or not this observation is enough to be regarded as a gender effect requires further analysis, perhaps by inclusion of reflection times and greater sample sizes.

Table 6.3 Impact of “macrostructure shift” on male decision-making performance. Source own source

Analyzing for gender effects in hypothesis 3, logic deviation proportion results for the metastable ill-defined stages are shown in boxplot figure 6.10.

Figure 6.10 Logic deviation during metastable ill-defined stages: 0 = N-IC, 1 = G-IC, 2 = D-IC, 3 = R-IC, 4 = C-IC, and regarding sex: 1 = female, 2 = male. Source own source

While deviation does not directly translate to a more efficient strategy, metastable stages benefit from sticking with well-defined strategies, as the metastable stages can be experienced as “well-defined” levels. For female participants, Kruskal-Wallis H showed weakly significant differences at the 0.1 level (p = 0.091) amongst information conditions.

Differences amongst the information conditions regarding logic deviation in the metastable stages were less significant amongst male participants (p = 0.156). Mann-Whitney U shows no significant differences between female and male deviation distances in metastable stages (p = 0.401).

As Mann-Whitney U shows no significant differences between female and male deviation distances amongst all ill-defined stages (p = 0.543), hypothesis 4 is not analyzed in further detail.

Regarding hypothesis 5, expecting a positive relationship between expected states proportions and logic deviation proportions, for both female and male participants, all expected states indices and all logic deviations indices correlated at the 0.01 significance level without exception. Figure 6.11 shows boxplot results of expected states proportion for all ill-defined stages.

Figure 6.11 Expected states proportion during ill-defined stages: 0 = N-IC, 1 = G-IC, 2 = D-IC, 3 = R-IC, 4 = C-IC, and regarding sex: 1 = female, 2 = male. Source own source

Mann-Whitney U does not show significant differences regarding any expected state proportion (ToE X tot: p = 0.746, ToE X 1: p = 0.438, ToE X 2: p = 0.759).

Therefore, no significant gender effects were found for hypothesis 5. Regarding the logic marker analysis for hypothesis 6, Mann-Whitney U shows no significant difference regarding “strategy randomness” between sexes (p = 0.389). Boxplot figure 6.12 shows logic marker results for all information conditions.

Figure 6.12 Logic marker results during ill-defined stages: 0 = N-IC, 1 = G-IC, 2 = D-IC, 3 = R-IC, 4 = C-IC, and regarding sex: 1 = female, 2 = male. Source own source

To avoid confusion, it should be noted that the higher the logic marker index results, the more random a participant behaved. Spearman’s rho results are as follows: For female participants, expected states index results of all ill-defined stages (ToE X tot) correlated at the 0.01 level with logic marker results; expected states index results of metastable ill-defined stages (ToE X 1) correlated at the 0.05 level (p = 0.023) with logic marker results; expected states index results of instable ill-defined stages (ToE X 2) correlated at the 0.01 level with logic marker results. Results for male participants were slightly different. For male participants, correlation between expected states indices and logic marker results were significant at the 0.05 level for all ill-defined stages (ToE X tot, p = 0.017) and for instable ill-defined stages (ToE X 2, p = 0.024), but failed to show significant correlation for metastable ill-defined stages in isolation (ToE X 1, p = 0.397).

Therefore, small differences between female and male participants regarding “randomness” in their behavior were found during the metastable ill-defined stages. It seems that random behavior during metastable ill-defined stages is less explainable by (supposedly) personal expectation among male than among female participants. However, since all “random” logic forms are not framed by the experiment’s model, influences stemming from other sources cannot be excluded and are, in fact, unknown. Thus, whether this was a true gender effect remains, at least, uncertain for hypothesis 6.

As described above, no significant differences between sexes regarding expected state proportion were found. Gender effects for hypothesis 7 are therefore disregarded.

As for hypothesis 8, routine consistency does not differ significantly between sexes according to Mann-Whitney U (p = 0.732). The boxplot in figure 6.13 shows routine consistency (strategy volatility marker 1) of both female and male agents over all conditions.

Minor differences can be seen in the C-IC. However, whether or not this difference is related to gender cannot be clearly derived, especially as this information condition is the most complex with regard to public information content. In addition, the boxplot graphic does not differentiate between the two ill-defined system states, namely metastable and instable.

Thus, no significant gender effects were assumed for hypothesis 8.

Figure 6.13
figure 13

Source: own source

Routine consistency results during ill-defined stages: 0 = N-IC, 1 = G-IC, 2 = D-IC, 3 = R-IC, 4 = C-IC, and regarding sex: 1 = female, 2 = male.

For hypothesis 9, both fundamental index and game group performance were considered. Game group performance cannot be analyzed, however, as the raw data does not offer a reliable way to filter successfully solved stages. The fundamental index, in turn, implicitly relates to the proportion of a group having used an effective strategy. Of the 29 game groups, 2 were female-only, 10 were male-only and 17 were mixed with female and male participants. The female-only game group with ID 65 was part of the N-IC condition, and the female-only game group with ID 68 was part of the R-IC condition. While no correlation between information condition and fundamental index results was found (Spearman’s rho, p = 0.429), female-only and male-only groups are sorted by condition first.

Results for the female-only game group with ID 65 (N-IC) showed that in 32 % of all game group actions the group collectively agreed on which disk to move.

Results for the female-only game group with ID 68 (R-IC) showed that in 95 % of all game group actions the group collectively agreed on which disk to move.

Of the 10 male-only game groups, game group 15 and game group 35 were part of the N-IC condition. Male-only game group 43 was part of the R-IC condition.

Results for the male-only game groups in the N-IC condition showed that in 59 % (game group 15) and 85 % (game group 35) of all game group actions the group collectively agreed on which disk to move.

Results for male-only game group 43 (R-IC) showed that in 92 % of all game group actions the group collectively agreed on which disk to move.

Kruskal-Wallis H showed no significant difference between mixed, female-only and male-only results regarding fundamental index (p = 0.602). The average fundamental index was 0.7506 (SD = 0.1950) for mixed groups, 0.6350 (SD = 0.3451) for female-only groups and 0.7810 (SD = 0.2048) for male-only groups. Figure 6.14 shows boxplot results of fundamental indices.

Figure 6.14
figure 14

Source: own source

Fundamental index results for mixed sexes (0), female-only (1) and male-only (2) game groups.
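The three-group comparison reported above for the fundamental index can be illustrated with a short Python sketch of the Kruskal-Wallis H test. It is a sketch under stated assumptions, not the analysis that produced the reported p-value: the function names and sample values are hypothetical, and the p-value uses the chi-square approximation, which for three groups (two degrees of freedom) reduces to exp(-H/2). With very small groups, such as the two female-only game groups, this approximation is rough at best.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_3(mixed, female_only, male_only):
    """Tie-corrected Kruskal-Wallis H for exactly three non-empty groups.

    Assumption: the p-value comes from the chi-square approximation
    with 2 degrees of freedom, whose survival function is exp(-h / 2).
    """
    groups = [list(mixed), list(female_only), list(male_only)]
    combined = [v for g in groups for v in g]
    n = len(combined)
    ranks = average_ranks(combined)
    total = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    # Tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n).
    counts = {}
    for v in combined:
        counts[v] = counts.get(v, 0) + 1
    correction = 1 - sum(t ** 3 - t for t in counts.values()) / (n ** 3 - n)
    if correction == 0:
        return 0.0, 1.0  # every value identical
    h /= correction
    return h, math.exp(-h / 2)


# Hypothetical fundamental indices (not study data), coded as in the
# caption: 0 = mixed, 1 = female-only, 2 = male-only.
mixed = [0.81, 0.64, 0.77, 0.92, 0.55, 0.70]
female_only = [0.32, 0.95]
male_only = [0.59, 0.85, 0.92, 0.74]
h_stat, p_value = kruskal_wallis_3(mixed, female_only, male_only)
print(f"H = {h_stat:.3f}, p = {p_value:.3f}")
```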

Therefore, no significant gender effect regarding hypothesis 9 was found. The final hypothesis 10 considers group expertise. Kruskal-Wallis H shows no significant differences between mixed, female-only and male-only groups regarding group expertise (p = 0.720). Figure 6.15 shows boxplot results for group expertise in mixed, female-only and male-only game groups.

Gender effects for hypothesis 10 regarding the correlation between group expertise and logic deviations were tested for mixed-gender, female-only and male-only game groups. This analysis was done without distinguishing between information conditions, as these were not deemed relevant for the gender effects analysis.

For mixed-gender groups, the Spearman’s rho correlation between group expertise and all ill-defined logic proportions (ToE tot) was significant at the 0.05 level (p = 0.011). For the two female-only groups, Spearman’s rho showed significance at the 0.05 level (p = 0.017). For the ten male-only groups, Spearman’s rho showed significance at the 0.05 level (p = 0.014). Therefore, gender effects are disregarded for hypothesis 10. A detailed discussion follows in chapter 7.

Figure 6.15
figure 15

Source: own source

Group expertise results for mixed sexes (0), female-only (1) and male-only (2) game groups.