The experiment’s purpose was to create a VUCA-like decision-making domain—in which agents had to solve a complex problem—and to analyze how their behavior changed when provided with different global information. Another major aspect of the experiment was to “train” the agents in decision-making in isolation first (routine-strategy), and afterwards to randomly group them into a “game” of three agents. The agents were unable to communicate and did not receive information about the former actions of their co-agents, but were always able to collectively see the outcome of their shared control over the game. The following research questions are to be answered:

  1. Does public information about environmental change (“You are sharing control with humans!”) favor a change of routine-strategy when the new environmental conditions do not influence the routine-strategy’s performance?

  2. Does an environmental change that influences the routine-strategy’s performance (“Middle rod is goal rod.”) favor a change of routine-strategy?

  3. Does the deviation distance from routine-strategy depend on the type of public information, i.e. does information about man-made uncertainty lead to higher deviation from routine-strategy than unspecified uncertainty (no further public information)?

  4. Does public information about hidden rules favor overcoming parts of the routine-strategy?

  5. Is group performance in the complex problem-solving game dependent on individual decision-making expertise in routine-strategy, when the routine-strategy statistically benefits the group’s performance in a game where no communication is possible?

5.1 Derivation of Hypotheses

The first research question of this thesis asks whether the information provided in the G-IC, D-IC, R-IC and C-IC influences participants to change, during ToE games 8, 9, and 10, the strategy they used to solve the ToH games. ToE games 8, 9, and 10 can be solved with certainty in seven steps when all game group agents stick to the framed logic, and with high probability in seven steps when all game group agents mostly stick to the framed logic. Sticking with the framed logic during the first three ToE games therefore solves the GDM problem under uncertainty efficiently. As experimental results showed that individual strategies are mostly altered by environmental change only if that change influences the strategy’s performance, it was assumed that participants who had proven a “framed logic” routine and ToH expertise would be unlikely to change their strategy in such a game group.

The participants’ “routine” was derived from the proportion of “Framed Logic” and “No Border Logic” level-solving moves or actions. By definition, an action solving a ToH game is always either a “Framed Logic” or a “No Border Logic” action, and can never be both. When neither a “Framed Logic” nor a “No Border Logic” action solved a level, it was because the timer ran out. If a player recorded more “Framed Logic” (F-L) than “No Border Logic” (NB-L) actions among the actions that solved a ToH level, or vice versa, the corresponding logic was assumed to be the routine strategy. When a participant recorded an equal proportion of level-solving F-L and NB-L actions, the routine strategy was regarded as unclear and therefore reported as “mixed” (Mx-L).
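The classification rule above can be sketched as follows; the function name and the list-of-labels data shape are illustrative assumptions, not part of the experiment software:

```python
# Sketch (assumed data shape): classify a participant's routine strategy
# from the logics of their level-solving ToH actions.
def classify_routine(solving_action_logics):
    """solving_action_logics: one label, 'F-L' or 'NB-L', per action that
    solved a ToH level (levels lost to the timer contribute nothing)."""
    fl = solving_action_logics.count("F-L")
    nbl = solving_action_logics.count("NB-L")
    if fl > nbl:
        return "F-L"
    if nbl > fl:
        return "NB-L"
    return "Mx-L"  # equal proportion: routine unclear, reported as "mixed"

print(classify_routine(["F-L", "F-L", "NB-L", "F-L", "F-L", "F-L"]))  # F-L
```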

ToH expertise of each individual was expressed by an index, resulting from the participant’s performance measured during ToH games 2 to 7.

ToH expert knowledge levels, or ToH expertise, were measured by looking at different parameters collected from ToH games 2 to 7:

  • How many ToH games were solved?

  • Did participants solve at least one ToH game in 7 steps?

  • What was the least number of steps required in any ToH game?

  • How many ToH games were solved with 7 steps?

  • How many steps in total were required to solve the ToH games?
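The five parameters above can be derived from per-game step counts; the following sketch assumes a simple dictionary representation (game number to steps, `None` for a timed-out game), which is an illustrative choice rather than the actual data format:

```python
# Sketch (assumed data shape): derive the five expertise parameters
# from per-game step counts for ToH games 2 to 7.
def expertise_parameters(steps_per_game):
    """steps_per_game: dict mapping game number (2..7) to the number of
    steps needed, or None when the timer ran out."""
    solved = [s for s in steps_per_game.values() if s is not None]
    return {
        "levels_solved": len(solved),             # how many games solved
        "any_in_7": any(s == 7 for s in solved),  # at least one perfect game
        "best_game": min(solved) if solved else None,  # fewest steps in any game
        "games_in_7": sum(1 for s in solved if s == 7),  # number of 7-step games
        "total_steps": sum(solved),               # steps over all solved games
    }

p = expertise_parameters({2: 7, 3: 7, 4: 8, 5: 11, 6: 7, 7: 9})
print(p["games_in_7"], p["best_game"])  # 3 7
```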

Ideally, if an agent solved all six ToH games (excluding the first game) in 7 steps using F-L, this agent would have proven the highest level of expertise and shown F-L to be the routine-strategy. If an agent solved all ToH games in 7 steps using NB-L, this agent would likewise have proven the highest level of expertise and shown NB-L to be the routine-strategy.

Expertise in the F-L routine was expected to have the side effect that game groups with high levels of F-L expert knowledge would solve the first three ToE levels with higher efficiency. The ToE rules were therefore not expected to affect the performance of strategies that stem from a framed strategy routine; they were, however, expected to influence the performance of strategies that stem from a no border strategy routine. Information about routine strategy and expertise levels was saved for each participant. Table 5.1 (own source) shows all data mentioned above by example, expressing the routine strategy used and the according expertise.

Table 5.1 Results of example experiment for explanation, part 1

In order to create expertise categories, a pretest with 30 participants, all US-American MTurks, was conducted using the identical setup as the main experiment. Three participants had idled throughout the entire experiment and were not considered. The results of the remaining 27 US-American MTurks regarding routine strategy were used; the according information about expertise is summarized in table 5.2 (own source).

Average values of strategy proportion regard 189 level-solving actions out of 1623 actions in total. Of the 189 level-solving actions by 27 participants, only 4 were NB-L actions, performed by three distinct players. By this definition, all 27 players were using F-L as their routine strategy.

17 out of 27 participants completed all ToH games in time, not failing a single game. 8 out of 27 participants failed one ToH game by not completing it in time. 2 out of 27 participants failed three ToH games by not completing them in time.

24 out of 27 participants completed at least one ToH game in 7 steps. 3 out of 27 participants failed to solve even one ToH game in 7 steps; their best games required 8, 9, and 10 steps, respectively.

Table 5.2 Results of example experiment for explanation, part 2

Results regarding the number of achieved 7-step games are listed in the following table 5.3 (own source). 15 out of 27 participants achieved between three and six perfect 7-step ToH games. Only one out of 27 participants managed to solve all ToH games with 7 actions.

When participants who failed at least one ToH game are excluded, the remaining 17 participants required on average 53.53 actions to solve all six ToH games. The two participants who failed to complete at least one ToH game in 7 steps required 65 and 72 steps to complete all stages.

Table 5.3 Results of example experiment for explanation, part 3

No participant who solved either 5 or 6 ToH games with 7 actions failed to solve a single ToH game. Only one participant who solved 4 ToH games with 7 actions failed to solve at least one ToH game. Two participants who solved three ToH games with 7 actions failed to solve at least one ToH game. Five participants who solved two ToH games with 7 actions failed to solve at least one ToH game. Two participants who solved either none or just one ToH game with 7 actions failed to solve at least one ToH game. No participant who required 60 or more steps in total to solve all ToH games managed to solve more than two games with just 7 actions.

From these results, three expertise categories are established using the number of ToH games solved in 7 steps and the number of failed ToH games. The highest expert rank is assigned to participants who solved 4 or more ToH games with 7 actions and failed no more than one ToH game. The medium expert rank is assigned to participants who solved two or three ToH games with 7 actions and failed no more than one ToH game. The low expert rank is assigned to participants who solved no or one ToH game with 7 actions.
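As a compact restatement of these rules, the rank assignment can be sketched as below. The fallback for cases the text does not fix explicitly (two or more perfect games combined with two or more failures) is an assumption and marked as such:

```python
# Sketch of the three expertise categories described above.
def expert_rank(games_in_7, games_failed):
    """games_in_7: ToH games solved in exactly 7 steps; games_failed:
    ToH games not completed in time."""
    if games_in_7 >= 4 and games_failed <= 1:
        return "H"  # highest expert rank
    if games_in_7 in (2, 3) and games_failed <= 1:
        return "M"  # medium expert rank
    if games_in_7 <= 1:
        return "L"  # low expert rank
    # Not fixed explicitly in the text: 2+ perfect games with 2+ failures;
    # assumed here to fall back to the low rank.
    return "L"
```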

By this definition, all 27 out of 27 participants were assigned the routine strategy “F-L”. 10 out of 27 pretest participants were assigned the expert rank “high”, 10 out of 27 a “medium” expertise, and 7 out of 27 a “low” expertise.

The 7 low expertise (L) participants collectively failed to solve 8 ToH games in total, the 10 medium expertise (M) participants collectively failed 5 ToH games, and the 10 high expertise (H) participants collectively failed only one ToH game. Of these 27 participants, only 15 produced usable data, as 12 participants were either part of a bot-agent game group, disconnected, or part of a game group with players who idled throughout the single-player phase. Of the resulting five game groups, two were in the G-IC, two in the D-IC, and one in the R-IC. The according expertise levels are listed in table 5.5 (own source).

From the small pretest alone, only 50 % of the data could be used for analysis. A rather large number of participants was therefore expected to be required for the main experiment. It was estimated that about 6000 human agents would be required to obtain more than 180 game groups per condition. Accordingly, 300 participants were expected to produce 10 game groups per condition. Even with 6000 human agents, analyses would still have been limited by many factors, discussed in chapter 5.

As participants were assigned a bot agent after 5 minutes of waiting time for ethical reasons, a game group that contained a bot agent and was part of any information condition other than the N-IC was considered a “deception” condition, because participants of such game groups had been informed that they were playing with “human agents”. Game groups with bot agents in any information condition other than the N-IC were therefore considered “deceptive” and fully excluded from data analysis. To increase the chances of filling game groups with human agents, the main experiment was divided into several parts, each collecting US-American MTurks at different times of day.

Based on the pretest results, group expertise levels were expected to be mixed; of the five game groups, four showed distinct levels of group expertise. Ten different group expertise levels are possible, rated from “1” for “L, L, L” to “10” for “H, H, H”. Group expertise was expected to be normally distributed, in line with experimental studies showing that while repetition leads to better strategy use, participants differ greatly in their individual ability to learn ToH rules (Janssen, De Mey, Egger, & Witteman, 2010).

The group expertise level is calculated from the individual expertise levels. The order in which the group expertise ratings are listed favors “group quality over individual quality”. In other words, a group consisting of 2 L experts and 1 H expert ranks lower in group expertise than a group with 1 L expert and 2 M experts. Table 5.4 shows all possible group expertise rankings resulting from individual expertise.

Table 5.4 Group expertise rated as an integer, ordered from individual expertise levels (own source)

Another order of preference that should be noted is L, H, H (8) over M, M, H (7). From a set-theoretical viewpoint, L, H was preferred over M, M. However, M, M, M (6) was preferred over L, M, H (5), where M, M was preferred over L, H. From a purely logical viewpoint, a contradiction therefore exists. M, M, M was preferred over L, M, H because of the consistency of skill in this group, as one single participant behaving “less than wise” was able to derail an entire group strategy. This might then seem a weak argument for preferring group expertise 8 over 7; however, acquiring expertise level H requires very high precision in ToH decision-making. A group with expertise level 8 contains two highly skilled experts, rendering “less than wise behavior” of a single participant less likely. The order of group expertise certainly remains debatable, but thorough thought was put into its creation.
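The ranking can be expressed as a lookup over the ten sorted member combinations. Ranks 1, 5, 6, 7, 8 and 10 are stated in the text; the remaining placements (2, 3, 4, 9) are assumptions that follow the “group quality over individual quality” rule and may differ from table 5.4:

```python
# Sketch: the ten group expertise ranks. Entries 2, 3, 4 and 9 are
# assumed orderings, not taken verbatim from the text.
GROUP_RANK = {
    ("L", "L", "L"): 1, ("L", "L", "M"): 2, ("L", "L", "H"): 3,
    ("L", "M", "M"): 4, ("L", "M", "H"): 5, ("M", "M", "M"): 6,
    ("M", "M", "H"): 7, ("L", "H", "H"): 8, ("M", "H", "H"): 9,
    ("H", "H", "H"): 10,
}

def group_expertise(members):
    # Sort so that ("H", "L", "H") and ("L", "H", "H") map to the same key.
    order = {"L": 0, "M": 1, "H": 2}
    return GROUP_RANK[tuple(sorted(members, key=order.get))]

print(group_expertise(["H", "L", "H"]))  # 8
```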

Table 5.5 Results of example experiment for explanation, part 4

Coming back to the first research question, several variables were identified. Public information is either lacking (N-IC) or comes in four distinct forms (G-IC, D-IC, R-IC, C-IC). Environmental conditions are all circumstances that lie outside of the agents’ control. Interpretations are not regarded as part of the environmental conditions, even when “wrong interpretations” are facilitated by environmental conditions, as two examples illustrate. First, as participants of the N-IC were not informed about playing with other agents, they were expected to interpret outcomes that deviated from their expectations as stemming from “error”, such as software bugs, glitches, randomizing variables, or wrong inputs, and not from human influences. Second, as participants of the G-IC were not informed that there was next to no chance of obtaining the true hidden rules, they were expected to interpret outcomes that deviated from their expectations as stemming from “error” such as low expertise of co-agents, human mistakes, or “bad cognitive skill” of co-agents.

The distinct information in each IC is considered public information and part of the environmental conditions; its interpretation, however, is considered to be in the control of each agent. Therefore, “public information about environmental change” is part of the environmental conditions, lying outside of the agent’s control, whereas its interpretation, and ultimately its impact on the individual’s behavior, is considered part of each agent’s control.

Changes stemming from environmental conditions are interpreted either as environmental or as social influences. Environmental influences were defined as all influences that are not “man-made”; social influences as all influences that are “man-made”. Environmental influence interpretations (EI-I) were expected to lead participants to maximize control over expected outcomes by sticking to their routine strategy. Social influence interpretations (SI-I) were expected to lead participants to maximize control over expected outcomes by deviating from their routine strategy. The fluent transition between deviation distances stemming from EI-I and SI-I is explained by listing all information conditions.

In the N-IC, participants were expected to interpret deviations from expected outcomes as stemming mostly from environmental influences, as they were not informed about there being human co-agents.

In the G-IC, participants were expected to interpret deviations from expected outcomes as stemming mostly from social influences, as they were not informed, even implicitly, that no agent was able to “outsmart” the hidden rules other than by sticking to the regular single-player rules.

In the D-IC, participants were expected to interpret deviations from expected outcomes as stemming less from social influences than in the G-IC, as they were implicitly informed that all agents were “still putting their trousers on one leg at a time” and that looking for “patterns” to “outsmart” the hidden ruleset was a waste of time. D-IC participants were expected to interpret deviations from expected outcomes as stemming less from environmental influences than in the N-IC, as they still knew that they had “some control” over the outcomes, and in fact they did.

The algorithm was written in such a way that each participant always had a chance of decisive impact on the group action’s outcome, and always had some impact on it, while never having a chance of full control over the outcome, as the order of inputs was decisive. Even if the entire algorithm was known, communication would be required to synchronize the order of inputs with the co-agents and obtain full control over the group action’s outcome. Although not entirely impossible, this thesis expected no game group to optimize control over game group outcomes. When the goal rod was the right rod, a game group could only “seemingly” optimize game group output: with the right rod set as the goal rod, a game group could solve ToE in 7 steps with each individual agent sticking to the F-L, disregarding the order of inputs. This was not the case when the goal rod was the center rod. With the center rod as the goal rod, the optimal move by F-L was “S1”, with “S1, S1, S1” resulting in the small disc being moved. The only realistic way of solving ToE in 7 steps without communication was if a game group stuck to a certain “rhythm”, meaning that the order of inputs was stable and at least one participant provided an input outside of F-L at the right moment. Such a dynamic decision-making equilibrium was not expected to be observed.

In the R-IC, participants were expected to behave similarly to G-IC participants if they (mostly) still used the directional buttons; should R-IC participants (mostly) refrain from using the directional buttons, greater deviations than in the G-IC were expected. As the environmental condition “The directional buttons do not influence the game at all” never influences any strategy’s performance, some participants in the R-IC were expected to still use the directional buttons due to routine strength. In other words, the routine strength of pressing directional buttons was considered to dominate deviations from routine logic in some cases. Due to routine strength, even participants who refrained from using the directional buttons were expected to still use them in some cases. R-IC participants were expected to deviate more from their routine strategy than N-IC and D-IC participants, due to SI-I.

In the C-IC, participants were expected to behave similarly to D-IC participants when the directional buttons were (mostly) used, and greater deviations were expected when the directional buttons were (mostly) not used. C-IC participants were expected to deviate more from their routine strategy than N-IC participants, but less than G-IC and R-IC participants.

In order to formulate the according hypotheses, deviation distance from routine strategy has to be defined and expressed by an index in the following. For now all mentioned expected deviation distances (dd) in each condition are ordered as follows:

  • dd(N-IC) < dd(D-IC) <= dd(C-IC) < dd(G-IC) <= dd(R-IC).

Therefore, the greatest deviations from routine strategy were expected in the R-IC and the least deviations from routine strategy were expected in the N-IC conditions.

Using the pretest, the greatest difference in deviation from the routine strategy was expected between the D-IC and the R-IC, as no N-IC data was available. In order to construct the deviation distance from the routine strategy, several steps have to be taken. This is explained again by example of the pretest, using data of two game groups.

First, the proportion of ToH routine-logic actions to the total number of ToH actions was measured in two ways, “ToH total” and “ToH parts”. A ToH total index of e.g. 0.6842 with ToH routine F-L means that 68.42 % of the actions in ToH games 2 to 7 were F-L actions. A ToH parts index of “1.0 / 0.5” means that 100 % of the actions in ToH games 2 to 4, and 50 % of the actions in ToH games 5 to 7, were F-L actions.
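Both indices are simple proportions over different level ranges and can be sketched as below; the dictionary data shape and function name are illustrative assumptions:

```python
# Sketch (assumed data shape): "ToH total" and "ToH parts" proportions of
# routine-logic actions, computed over ToH games 2 to 7.
def toh_indices(actions_by_level, routine="F-L"):
    """actions_by_level: dict mapping level (2..7) to the list of action
    logics, e.g. 'F-L' or 'NB-L', for every action in that level."""
    def proportion(levels):
        acts = [a for lv in levels for a in actions_by_level.get(lv, [])]
        return sum(1 for a in acts if a == routine) / len(acts)
    total = proportion(range(2, 8))   # ToH total: levels 2-7
    part1 = proportion(range(2, 5))   # ToH parts: levels 2-4 ...
    part2 = proportion(range(5, 8))   # ... and levels 5-7
    return total, (part1, part2)

example = {2: ["F-L", "F-L"], 3: ["F-L", "F-L"], 4: ["F-L", "F-L"],
           5: ["F-L", "NB-L"], 6: ["F-L", "NB-L"], 7: ["F-L", "NB-L"]}
print(toh_indices(example))  # (0.75, (1.0, 0.5))
```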

Since in ToH game 5 the goal rod changed from the right rod to the center rod, most players failed to solve ToH game 5 as efficiently as ToH game 4, as they would use their level-4 strategy and begin level 5 with actions that deviate from the ideal path. The position of the goal rod was considered a change of environmental conditions that affects a participant’s former routine strategy; F-L therefore has sub-routine strategies regarding the position of the goal rod. This effect was also expected in ToE games, since in ToE game 4 the goal rod changed from the right rod to the center rod.

The highest proportion of ToH logical actions was achieved by participant 7, who was ranked with high expertise; the lowest by participant 8, who was ranked with low expertise. Expertise rank and ToH total were expected to correlate, leading to the first two hypotheses. All hypotheses and their dependent and independent variables are listed in the following sub-chapter.

5.2 Hypotheses and Variables

As expertise and logic proportions were expected to correlate, and goal rod change was expected to influence performance, the first two hypotheses are as follows:

Hypothesis 1: The higher the individual expertise rank, the higher the logic proportion “ToH total” is.

Hypothesis 2: The change of goal rod during ToH and ToE games (ToH game 5; ToE game 4) leads to the first actions in the affected game deviating from the ideal path.

As can be seen in table 5.6, all participants in the D-IC condition stuck closely to their routine strategy’s logic during ToE levels one to three, obtaining logic proportion levels of 95.24 %, 100 % and 100 %. Even though these participants were facing environmental change, the change did not influence the routine strategy’s performance. As expected, the participants therefore did not deviate from their routine strategy at all (participants 5 and 6), or hardly at all (participant 4). It was expected that participants of the N-IC would show significantly lower values of routine logic deviation than those of the D-IC, leading to the third hypothesis:

Hypothesis 3: Participants in the N-IC condition show the highest logic proportions in ToE levels one to three, expressed by “ToE parts 1”, followed by proportions of D-IC participants, then C-IC, G-IC and R-IC.

As expected, routine logic deviations were higher in the R-IC condition than in the D-IC condition: while playing ToE, participants 4, 5 and 6 (D-IC) followed their routine logic in 76.27 %, 86.44 % and 74.58 % of all cases, whereas participants 7, 8 and 9 (R-IC) followed their routine logic in only 23.57 %, 24.29 % and 22.86 % of all cases. N-IC participants were expected to show even higher logic proportion values than D-IC participants. This leads to the fourth hypothesis:

Hypothesis 4: Participants in the N-IC condition show the highest total ToE logic proportion values, followed by proportions of D-IC participants, then C-IC, G-IC and R-IC.

By the example of the small pretest, routine logic deviations grew in the D-IC condition, which was expected due to the change of the goal rod position influencing strategy performance. However, the goal rod change during ToE games has to be treated differently from the goal rod change during ToH games. During ToH games the goal rod change influences performance because participants, e.g., do not pay attention to the change and use an F-L logic that would only be ideal if the goal rod were “right”, not “center”. This loss in performance can be corrected quickly during ToH by becoming aware of the goal rod change and adapting the F-L to the new goal rod position. It was expected that participants who deviated from the ideal path in ToH level 5 but performed well during ToH level 4 would either keep on “trembling” throughout ToH levels 5 to 7, where the goal rod was changed to “center”, or quickly learn and adapt their F-L to the new ToH goal rod conditions. In ToH games, however, participants were not expected to be “surprised” by their actions’ output, measured as “expected states” deviation.

The goal rod change in ToE level 4, on the other hand, also influences the participants’ expected states deviation: for example, an individual action input of “S1 r” might result in the small disk “seemingly” travelling to the left, or might even result in the medium or large disk being moved. Such cases were expected to create an “expected states deviation”, which can lead to new interpretations by each individual agent. The influence of the environmental condition “goal rod change” in ToE level 4 was expected to have less influence on ToE routine logic deviations in levels 4, 5, and 6 than the “expected states deviation” experience. In order to measure this, ToE parts 1 logic deviations, where no goal rod change has yet occurred, are also considered.
It was expected that “expected states deviation” is a better predictor of ToE logic deviation than the environmental condition “goal rod change”, as the former was expected to lead to interpretation changes, inducing deeper uncertainty than the latter. Therefore, expected states deviations were expected to influence ToE logic deviations in all conditions of the experiment. In addition, higher expected states deviations were considered to lead to individual behavior that increasingly is not “captured” by any logic category, leading to high “logic marker” values. The logic marker reports the number of actions in ToE games that score “0” in every logic category, divided by the total number of actions. In other words, high values of expected states deviation were expected to make participants behave “randomly” from the perspective of the experimenter.

Hypothesis 5: The higher the expected states deviation proportion values with respect to the routine strategy during all ToE conditions, the higher the logic deviation proportion values are.

Hypothesis 6: The higher the expected states deviation proportion values with respect to the routine strategy during all ToE conditions, the higher the logic marker proportion values are.

Table 5.6 Results of example experiment for explanation, part 5

The R-IC and C-IC were expected to create higher expected states deviation from the routine strategy, as these conditions “take away” the basis for reinforcing the routine strategy, i.e. by informing about the “uselessness” of the directional buttons. It was expected that in the R-IC, expected states deviation values with respect to the routine strategy would be higher during ToE parts 1 than in all other conditions.

As the logic deviation distance of the G-IC was expected to be higher than that of the C-IC, but the expected states distance of the C-IC higher than that of the G-IC, ToE game group performance, measured as the total number of steps required to solve all six ToE games, is considered in order to indicate whether logic deviation or expected states deviation with respect to the routine logic is a better predictor of ToE group performance. Expected states deviation can be considered a measurement of “irritating” feedback when a certain logic is used, and was considered to lead to fundamental interpretation changes; it is the result of action. Logic deviation, on the other hand, expresses already performed action, embedding some former expectation. A high expected states deviation distance with respect to some logic is considered “more random feedback”; by Hypotheses 5 and 6, this was considered to lead to higher logic deviation distances and seemingly random behavior. The R-IC was expected to induce radical interpretation problems, making participants feel uncertain about their routine strategy. The G-IC was expected to induce uncertainty by social influence, where participants would try to adapt their strategy according to certain “patterns”, ultimately adapting their strategy. In the G-IC, participants were expected to use different forms of logic, not just their routine logic, i.e. both F-L and NB-L, leading to a lower proportion of routine logic than in the C-IC condition, as only one logic form can be the routine logic.

Routine consistency is the number of routine strategy actions during the ill-defined stages, i.e. actions falling into the routine’s category (F-L or NB-L), divided by the total number of actions during the ill-defined stages; sub-distinguishing elements of logic forms such as dir, nodir and ideal are disregarded for this calculation. When an action falls into neither the F-L nor the NB-L category, it still counts toward the total number of actions by which the number of routine strategy actions is divided; actions outside of any known logic category are measured by the logic marker. For instance, suppose a player has developed the routine logic F-L during the well-defined stages and has used 100 actions in total during the ill-defined stages, with 90 F-L actions (80 times dir, 5 times nodir, 5 times ideal) and 10 NB-L actions (4 times dir, 2 times nodir, 4 times ideal). The logic marker is then 0 (0.00 %), since all actions are part of known logic categories, and the resulting routine consistency is 0.90 (90 %).
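The worked example above can be sketched as follows; the list-of-labels data shape (with `None` for actions outside any known logic category) is an illustrative assumption:

```python
# Sketch (assumed data shape): routine consistency and logic marker from
# ill-defined-stage actions. Sub-distinctions such as dir/nodir/ideal are
# deliberately ignored, as in the definition above.
def routine_consistency(action_logics, routine):
    """action_logics: one top-level logic per action, 'F-L', 'NB-L', or
    None when the action fits no known logic category."""
    total = len(action_logics)
    consistency = sum(1 for a in action_logics if a == routine) / total
    logic_marker = sum(1 for a in action_logics if a is None) / total
    return consistency, logic_marker

# 90 F-L actions and 10 NB-L actions, routine F-L:
acts = ["F-L"] * 90 + ["NB-L"] * 10
print(routine_consistency(acts, "F-L"))  # (0.9, 0.0)
```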

Low routine consistency in the G-IC was ultimately expected to lead to a greater logic deviation distance from routine logic than in the C-IC condition and, due to logic volatility, also to a higher deviation of expected states with respect to the routine logic. In the C-IC condition, participants were expected to “stick with one logic”, as they were “discouraged” by dissolution while still being induced with a lowered form of social influence and interpretation uncertainty. The D-IC lacks the interpretation uncertainty regarding the directional buttons and comes with a lowered form of social influence. In other words, participants in the R-IC were expected to use different kinds of logic forms or strategies and to be induced with deep uncertainty about all strategies they tried, perhaps even leading to participants performing actions arbitrarily. Participants in the G-IC were expected to use different kinds of strategies without being induced with deep uncertainty. Participants in the C-IC were induced with deep uncertainty, but were expected to be less volatile in their strategy forming than in the G-IC, while still deviating more from their routine strategy than in the D-IC.

Hypothesis 7: Expected states deviation proportion values during ToE parts 1, ToE parts 2 and ToE total in R-IC are the highest, followed by G-IC, C-IC, D-IC and lastly N-IC.

Hypothesis 8: Routine consistency is the lowest in R-IC, followed by G-IC, C-IC, D-IC and N-IC.

Group performance, measured as the number of group actions required to solve all ToE games, depends on the order of group actions. The algorithm is implemented in such a way that when all participants of a game group agree at least on the optimal disk to be moved, this collectively chosen disk will always be moved, and the game group will greatly outperform randomness, even with different strategies in mind on how to move the disk. However, it was expected that even this “fundamental logic” would be dissolved by inducing deep uncertainty, i.e. by telling participants the truth about “the directional buttons not working”. The proportion of actions in which participants agreed on one disk, regardless of whether it was the optimal choice, was expected to be the best predictor of group performance, expressed by the “fundamental index”.
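The fundamental index can be sketched as the share of group actions with full agreement on one disk; the tuple data shape and the disk labels ('S', 'M', 'L') are illustrative assumptions, and “agreed on one disk” is read here as all three agents choosing the same disk:

```python
# Sketch (assumed data shape): proportion of group actions in which all
# three agents selected the same disk, optimal or not.
def fundamental_index(group_actions):
    """group_actions: list of 3-tuples with the disk ('S', 'M' or 'L')
    each of the three agents selected in that group action."""
    agreed = sum(1 for choice in group_actions if len(set(choice)) == 1)
    return agreed / len(group_actions)

moves = [("S", "S", "S"), ("S", "M", "S"), ("L", "L", "L"), ("S", "S", "M")]
print(fundamental_index(moves))  # 0.5
```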

Hypothesis 9: The lower the fundamental index, the lower the game group performance is.

Finally, it was expected that group expertise rank explains inter-condition logic deviations amongst groups.

Hypothesis 10: Lower inter-condition group expertise rankings lead to lower logic deviation proportions.

In the following, table 5.7 (own source) lists all dependent and independent variables required for the 10 hypotheses, together with their according hypothesis (H).

Table 5.7 Independent and dependent variables, with according hypothesis