
Reconsidering Meaningful Learning in a Bandit Experiment on Weighted Voting: Subjects’ Search Behavior

Abstract

This paper clarifies how subjects searched for the correct option, the behavior behind the experimental results reported by Guerci et al. (Theory Decis 83:131–153, 2017). In the experiment, subjects repeatedly chose one of two weighted voting games, and their payoffs were stochastically determined for each choice according to a payoff-generating function that was hidden from them. The main results are as follows. (1) In the additional sessions conducted for the treatment without any payoff-related feedback information, it was reconfirmed that subjects learned to choose the correct option, that is, the one that generates the higher expected payoff for them, and generalized what they had inferred introspectively in one binary choice problem to a similar but different one. (2) Feedback information about payoffs given immediately after subjects’ choices often confused their inference about the relationship between nominal voting weights and actual payoffs, so that they adopted the win-stay-lose-shift strategy in some sessions. (3) Immediate payoff-related feedback information sometimes induced subjects to choose runs of options randomly.


Fig. 1

Data Availability

All raw and processed data, as well as the z-Tree code, are available upon request.

Code Availability

All data were processed with Excel Statistics. Information on the software is available at the following website, although its contents are written in Japanese: https://bellcurve.jp/ex/.

Notes

  1. Meaningful learning is also called “transfer of learning” (Cooper and Kagel [6], Cooper and Kagel [7]) or “epiphany” (Dufwenberg et al. [9]).

  2. This setting was adopted to avoid the complexities mentioned at the beginning of the Introduction: it keeps subjects’ learning to play a weighted voting game alongside other subjects, who are learning at the same time, from interfering with their learning about the underlying relationship between nominal voting weights and expected payoffs.

  3. Guerci et al. [18] adopted DPI as their payoff generating function because, in all the experiments reported in Montero et al. [22], Aleskerov et al. [1], Esposito et al. [12], Guerci et al. [19], and Watanabe [27], the most frequently observed winning coalitions were MWCs.

  4. In Table 12, denote by 1-1, 1-2, and 1-3 the payoff vectors (0, 60, 0, 60), (0, 0, 60, 60), and (40, 40, 40, 0), respectively, that arise when Choice 1 is chosen in Problem D, and by 2-1, 2-2, and 2-3 the payoff vectors (0, 0, 60, 60), (40, 40, 40, 0), and (40, 40, 0, 40), respectively, that arise when Choice 2 is chosen. Assume that when Choice \(k\) (\(k=1, 2\)) is chosen successively, the payoff vectors are realized in the order \(k\)-1 \(\rightarrow \) \(k\)-2 \(\rightarrow \) \(k\)-3 \(\rightarrow \) \(k\)-1 \(\rightarrow \) \(\cdots \). Assume in addition that when the alternative option is chosen after payoff vector \(k\)-\(i\) (\(i=1, 2, 3\)) is observed and Choice \(k\) is then chosen again, the order resumes from the payoff vector that follows \(k\)-\(i\). Given that Choice 1 is chosen in Period 1, the WSLS strategy generates the following sequence of 20 choices: \(1, 2, 1, 2, 2, 2, 1, 1, 2, 2, 2, 1, 2, 2, 2, 1, 1, 2, \ldots \). The p value of the runs test applied to this sequence is 0.8391, so the null hypothesis of randomness is not rejected, even though the choices were generated systematically by the WSLS strategy.

  5. In the mouse-tracking experiment, votes and payoffs of the committee members were “hidden” from subjects in windows on their monitors; using a computer mouse, each subject needed to move the cursor to a window and click on it to view the hidden information. Almost all other aspects were the same as those of the full-feedback treatment in our experiment.

  6. The question numbers are 1, 4, 7, and 10 from Set I and 1, 4, 7, 10, 13, 16, 19, 22, 25, 28, 31, and 34 from Set II.
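
The WSLS sequence described in Note 4 can be reproduced mechanically. The following Python sketch is illustrative only and is not the author's procedure (the data were processed with Excel Statistics). It assumes that the subject is the first committee member in each payoff vector, that a "win" is any positive payoff, and that the runs test is the two-sided Wald-Wolfowitz test with a normal approximation and a 0.5 continuity correction. The function names are hypothetical.

```python
import math

# Payoff vectors from Note 4 (Problem D), in the order they are realized.
PAYOFF_VECTORS = {
    1: [(0, 60, 0, 60), (0, 0, 60, 60), (40, 40, 40, 0)],
    2: [(0, 0, 60, 60), (40, 40, 40, 0), (40, 40, 0, 40)],
}
SUBJECT = 0  # assumption: the subject is the first committee member


def wsls_choices(n_periods=20, first_choice=1):
    """Simulate win-stay-lose-shift; a 'win' is assumed to be a positive payoff."""
    next_index = {1: 0, 2: 0}  # each option resumes its own cycle where it stopped
    choice, choices = first_choice, []
    for _ in range(n_periods):
        choices.append(choice)
        payoff = PAYOFF_VECTORS[choice][next_index[choice]][SUBJECT]
        next_index[choice] = (next_index[choice] + 1) % 3
        if payoff == 0:
            choice = 3 - choice  # lose: shift to the other option
    return choices


def runs_test_p_value(seq):
    """Two-sided runs test (normal approximation, 0.5 continuity correction)."""
    n = len(seq)
    n1 = sum(1 for x in seq if x == seq[0])
    n2 = n - n1
    runs = 1 + sum(1 for a, b in zip(seq, seq[1:]) if a != b)
    mean = 2 * n1 * n2 / n + 1
    var = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))
    diff = runs - mean
    diff = diff - 0.5 if diff > 0 else diff + 0.5  # continuity correction (assumed)
    z = diff / math.sqrt(var)
    return math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability


choices = wsls_choices()
print(choices)
print(round(runs_test_p_value(choices), 4))
```

Under these assumptions, the first 18 simulated choices coincide with the sequence listed in Note 4, and the p value is about 0.839, in line with the reported 0.8391. A fully deterministic rule can therefore pass a test of randomness.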

References

  1. Aleskerov, F., Beliani, A., & Pogorelskiy, K. (2009). Power and preferences: An experimental approach. National Research University, Higher School of Economics, Mimeo.

  2. Arifovic, J., McKelvey, R. D., & Pevnitskaya, S. (2006). An initial implementation of the Turing tournament to learning in two person games. Games and Economic Behavior, 57, 93–122.

  3. Banzhaf, J. F. (1965). Weighted voting doesn’t work: A mathematical analysis. Rutgers Law Review, 19, 317–343.


  4. Camerer, C., & Ho, T.-H. (1999). Experience-weighted attraction learning in normal form games. Econometrica, 67, 827–874.


  5. Cheung, Y.-W., & Friedman, D. (1997). Individual learning in normal form games: Some laboratory results. Games and Economic Behavior, 19, 46–76.


  6. Cooper, D. J., & Kagel, J. H. (2003). Lessons learned: Generalized learning across games. American Economic Review, 93, 202–207.


  7. Cooper, D. J., & Kagel, J. H. (2008). Learning and transfer in signaling games. Economic Theory, 34, 415–439.


  8. Deegan, J., & Packel, E. (1978). A new index of power for simple n-person games. International Journal of Game Theory, 7, 113–123.


  9. Dufwenberg, M., Sundaram, R., & Butler, D. J. (2010). Epiphany in the game of 21. Journal of Economic Behavior & Organization, 75, 132–143.

  10. Erev, I., & Roth, A. E. (1998). Predicting how people play games: Reinforcement learning in experimental games with unique, mixed strategy equilibria. American Economic Review, 88, 848–881.

  11. Erev, I., Ert, E., & Roth, A. E. (2010). A choice prediction competition for market entry games: An introduction. Games, 1, 117–136.


  12. Esposito, G., Guerci, E., Lu, X., Hanaki, N., & Watanabe, N. (2012). An experimental study on “meaningful learning” in weighted voting games. Mimeo, Aix-Marseille University.

  13. Felsenthal, D. S., & Machover, M. (1998). The Measurement of Voting Power: Theory and Practice, Problems and Paradoxes. London: Edward Elgar.

  14. Fischbacher, U. (2007). z-Tree: Zurich toolbox for ready-made economic experiments. Experimental Economics, 10, 171–178.


  15. Gelman, A., Katz, J. N., & Bafumi, J. (2004). Standard voting power indexes do not work: an empirical analysis. British Journal of Political Science, 34, 657–674.


  16. Gilboa, I., & Schmeidler, D. (2001). A Theory of Case-Based Decisions. London: Cambridge University Press.


  17. Grant, S., Meneghel, I., & Tourky, R. (2017). Learning under unawareness. Mimeo. https://ssrn.com/abstract=3113983

  18. Guerci, E., Hanaki, N., & Watanabe, N. (2017). Meaningful learning in weighted voting games: An experiment. Theory and Decision, 83, 131–153.


  19. Guerci, E., Hanaki, N., Watanabe, N., Esposito, G., & Lu, X. (2014). A methodological note on a weighted voting experiment. Social Choice and Welfare, 43, 827–850.


  20. Hu, Y., Kayaba, Y., & Shum, M. (2013). Nonparametric learning rules from bandit experiments. Games and Economic Behavior, 81, 215–231.


  21. Meyer, R. J., & Shi, Y. (1995). Sequential choice under ambiguity: Intuitive solutions to the armed-bandit problem. Management Science, 41, 817–834.


  22. Montero, M., Sefton, M., & Zhang, P. (2008). Enlargement and the balance of power: An experimental study. Social Choice and Welfare, 30, 69–87.


  23. Nowak, M. A., & Sigmund, K. (1993). A strategy of win stay, lose shift that outperforms tit-for-tat in the prisoner’s-dilemma game. Nature, 364, 56–58.


  24. Ogawa, K., Osaki, Y., Kawamura, T., Takahashi, H., Taguchi, S., Fujii, Y., & Watanabe, N. (2020). Conducting economic experiments at multiple sites: Subjects’ cognitive ability and attribute information. Kansai University RISS Discussion Paper Series No. 86, June 2020.

  25. Rick, S., & Weber, R. A. (2010). Meaningful learning and transfer of learning in games played repeatedly without feedback. Games and Economic Behavior, 68, 716–730.


  26. Shapley, L. S., & Shubik, M. (1954). A method for evaluating the distribution of power in a committee system. American Political Science Review, 48, 787–792.


  27. Watanabe, N. (2014). Coalition formation in a weighted voting experiment. Japanese Journal of Electoral Studies, 30, 56–67.


Author information


Corresponding author

Correspondence to Naoki Watanabe.

Ethics declarations

Conflict of Interest

The author declares that there is no conflict of interest. Eric Guerci and Nobuyuki Hanaki gave the author their permission to reuse the data. Their permission emails can be shown to the editors of this journal.

Ethical Approval

All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committees and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards. This article does not contain any studies with animals performed by any of the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This paper is a revised version of Kansai University RISS Discussion Paper Series No. 62, issued in 2018. The author wishes to thank Eric Guerci and Nobuyuki Hanaki for giving him their permission to reuse the data, and Daniel Friedman, Tetsuya Kawamura, Midori Hirokawa, Yoichi Hizen, Tatsuya Kameda, Naoko Nishimura, and Kazuhito Ogawa for their comments on that previous work, which motivated him to provide these supplementary results. This research could not have been completed without the research assistance of Yoichi Izunaga, Toru Suzuki, Hiroshi Tanaka, and Yosuke Watanabe. Financial support from the Foundation for the Fusion of Science and Technology (FOST), MEXT Grants-in-Aid 24330078 and 25380222, JSPS-ANR bilateral research grant “BECOA” (ANR-11-FRJA-0002), and Joint Usage/Research of ISER at Osaka University is gratefully acknowledged.

Appendices

Appendix A: Instructions

The instructions for bandit experiments generally give subjects little information. The instructions used here are reproduced below to show readers that they were indeed simple.

Instructions

Welcome! Thank you for participating in this experiment today. You will be paid 1000 JPY for your participation and an additional reward that ranges from 0 to 3200 JPY depending on your choices and performance in the experiment. First,

  • Please follow the instructions of the experimenter.

  • Please do not take notes during this session.

  • Please remain quiet and especially do not talk with other participants.

  • Please do not look at what other participants are doing.

  • During the experiment, please maintain an upright posture without leaning on the backrest.

  • Do absolutely nothing other than the operation that you are instructed to do.

  • Please turn off your mobile phone and definitely refrain from using it.

  • If you have any questions or require assistance, please silently raise your hand.

You will be asked to repeatedly make a simple choice between two options. Imagine that you need to represent your interests within a voting committee. This committee decides how to divide 120 points among its members. The committee has three other members, and each member has a predetermined number of votes, which may differ from one member to another. The committee will make a decision only when a proposal receives the predetermined required number of votes. You will be told what the required number of votes is. If more than one proposal is put before the committee, members cannot split their allocated votes across multiple proposals. A member can vote for only one proposal, and all of his/her votes must be cast for that proposal.

You are asked to choose which of the two possible committees you prefer to join. You will be informed of the number of votes allocated to each of the four members of the committee (including you), and the number of votes required for a proposal to be approved. The number of votes you have will always be indicated with the label YOU.

Full-Feedback Treatment

There are a total of 60 periods. At each period, you have 30 s to make your choice between the two committees. If you do not make a choice within the 30 s at one period, you will receive zero points for that period. When a choice is made, the chosen committee will automatically allocate 120 points among the four members. The outcomes may vary from one period to another, but are based on a theory of decision-making in committees. Once the allocation is made, you will immediately be shown the resulting allocation. At the end of the experiment, you will be paid according to your total earnings during the 60 periods, at an exchange rate of 1 point = 1 JPY.

Partial-Feedback Treatment

There are a total of 60 periods. At each period, you have 30 s to make your choice between the two committees. If you do not make any choice within the 30 s at one period, you will receive zero points for the period. When a choice is made, the chosen committee will automatically allocate 120 points among the four members. The outcomes may vary from one period to another, but they are based on a theory of decision-making in committees. Once the allocation is made, you will be shown the number of points allocated to you. You will not see the allocations to the other members of the committee. At the end of the experiment, you will be paid according to your total points at an exchange rate of 1 point = 1 JPY.

No-Feedback Treatment

There are a total of 60 periods. At each period, you have 30 s to make your choice between the two committees. If you do not make any choice within the 30 s at one period, you will receive zero points for the period. When a choice is made, the chosen committee will automatically allocate 120 points among the four members. The outcomes may vary from one period to another, but they are based on a theory of decision-making in committees. You will not see the resulting allocation after each period. However, at the end of the experiment, you will be told the total points you have obtained during the 60 periods, and you will be paid according to the points earned over the 60 periods at an exchange rate of 1 point = 1 JPY.

If you have any questions, please raise your hand.

Appendix B: Subjects’ Comments

In the additional sessions, 10 subjects participated in sequence C \(\rightarrow \) D. As shown in Fig. 1, at least 6 subjects chose the correct option in Periods 41–45. Below are the answers to the post-experimental questionnaire given by the 6 subjects who succeeded in meaningful learning. The answers show that they had proper reasoning for their choices, although the questionnaire was unfortunately unstructured and the answers were written in free format. Recall, however, that in the additional sessions the correct option is Choice 1 in the first 40 periods and Choice 2 in the subsequent 20 periods.

Question: Which option did you mainly choose? Why did you choose that option? Please explain the reason behind your choice.

  • I realized that I could not obtain any reward without the approval of three voters including myself. Thus, in the second half, I chose the option in which it was less likely to be approved by two large voters only. (2 subjects)

  • I chose the options that have more cases where three voters could win by themselves. (3 subjects)

  • In Periods 1–40, there was a case where two large voters could collect 14 votes by themselves in Choice 2, but there was no such case in Choice 1. Thus, I chose Choice 1. But I sometimes chose Choice 2, because I was not sure how the 120 points would be distributed. In Periods 41–60, there were two cases where two voters except me could collect 9 votes by themselves in Choice 1, but there was only one such case in Choice 2. Thus, I chose Choice 2 many times. But I sometimes chose Choice 1, because I did not know how the 120 points would be distributed. (1 subject)
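
The coalition counting in these comments can be sketched in a few lines of Python. The sketch enumerates minimal winning coalitions (MWCs) and computes expected payoffs in the spirit of a DPI-based payoff function, assuming that each MWC is equally likely to form and that its members split the 120 points equally. The vote weights below are hypothetical: they were chosen only so that, with the quota of 9 mentioned in the last comment, the resulting MWCs match the payoff vectors listed in Note 4 for Problem D. The actual weights used in the experiment are not reproduced here, and the subject is assumed to be member 0.

```python
from itertools import combinations


def minimal_winning_coalitions(weights, quota):
    """Return all winning coalitions that contain no smaller winning coalition."""
    members = range(len(weights))
    winning = [
        frozenset(c)
        for size in range(1, len(weights) + 1)
        for c in combinations(members, size)
        if sum(weights[i] for i in c) >= quota
    ]
    return [c for c in winning if not any(other < c for other in winning)]


def expected_payoffs(weights, quota, total=120):
    """Each MWC is equally likely; its members share `total` equally (assumed)."""
    mwcs = minimal_winning_coalitions(weights, quota)
    payoffs = [0.0] * len(weights)
    for coalition in mwcs:
        for i in coalition:
            payoffs[i] += total / len(coalition) / len(mwcs)
    return payoffs


QUOTA = 9                 # taken from the subject's comment on Periods 41-60
CHOICE_1 = (2, 4, 4, 5)   # hypothetical weights, subject listed first
CHOICE_2 = (2, 3, 5, 5)   # hypothetical weights, subject listed first

for name, weights in (("Choice 1", CHOICE_1), ("Choice 2", CHOICE_2)):
    mwcs = minimal_winning_coalitions(weights, QUOTA)
    excluding_subject = [c for c in mwcs if 0 not in c]
    print(name, "MWCs without the subject:", len(excluding_subject),
          "expected payoff to the subject:", round(expected_payoffs(weights, QUOTA)[0], 2))
```

With these illustrative weights, Choice 1 has two MWCs that exclude the subject and Choice 2 has one, so the subject's expected payoff is higher under Choice 2. This agrees with the subjects' reasoning and with Choice 2 being the correct option in the last 20 periods.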


About this article


Cite this article

Watanabe, N. Reconsidering Meaningful Learning in a Bandit Experiment on Weighted Voting: Subjects’ Search Behavior. Rev Socionetwork Strat 16, 81–107 (2022). https://doi.org/10.1007/s12626-022-00106-y


  • DOI: https://doi.org/10.1007/s12626-022-00106-y

Keywords

  • Meaningful learning
  • Bandit experiment
  • Weighted voting
  • Search behavior
  • Win-stay-lose-shift strategy

JEL Classification

  • C91
  • D72
  • D83