Abstract
We present a deep learning solution to address the challenges of simulating realistic synthetic first-price sealed-bid auction data. The complexities encountered in this type of auction data include high-cardinality discrete feature spaces and a multilevel structure arising from multiple bids associated with a single auction instance. Our methodology combines deep generative modeling (DGM) with an artificial learner that predicts the conditional bid distribution based on auction characteristics, contributing to advancements in simulation-based research. This approach lays the groundwork for creating realistic auction environments suitable for agent-based learning and modeling applications. Our contribution is twofold: we introduce a comprehensive methodology for simulating multilevel discrete auction data, and we underscore the potential of DGM as a powerful instrument for refining simulation techniques and fostering the development of economic models grounded in generative AI.
Acknowledgements
We express our sincere gratitude to the reviewers for their insightful feedback, which has significantly contributed to the enhancement and refinement of this manuscript.
Funding
This work was supported by Meta Research following the authors' application to the request for proposals on agent-based user interaction simulation to find and fix integrity and privacy issues (https://research.facebook.com/research-awards/request-for-proposals-on-agent-based-user-interaction-simulation-to-find-and-fix-integrity-and-privacy-issues/#award-recipients).
Author information
Contributions
All authors contributed to the study's conception and design. Material preparation, data collection, and analysis were performed by Igor Sadoune. The first draft of the manuscript was written by Igor Sadoune, and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.
Ethics declarations
Conflict of interest
The authors declare no conflict of interest.
Appendices
Appendix A: SEAO Dataset
Table 5 outlines the variables used in our study. It is important to note that geographical variables like countries or states were excluded because, although present in the raw data, only contracts from the province of Quebec are actually recorded in the subset we accessed.
The cleaning of the raw data involved selecting relevant variables. Many columns in the raw data were excluded as they did not provide informative signals due to their nature (e.g., web links) or redundancy with other variables. The quality of the signals was another selection criterion. Columns like temporal variables or textual entries were omitted due to low quality, evidenced by inconsistencies, excessive missing values, and non-uniform entry formats. Despite centralized dataset management by SEAO, data inconsistencies and missing values are common due to manual updates by various public entities’ administrators. These are significant limitations of this dataset.
The raw data is available at https://www.donneesquebec.ca/recherche/dataset/systeme-electronique-dappel-doffres-seao, and the official PDF description file at https://www.donneesquebec.ca/recherche/dataset/d23b2e02-085d-43e5-9e6e-e1d558ebfdd5/resource/af41596c-b07f-4664-82c8-577e1ef9a6f3/download/seao-specificationsxml-donneesouvertes-20171010.pdf. Data for each year, and in some cases each month, must be fetched separately. The raw data are provided in XML format and must be converted into a workable tabular array. We used the Python "xml" library to convert the data and saved the result in pickle (.pkl) format. The code for processing the original XML files is available in the GitHub repository associated with this manuscript, which also provides the cleaned and preprocessed pickle file.
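As an illustration of this conversion step, the following minimal sketch flattens a nested XML export into one row per bid and serializes the result with pickle. The tag names (`export`, `auction`, `bids`, `bid`, `id`, `municipality`, `firm`, `amount`) and the sample values are hypothetical and do not reflect the actual SEAO schema; the repository code should be consulted for the real processing.

```python
import pickle
import xml.etree.ElementTree as ET

# Hypothetical sample; the real SEAO files use a different schema.
SAMPLE_XML = """
<export>
  <auction>
    <id>A1</id>
    <municipality>Montreal</municipality>
    <bids>
      <bid><firm>F1</firm><amount>1000.0</amount></bid>
      <bid><firm>F2</firm><amount>1200.5</amount></bid>
    </bids>
  </auction>
  <auction>
    <id>A2</id>
    <municipality>Quebec</municipality>
    <bids>
      <bid><firm>F3</firm><amount>800.0</amount></bid>
    </bids>
  </auction>
</export>
"""


def xml_to_rows(xml_text):
    """Flatten auction -> bids into one row (dict) per bid."""
    root = ET.fromstring(xml_text.strip())
    rows = []
    for auction in root.iter("auction"):
        # Auction-level fields are the children without nested elements.
        auction_fields = {
            child.tag: (child.text or "").strip()
            for child in auction
            if len(child) == 0
        }
        for bid in auction.iter("bid"):
            row = dict(auction_fields)  # repeat auction fields on every bid
            row.update({c.tag: (c.text or "").strip() for c in bid})
            row["amount"] = float(row["amount"])
            rows.append(row)
    return rows


rows = xml_to_rows(SAMPLE_XML)
payload = pickle.dumps(rows)      # bytes that could be written to a .pkl file
restored = pickle.loads(payload)  # round trip back to the list of rows
```

Repeating the auction-level fields on each bid row keeps the multilevel structure (several bids per auction) recoverable through the auction identifier.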
Appendix B: Methodology Overview
Initially, the SEAO dataset underwent a thorough cleaning process. This involved handling missing values, removing irrelevant columns, and reformatting specific columns to ensure their consistency and reliability.
Following the data cleaning, preprocessing was conducted to transform the dataset and make it suitable for machine learning applications. Discrete variables were transformed using one-hot encoding techniques, while continuous bid values were standardized.
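The two transformations can be sketched in plain Python as follows. This is an illustrative example only: the column values are made up, and the use of the population standard deviation is an assumption, since the text does not state which estimator was used.

```python
from statistics import mean, pstdev


def one_hot(values):
    """Return the sorted category list and one indicator vector per value."""
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    encoded = [
        [1 if index[value] == j else 0 for j in range(len(categories))]
        for value in values
    ]
    return categories, encoded


def standardize(xs):
    """Center to zero mean and scale to unit (population) standard deviation."""
    mu, sigma = mean(xs), pstdev(xs)
    if sigma == 0:
        raise ValueError("cannot standardize a constant column")
    return [(x - mu) / sigma for x in xs], mu, sigma


# Toy columns (hypothetical values, not taken from the SEAO data).
municipality = ["Montreal", "Quebec", "Montreal"]
bids = [100.0, 200.0, 300.0]

categories, encoded = one_hot(municipality)
z_bids, mu, sigma = standardize(bids)
```

Keeping `mu` and `sigma` makes it possible to map generated or predicted bids back to the original monetary scale via `z * sigma + mu`.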
To generate synthetic data, two generative models were used: CTGAN and TVAE. The CTGAN model was trained with a set of hyperparameters that included the embedding dimension, the generator and discriminator dimensions, the learning rates, and the decay rates. Likewise, the TVAE, a tabular variational autoencoder, was trained with its own hyperparameters, including the hidden and latent dimensions.
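A configuration for the two synthesizers might look like the sketch below. It assumes the open-source `ctgan` Python package, which may differ from the implementation used in the study, and every numeric value is a placeholder rather than a setting reported in this article. The package is imported lazily so that the configuration can be inspected without it installed.

```python
# Placeholder hyperparameters (NOT the values used in the study).
CTGAN_PARAMS = {
    "embedding_dim": 128,              # size of the noise vector fed to the generator
    "generator_dim": (256, 256),       # hidden layer sizes of the generator
    "discriminator_dim": (256, 256),   # hidden layer sizes of the discriminator
    "generator_lr": 2e-4,
    "generator_decay": 1e-6,
    "discriminator_lr": 2e-4,
    "discriminator_decay": 1e-6,
    "epochs": 300,
}

TVAE_PARAMS = {
    "embedding_dim": 128,              # latent dimension
    "compress_dims": (128, 128),       # hidden layer sizes of the encoder
    "decompress_dims": (128, 128),     # hidden layer sizes of the decoder
    "epochs": 300,
}


def build_models():
    """Instantiate both synthesizers; requires the optional `ctgan` package."""
    from ctgan import CTGAN, TVAE  # lazy import, assumed dependency

    return CTGAN(**CTGAN_PARAMS), TVAE(**TVAE_PARAMS)


# Typical usage with a pandas DataFrame `df` and a list of discrete column names:
#   ctgan_model, tvae_model = build_models()
#   ctgan_model.fit(df, discrete_columns)
#   synthetic = ctgan_model.sample(len(df))
```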
Once trained, these models were then employed to sample synthetic datasets, replicating the patterns and distributions seen in the original SEAO dataset.
After generating the synthetic data, we introduced the BidNet neural network model. This model was designed to predict bid values from both discrete and continuous inputs. It was trained with cross-validation and early stopping. Several hyperparameters, including the learning rate, batch size, and number of epochs, were tuned manually to improve the model's performance. Alternatively, rigorous automated tuning procedures (e.g., Bayesian optimization) can be used, provided enough computational resources and time are available.
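The two training safeguards mentioned here can be sketched independently of any deep learning framework. The example below is a generic illustration, not the BidNet training code: the `patience` value, the fold construction, and the validation losses are all hypothetical.

```python
class EarlyStopping:
    """Signal a stop once the validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.best_epoch = -1
        self.bad_epochs = 0

    def step(self, epoch, val_loss):
        if val_loss < self.best - self.min_delta:
            # Improvement: remember it and reset the counter.
            self.best, self.best_epoch, self.bad_epochs = val_loss, epoch, 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


def k_fold_indices(n, k):
    """Yield (train, validation) index lists for k interleaved folds."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i in range(k):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, folds[i]


# Hypothetical validation losses, one per epoch.
val_losses = [1.00, 0.80, 0.70, 0.72, 0.71, 0.73, 0.60]
stopper = EarlyStopping(patience=3)
stopped_at = None
for epoch, loss in enumerate(val_losses):
    if stopper.step(epoch, loss):
        stopped_at = epoch
        break

splits = list(k_fold_indices(6, 3))
```

In this toy run, training halts at epoch 5 and the weights from epoch 2 would be restored; the later improvement at epoch 6 is never seen, which is the usual trade-off governed by the patience setting.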
To assess the quality of the synthetic data produced by CTGAN and TVAE, a series of classifiers, including Decision Trees, k-Nearest Neighbors, and Neural Networks, were trained on both the real and synthetic datasets. Performance metrics from these classifiers provided insights into the fidelity and utility of the synthetic data.
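One plausible reading of this protocol is to train the same classifier once on real data and once on synthetic data, then score both on the same held-out real observations; similar scores suggest that the synthetic data preserve the signal. The sketch below illustrates that comparison with a small pure-Python k-nearest-neighbors classifier. The toy points and labels are invented for the example, and the exact evaluation design used in the study may differ.

```python
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k training points closest to x (squared Euclidean)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), label)
        for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def accuracy(train_X, train_y, test_X, test_y, k=3):
    hits = sum(knn_predict(train_X, train_y, x, k) == y for x, y in zip(test_X, test_y))
    return hits / len(test_y)


# Toy data (hypothetical): two well-separated classes.
real_X = [(0.0, 0.1), (0.1, 0.0), (0.2, 0.1), (1.0, 0.9), (0.9, 1.0), (0.8, 0.9)]
real_y = [0, 0, 0, 1, 1, 1]
synth_X = [(0.1, 0.1), (0.0, 0.2), (0.3, 0.2), (0.9, 0.9), (1.0, 0.8), (0.7, 1.0)]
synth_y = [0, 0, 0, 1, 1, 1]
test_X = [(0.1, 0.2), (0.9, 0.8), (0.2, 0.0), (1.0, 1.0)]  # held-out real points
test_y = [0, 1, 0, 1]

acc_real = accuracy(real_X, real_y, test_X, test_y)     # train on real, test on real
acc_synth = accuracy(synth_X, synth_y, test_X, test_y)  # train on synthetic, test on real
gap = acc_real - acc_synth                               # small gap = high utility
```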
Finally, the BidNet model's performance was evaluated with three metrics: the root mean square error (RMSE), the Jensen-Shannon distance (JS), and the Wasserstein distance (WS). Each metric compares the synthetic bids with the real bids, and together they characterize the model's accuracy both pointwise (RMSE) and at the level of the bid distribution (JS and WS).
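For reference, the three metrics can be written in a few lines of plain Python. This sketch makes several simplifying assumptions that are not stated in the article: the JS distance is computed on probability vectors obtained by binning bids on a shared grid, it uses base-2 logarithms so that it lies in [0, 1], and the Wasserstein distance is restricted to one-dimensional samples of equal size.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between paired values."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


def js_distance(p, q):
    """Jensen-Shannon distance (square root of the divergence), base-2 logs."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(u, v):
        # Terms with u_i == 0 contribute nothing to the divergence.
        return sum(a * math.log2(a / b) for a, b in zip(u, v) if a > 0)

    return math.sqrt(max(0.0, 0.5 * kl(p, m) + 0.5 * kl(q, m)))


def wasserstein_1d(xs, ys):
    """1-Wasserstein distance for equal-size samples: mean gap between order statistics."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes samples of equal size")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


# Toy checks with hand-computable answers.
identical = js_distance([0.5, 0.5], [0.5, 0.5])   # identical distributions
disjoint = js_distance([1.0, 0.0], [0.0, 1.0])    # non-overlapping distributions
shifted = wasserstein_1d([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])  # every point moved by 1
error = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```

RMSE requires bids to be paired observation by observation, whereas JS and WS only compare the two distributions, which is why they are useful complements when synthetic bids have no one-to-one real counterpart.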
Throughout the entire process, special attention was given to reproducibility. Foundational functionalities ensured consistent random states, allowing for deterministic behavior across runs. Additionally, capabilities were established to save intermediate results, trained models, and to manage computation across various devices, whether CPU or GPU.
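A typical pattern for fixing random states and selecting a device is sketched below. The helper names `set_seed` and `pick_device` are hypothetical and are not taken from the repository; NumPy and PyTorch are treated as optional dependencies so that the snippet also runs where they are not installed.

```python
import random


def set_seed(seed):
    """Seed every random number generator that is available in this environment."""
    random.seed(seed)
    try:
        import numpy as np

        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch

        torch.manual_seed(seed)
        if torch.cuda.is_available():
            torch.cuda.manual_seed_all(seed)
    except ImportError:
        pass


def pick_device():
    """Return "cuda" when a GPU is usable through PyTorch, otherwise "cpu"."""
    try:
        import torch

        return "cuda" if torch.cuda.is_available() else "cpu"
    except ImportError:
        return "cpu"


set_seed(42)
first = [random.random() for _ in range(3)]
set_seed(42)
second = [random.random() for _ in range(3)]  # same draws as `first`
device = pick_device()
```

Seeding alone does not guarantee bit-identical results on GPUs, where some operations are nondeterministic by default, so saved intermediate results remain the most reliable way to reproduce a run exactly.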
Appendix C: Algorithms
About this article
Cite this article
Sadoune, I., Joanis, M. & Lodi, A. Implementing a Hierarchical Deep Learning Approach for Simulating Multilevel Auction Data. Comput Econ (2024). https://doi.org/10.1007/s10614-024-10622-4
DOI: https://doi.org/10.1007/s10614-024-10622-4