Implementing a Hierarchical Deep Learning Approach for Simulating Multilevel Auction Data

Computational Economics

Abstract

We present a deep learning solution to address the challenges of simulating realistic synthetic first-price sealed-bid auction data. The complexities encountered in this type of auction data include high-cardinality discrete feature spaces and a multilevel structure arising from multiple bids associated with a single auction instance. Our methodology combines deep generative modeling (DGM) with an artificial learner that predicts the conditional bid distribution based on auction characteristics, contributing to advancements in simulation-based research. This approach lays the groundwork for creating realistic auction environments suitable for agent-based learning and modeling applications. Our contribution is twofold: we introduce a comprehensive methodology for simulating multilevel discrete auction data, and we underscore the potential of DGM as a powerful instrument for refining simulation techniques and fostering the development of economic models grounded in generative AI.



Acknowledgements

We express our sincere gratitude to the reviewers for their insightful feedback, which has significantly contributed to the enhancement and refinement of this manuscript.

Funding

This work was supported by Meta Research following the authors' application to the request for proposals on agent-based user interaction simulation to find and fix integrity and privacy issues (https://research.facebook.com/research-awards/request-for-proposals-on-agent-based-user-interaction-simulation-to-find-and-fix-integrity-and-privacy-issues/#award-recipients).

Author information

Contributions

All authors contributed to the study’s conception and design. Material preparation, data collection, and analysis were performed by Igor Sadoune. The first draft of the manuscript was written by Igor Sadoune, and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Igor Sadoune.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: SEAO Dataset

Table 5 outlines the variables used in our study. It is important to note that geographical variables like countries or states were excluded because, although present in the raw data, only contracts from the province of Quebec are actually recorded in the subset we accessed.

Table 5 Description of the variables used in the study

The cleaning of the raw data involved selecting relevant variables. Many columns in the raw data were excluded as they did not provide informative signals due to their nature (e.g., web links) or redundancy with other variables. The quality of the signals was another selection criterion. Columns like temporal variables or textual entries were omitted due to low quality, evidenced by inconsistencies, excessive missing values, and non-uniform entry formats. Despite centralized dataset management by SEAO, data inconsistencies and missing values are common due to manual updates by various public entities’ administrators. These are significant limitations of this dataset.

The raw data are available at https://www.donneesquebec.ca/recherche/dataset/systeme-electronique-dappel-doffres-seao, and the official PDF description file at https://www.donneesquebec.ca/recherche/dataset/d23b2e02-085d-43e5-9e6e-e1d558ebfdd5/resource/af41596c-b07f-4664-82c8-577e1ef9a6f3/download/seao-specificationsxml-donneesouvertes-20171010.pdf. Data for each year, and in some cases each month, must be fetched separately. The raw data are provided in XML format and must be converted into a workable tabular array. We used the Python "xml" library to convert the data and saved the result in pickle (.pkl) format. The code for processing the original XML files is available in the associated GitHub repository for this manuscript. We also provide the cleaned and preprocessed pickle file.
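As an illustration of the conversion step, the following is a minimal sketch using only the Python standard library. The tag names (`notice`, `bid`, and so on) and the sample records are hypothetical placeholders; the actual SEAO schema is given in the PDF description file, and the authors' own processing code lives in their repository.

```python
import pickle
import xml.etree.ElementTree as ET

# Hypothetical XML layout for illustration only; the real SEAO tag names
# are defined in the official specification and differ from these.
SAMPLE_XML = """
<export>
  <notice>
    <number>A-001</number>
    <bid><bidder>Firm X</bidder><amount>1000.50</amount></bid>
    <bid><bidder>Firm Y</bidder><amount>1200.00</amount></bid>
  </notice>
  <notice>
    <number>A-002</number>
    <bid><bidder>Firm Z</bidder><amount>830.00</amount></bid>
  </notice>
</export>
"""

def xml_to_rows(xml_text):
    """Flatten the auction -> bids hierarchy into one row per bid."""
    root = ET.fromstring(xml_text)
    rows = []
    for notice in root.iter("notice"):
        auction_id = notice.findtext("number")
        for bid in notice.iter("bid"):
            rows.append({
                "auction_id": auction_id,
                "bidder": bid.findtext("bidder"),
                "amount": float(bid.findtext("amount")),
            })
    return rows

rows = xml_to_rows(SAMPLE_XML)

# Serialize to pickle bytes; writing to a .pkl file would use pickle.dump.
payload = pickle.dumps(rows)
restored = pickle.loads(payload)
```

Repeating the auction identifier on every bid row keeps the multilevel structure recoverable after flattening.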

Appendix B: Methodology Overview

Initially, the SEAO dataset underwent a thorough cleaning process. This involved handling missing values, removing irrelevant columns, and reformatting specific columns to ensure their consistency and reliability.

Following the data cleaning, preprocessing was conducted to transform the dataset and make it suitable for machine learning applications. Discrete variables were transformed using one-hot encoding techniques, while continuous bid values were standardized.
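The two transformations can be sketched as follows, assuming that "standardized" means z-scoring (zero mean, unit standard deviation), which the text does not state explicitly. The function names and toy values are illustrative and not taken from the study.

```python
from statistics import mean, pstdev

def one_hot(values):
    """Return (categories, matrix) with one indicator column per category."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    matrix = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1
        matrix.append(row)
    return categories, matrix

def standardize(values):
    """Rescale to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values], mu, sigma

# Toy columns; the names and values are illustrative, not from Table 5.
sectors = ["construction", "services", "construction", "supplies"]
bids = [100.0, 200.0, 300.0, 400.0]

categories, encoded = one_hot(sectors)
scaled, mu, sigma = standardize(bids)
```

Keeping `mu` and `sigma` allows predicted or sampled bids to be mapped back to the original monetary scale.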

To generate synthetic data, two primary generative models were utilized: CTGAN and TVAE. The CTGAN model was trained using an array of hyperparameters, including distinct embedding dimensions, generator and discriminator dimensions, learning rates, and specific decay rates. Likewise, the TVAE, a variant that incorporates a variational autoencoder structure, was trained with specific parameters, including hidden and latent dimensions.
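For readers who want to reproduce this step, the sketch below assumes the open-source `ctgan` Python package, which exposes `CTGAN` and `TVAE` classes; the text does not say which implementation was used. All hyperparameter values are placeholders rather than the settings of the study, and `fit_and_sample` is a hypothetical helper.

```python
# Placeholder hyperparameters: illustrative values, not those of the study.
CTGAN_PARAMS = {
    "embedding_dim": 128,
    "generator_dim": (256, 256),
    "discriminator_dim": (256, 256),
    "generator_lr": 2e-4,
    "generator_decay": 1e-6,
    "discriminator_lr": 2e-4,
    "discriminator_decay": 1e-6,
    "epochs": 300,
}
TVAE_PARAMS = {
    "embedding_dim": 128,           # latent dimension
    "compress_dims": (128, 128),    # encoder hidden layers
    "decompress_dims": (128, 128),  # decoder hidden layers
    "epochs": 300,
}

def fit_and_sample(train_df, discrete_columns, n_rows, kind="ctgan"):
    """Fit one synthesizer on a pandas DataFrame and draw n_rows synthetic rows."""
    from ctgan import CTGAN, TVAE  # third-party package, imported lazily

    model = CTGAN(**CTGAN_PARAMS) if kind == "ctgan" else TVAE(**TVAE_PARAMS)
    model.fit(train_df, discrete_columns)
    return model.sample(n_rows)
```

A call such as `fit_and_sample(df, ["sector"], 1000, kind="tvae")` would return a synthetic table with the same columns as `df`, where `df` and the column name are again hypothetical.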

Once trained, these models were then employed to sample synthetic datasets, replicating the patterns and distributions seen in the original SEAO dataset.

Following the synthetic data generation, we introduced the BidNet neural network model. This model was designed to predict bid values using both discrete and continuous inputs. The model was trained with cross-validation and early stopping. Several hyperparameters, including learning rate, batch size, and number of epochs, were tuned manually to enhance the model’s performance. Alternatively, rigorous automated tuning procedures (e.g., Bayesian optimization) can be used, provided enough computational resources and time are available.
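The two training controls mentioned above can be sketched independently of any particular network. The code below is a generic illustration, not the BidNet implementation: `k_fold_indices` and `EarlyStopping` are hypothetical names, the patience value is arbitrary, and the loss curve is a stand-in for losses that a real run would compute on the validation fold.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, val_idx) pairs for k roughly equal shuffled folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]

class EarlyStopping:
    """Signal a stop once validation loss fails to improve for `patience` epochs."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience, self.min_delta = patience, min_delta
        self.best, self.bad_epochs = float("inf"), 0

    def step(self, val_loss):
        if val_loss < self.best - self.min_delta:
            self.best, self.bad_epochs = val_loss, 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience

splits = list(k_fold_indices(10, 5))

# Stand-in validation losses; a real run would evaluate the model each epoch.
fake_val_losses = [1.0, 0.8, 0.7, 0.71, 0.72, 0.73, 0.60]
stopper = EarlyStopping(patience=3)
stopped_at = None
for epoch, loss in enumerate(fake_val_losses):
    if stopper.step(loss):
        stopped_at = epoch
        break
```

With multilevel data, one may prefer to split by auction rather than by bid so that bids from the same auction never appear in both training and validation folds; whether the study did so is not stated here.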

To assess the quality of the synthetic data produced by CTGAN and TVAE, a series of classifiers, including Decision Trees, k-Nearest Neighbors, and Neural Networks, were trained on both the real and synthetic datasets. Performance metrics from these classifiers provided insights into the fidelity and utility of the synthetic data.
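One common way to run such a comparison is to train the same classifier once on real data and once on synthetic data, then score both on held-out real data. The sketch below assumes that protocol, which may differ from the exact procedure of the study, and uses a small pure-Python k-nearest-neighbors classifier on made-up toy points.

```python
def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k nearest training points (squared Euclidean)."""
    nearest = sorted(
        zip(train_X, train_y),
        key=lambda pair: sum((a - b) ** 2 for a, b in zip(pair[0], x)),
    )[:k]
    votes = [label for _, label in nearest]
    return max(set(votes), key=votes.count)

def accuracy(train_X, train_y, test_X, test_y, k=3):
    """Share of test points whose predicted label matches the true label."""
    hits = sum(knn_predict(train_X, train_y, x, k) == y
               for x, y in zip(test_X, test_y))
    return hits / len(test_y)

# Toy data, invented for illustration: two well-separated classes.
real_X = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (1.0, 0.9), (0.9, 1.1), (1.1, 1.0)]
real_y = [0, 0, 0, 1, 1, 1]
synth_X = [(0.1, 0.1), (0.0, 0.3), (0.3, 0.1), (0.8, 1.0), (1.0, 1.2), (1.2, 0.9)]
synth_y = [0, 0, 0, 1, 1, 1]
test_X = [(0.05, 0.05), (0.15, 0.1), (0.95, 1.0), (1.05, 0.95)]
test_y = [0, 0, 1, 1]

acc_real = accuracy(real_X, real_y, test_X, test_y)
acc_synth = accuracy(synth_X, synth_y, test_X, test_y)
utility_gap = abs(acc_real - acc_synth)
```

A small gap between the two accuracies suggests that the synthetic data preserve the relationships the classifier relies on.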

Finally, the BidNet model’s performance was evaluated using three metrics: the Root Mean Square Error (RMSE), the Jensen-Shannon distance (JS), and the Wasserstein distance (WS). These metrics compare the synthetic bids with the real bids and together characterize the model’s accuracy.
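For reference, the three metrics can be computed as in the following sketch. It makes simplifying assumptions that are not from the study: the Wasserstein distance is the one-dimensional, equal-sample-size case, the Jensen-Shannon distance is taken over histograms with shared bin edges and base-2 logarithms, and the sample values are made up.

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error between paired values."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))

def wasserstein_1d(x, y):
    """1-D Wasserstein-1 distance for two samples of equal size."""
    if len(x) != len(y):
        raise ValueError("this simplified version needs equal sample sizes")
    return sum(abs(a - b) for a, b in zip(sorted(x), sorted(y))) / len(x)

def histogram(sample, edges):
    """Normalized bin frequencies over shared bin edges (last bin closed)."""
    counts = [0] * (len(edges) - 1)
    for v in sample:
        for i in range(len(counts)):
            last = i == len(counts) - 1
            if edges[i] <= v < edges[i + 1] or (last and v == edges[-1]):
                counts[i] += 1
                break
    total = sum(counts)
    return [c / total for c in counts]

def js_distance(p, q):
    """Jensen-Shannon distance: square root of the JS divergence (base 2)."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(u, v):
        return sum(a * math.log2(a / b) for a, b in zip(u, v) if a > 0)

    return math.sqrt(max(0.0, (kl(p, m) + kl(q, m)) / 2))

# Toy bid samples, invented for illustration.
real = [1.0, 2.0, 3.0, 4.0]
synthetic = [1.5, 2.5, 3.5, 4.5]
edges = [0.0, 2.0, 4.0, 6.0]

error = rmse(real, synthetic)
ws = wasserstein_1d(real, synthetic)
js = js_distance(histogram(real, edges), histogram(synthetic, edges))
```

RMSE compares bids pairwise, whereas JS and WS compare the two distributions as a whole, so the three metrics capture different kinds of discrepancy.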

Throughout the entire process, special attention was given to reproducibility. Foundational functionalities ensured consistent random states, allowing for deterministic behavior across runs. Additionally, capabilities were established to save intermediate results, trained models, and to manage computation across various devices, whether CPU or GPU.
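A minimal sketch of such utilities is given below. It assumes a PyTorch-based stack, which the text does not confirm, and treats NumPy and PyTorch as optional so the snippet also runs where they are not installed; `set_seed` and `pick_device` are hypothetical names.

```python
import random

def set_seed(seed):
    """Seed every available RNG; NumPy and PyTorch are optional here."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)  # safe to call without a GPU
    except ImportError:
        pass

def pick_device():
    """Return "cuda" when PyTorch sees a GPU, otherwise "cpu"."""
    try:
        import torch
        return "cuda" if torch.cuda.is_available() else "cpu"
    except ImportError:
        return "cpu"

# Re-seeding reproduces the same random draws.
set_seed(42)
first = [random.random() for _ in range(3)]
set_seed(42)
second = [random.random() for _ in range(3)]
device = pick_device()
```

Seeding alone does not guarantee bitwise-identical results on GPU, where some operations are non-deterministic unless additionally configured.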

Appendix C: Algorithms

Algorithm 1: Training GANs-based auction features generator

Algorithm 2: Training tabular VAE for auction features

Algorithm 3: K-folds cross-validation BidNet training procedure

Algorithm 4: Synthetic bid validation / Double validation

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Sadoune, I., Joanis, M. & Lodi, A. Implementing a Hierarchical Deep Learning Approach for Simulating Multilevel Auction Data. Comput Econ (2024). https://doi.org/10.1007/s10614-024-10622-4


  • DOI: https://doi.org/10.1007/s10614-024-10622-4
