A survey of public datasets for O-RAN: fostering the development of machine learning models

Annals of Telecommunications

Abstract

The O-RAN architecture allows for unprecedented flexibility in Radio Access Networks (RANs). O-RAN components designed to control RANs, such as RAN Intelligent Controllers (RICs), place intelligence at the center of the management and orchestration of 5G/6G cellular networks. RICs run applications based on machine learning models, which require massive amounts of RAN data for training. Nonetheless, building testbeds to collect these data is challenging since RANs use expensive hardware and operate in licensed spectrum, which is usually not available to academia. Even though producing RAN datasets is challenging, some research groups have already made their data available. In this paper, we survey the primary public datasets available online that are considered in O-RAN papers. We identify the main characteristics and purpose of each dataset, complementing their documentation. Also, we empirically showcase the viability of using publicly available datasets for machine learning applications within the O-RAN domain, such as spectrum and traffic classification.

Data Availability

No datasets were generated or analyzed during the current study.

Notes

  1. A preliminary version of this work appeared in [13].

  2. https://www.ettus.com/

  3. https://www.srsran.com/

  4. The work in [8] presents a slightly different UE distribution for the traffic types. However, after analyzing the data, we can state that the distribution reported here is the actual one employed by the public dataset.

  5. https://pandas.pydata.org/

  6. https://www.gnuradio.org/

  7. https://github.com/GTA-UFRJ/useorandatasets/blob/main/readCharm.py

  8. https://github.com/lucabaldesi/charm_trainer/blob/master/read_IQ.py

  9. https://numpy.org/doc/stable/reference/generated/numpy.load.html

  10. The dataset is available inside the xApp’s source code: https://github.com/o-ran-sc/ric-app-ad/blob/master/src/ue.csv

  11. https://github.com/GTA-UFRJ/ORAN_DNN

  12. https://pytorch.org

  13. https://github.com/GTA-UFRJ/LearnRAN

  14. https://pandas.pydata.org

  15. https://numpy.org

  16. https://scikit-learn.org

References

  1. Alzubaidi L, Zhang J, Humaidi AJ, Al-Dujaili A, Duan Y, Al-Shamma O, Santamaría J, Fadhel MA, Al-Amidie M, Farhan L (2021) Review of deep learning: concepts, CNN architectures, challenges, applications, future directions. J Big Data 8:1–74

  2. Amini M, Stanica R, Rosenberg C (2023) Where are the (cellular) data? ACM Comput Surv 56(2)

  3. Arnaz A, Lipman J, Abolhasan M, Hiltunen M (2022) Toward integrating intelligence and programmability in open radio access networks: a comprehensive survey. IEEE Access 10:67747–67770

  4. Baldesi L, Restuccia F, Melodia T (2021) ChARM (Channel-aware reactive mechanism) dataset. Available: http://hdl.handle.net/2047/D20423481

  5. Baldesi L, Restuccia F, Melodia T (2022) ChARM: NextG spectrum sharing through data-driven real-time O-RAN dynamic control. In: IEEE conference on computer communications (INFOCOM), pp 240–249

  6. Bartsiokas IA, Gkonis PK, Kaklamani DI, Venieris IS (2022) ML-based radio resource management in 5G and beyond networks: a survey. IEEE Access 10:83507–83528

  7. Bezerra GMG, Ferreira TN, Mattos DMF (2022) Assessing software-defined radio security and performance in virtualized environments for cloud radio access networks. In: Cyber security in networking conference (CSNet), pp 1–7

  8. Bonati L, D’Oro S, Polese M, Basagni S, Melodia T (2021) Intelligence and learning in O-RAN for data-driven NextG cellular networks. IEEE Commun Mag 59(10):21–27

  9. Bonati L, D’Oro S, Polese M, Basagni S, Melodia T (2022) Colosseum O-RAN COMMAG dataset. Available: https://github.com/wineslab/colosseum-oran-commag-dataset

  10. Bonati L, Polese M, D’Oro S, Basagni S, Melodia T (2023) OpenRAN Gym: AI/ML development, data collection, and testing for O-RAN on PAWR platforms. Comput Netw 220:109502

  11. Brik B, Boutiba K, Ksentini A (2022) Deep learning for B5G open radio access network: evolution, survey, case studies, and challenges. IEEE Open J Commun Soc 3:228–250

  12. Cao Y, Lien SY, Liang YC, Chen KC, Shen X (2022) User access control in open radio access networks: a federated deep reinforcement learning approach. IEEE Trans Wirel Commun 21(6)

  13. Couto RS, Cruz P, Campista MEM, Costa LHMK (2023) Using public datasets to train O-RAN deep learning models. In: 2nd international conference on 6G networking (6GNet), pp 1–8

  14. De Bast S, Pollin S (2021) Ultra dense indoor MaMIMO CSI dataset. Available: https://doi.org/10.21227/nr6k-8r78

  15. D’Oro S, Bonati L, Polese M, Melodia T (2022) OrchestRAN: network automation through orchestrated intelligence in the open RAN. In: IEEE conference on computer communications (INFOCOM), pp 270–279

  16. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778

  17. Hochreiter S (1998) The vanishing gradient problem during learning recurrent neural nets and problem solutions. Int J Uncertain Fuzziness Knowl-Based Syst 6(02):107–116

  18. Hojeij H, Sharara M, Hoteit S, Vèque V (2023) Dynamic placement of O-CU and O-DU functionalities in Open-RAN architecture. In: IEEE international conference on sensing, communication, and networking (SECON)

  19. Huang CW, Chiang CT, Li Q (2017) A study of deep learning networks on mobile traffic forecasting. In: IEEE annual international symposium on personal, indoor, and mobile radio communications (PIMRC), pp 1–6

  20. Lopez MA, Barbosa GNN, Mattos DMF (2022) New barriers on 6G networking: an exploratory study on the security, privacy and opportunities for aerial networks. In: 1st international conference on 6G networking (6GNet), pp 1–6

  21. Ma Y, Zhou G, Wang S (2019) WiFi sensing with channel state information: a survey. ACM Comput Surv 52(3)

  22. Moussaoui M, Bertin E, Crespi N (2022) Telecom business models for beyond 5G and 6G networks: towards disaggregation? In: 1st international conference on 6G networking (6GNet), pp 1–8

  23. O-RAN Software Community (SC) (2022) Anomaly detection. Available: https://github.com/o-ran-sc/ric-app-ad

  24. Oliveira NR, Moraes IM, de Medeiros DSV, Lopez MA, Mattos DMF (2023) An agile conflict-solving framework for intent-based management of service level agreement. In: 2nd international conference on 6G networking (6GNet), pp 1–8

  25. Orhan O, Swamy VN, Tetzlaff T, Nassar M, Nikopour H, Talwar S (2021) Connection management xAPP for O-RAN RIC: a graph neural network and reinforcement learning approach. In: IEEE international conference on machine learning and applications (ICMLA), pp 936–941

  26. Pacheco RG, Couto RS, Simeone O (2023) On the impact of deep neural network calibration on adaptive edge offloading for image classification. J Netw Comput Appl 103679

  27. Pham VQ, Thieu HT, Kak A, Choi N (2023) HexRIC: building a better near-real time network controller for the Open RAN ecosystem. In: The international workshop on mobile computing systems and applications (HotMobile ’23), pp 15–21

  28. Polese M, Bonati L, D’Oro S, Basagni S, Melodia T (2022) ColO-RAN: developing machine learning-based xApps for Open RAN closed-loop control on programmable experimental platforms. IEEE Trans Mobile Comput

  29. Polese M, Bonati L, D’Oro S, Basagni S, Melodia T (2022) Colosseum O-RAN ColORAN dataset. Available: https://github.com/wineslab/colosseum-oran-coloran-dataset

  30. Polese M, Bonati L, D’Oro S, Basagni S, Melodia T (2023) Understanding O-RAN: architecture, interfaces, algorithms, security, and research challenges. IEEE Commun Surv Tutor 25(2):1376–1411

  31. Raca D, Zahran AH, Sreenan CJ, Sinha RK, Halepovic E, Jana R, Gopalakrishnan V (2020) On leveraging machine and deep learning for throughput prediction in cellular networks: design, performance, and challenges. IEEE Commun Mag 58(3):11–17

  32. Ranjbar V, Girycki A, Rahman MA, Pollin S, Moonen M, Vinogradov E (2022) Cell-free mMIMO support in the O-RAN architecture: a PHY layer perspective for 5G and beyond networks. IEEE Commun Stand Mag 6(1):28–34

  33. Rezazadeh F, Zanzi L, Devoti F, Chergui H, Costa-Pérez X, Verikoukis C (2023) On the specialization of FDRL agents for scalable and distributed 6G RAN slicing orchestration. IEEE Trans Veh Technol 72(3):3473–3487

  34. Salvat Lozano JX, Ayala-Romero JA, Zanzi L, Garcia-Saavedra A, Costa-Perez X (2022) O-RAN experimental evaluation datasets. https://doi.org/10.21227/64s5-q431

  35. Salvat Lozano JX, Ayala-Romero JA, Zanzi L, Garcia-Saavedra A, Costa-Perez X (2023) Open radio access networks O-RAN experimentation platform: design and datasets. IEEE Commun Mag 1–7. Accepted for publication

  36. Upadhyaya PS, Abdalla AS, Marojevic V, Reed JH, Shah VK (2022) Prototyping next-generation O-RAN research testbeds with SDRs. arXiv:2205.13178

  37. Wu Z, Shen C, Van Den Hengel A (2019) Wider or deeper: revisiting the ResNet model for visual recognition. Pattern Recognit 90:119–133

  38. Zhang C, Patras P, Haddadi H (2019) Deep learning in mobile and wireless networking: a survey. IEEE Commun Surv Tutor 21(3):2224–2287

Funding

This study was financed in part by Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001, CNPq, PR2/UFRJ, FAPERJ Grants E-26/010.002174/2019 and E-26/201.300/2021, and FAPESP Grant 15/24494-8.

Author information

Contributions

RSC and PC studied the datasets and wrote their descriptions; RGP conducted the experiments for the spectrum classification use case and wrote the corresponding section; VMSS conducted the experiments for the traffic classification use case and wrote the corresponding section; MEMC and LHMK defined the methodology applied in this work. All authors reviewed the manuscript.

Corresponding author

Correspondence to Rodrigo S. Couto.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

About this article

Cite this article

Couto, R.S., Cruz, P., Pacheco, R.G. et al. A survey of public datasets for O-RAN: fostering the development of machine learning models. Ann. Telecommun. (2024). https://doi.org/10.1007/s12243-024-01029-1

  • DOI: https://doi.org/10.1007/s12243-024-01029-1
