Abstract
The O-RAN architecture allows for unprecedented flexibility in Radio Access Networks (RANs). O-RAN components designed to control RANs, such as RAN Intelligent Controllers (RICs), place intelligence at the center of the management and orchestration of 5G/6G cellular networks. RICs run applications based on machine learning models, which require massive amounts of RAN data for training. However, building testbeds to collect these data is challenging, since RANs use expensive hardware and operate in licensed spectrum, which is usually unavailable to academic researchers. Even though producing RAN datasets is challenging, some research groups have already made their data available. In this paper, we survey the primary public datasets available online that are considered in O-RAN papers. We identify the main characteristics and purpose of each dataset, thereby complementing their documentation. We also empirically showcase the viability of using publicly available datasets for machine learning applications within the O-RAN domain, such as spectrum and traffic classification.
Data Availability
No datasets were generated or analyzed during the current study.
Notes
A preliminary version of this work appeared in [13].
The work in [8] presents a slightly different UE distribution for the traffic types. However, after analyzing the data, we can state that the distribution reported here is the actual one employed by the public dataset.
The dataset is available inside the xApp’s source code: https://github.com/o-ran-sc/ric-app-ad/blob/master/src/ue.csv
References
Alzubaidi L, Zhang J, Humaidi AJ, Al-Dujaili A, Duan Y, Al-Shamma O, Santamaría J, Fadhel MA, Al-Amidie M, Farhan L (2021) Review of deep learning: concepts, CNN architectures, challenges, applications, future directions. J Big Data 8:1–74
Amini M, Stanica R, Rosenberg C (2023) Where are the (cellular) data? ACM Comput Surv 56(2)
Arnaz A, Lipman J, Abolhasan M, Hiltunen M (2022) Toward integrating intelligence and programmability in open radio access networks: a comprehensive survey. IEEE Access 10:67747–67770
Baldesi L, Restuccia F, Melodia T (2021) ChARM (Channel-aware reactive mechanism) dataset. Available: http://hdl.handle.net/2047/D20423481
Baldesi L, Restuccia F, Melodia T (2022) ChARM: NextG spectrum sharing through data-driven real-time O-RAN dynamic control. In: IEEE conference on computer communications (INFOCOM), pp 240–249
Bartsiokas IA, Gkonis PK, Kaklamani DI, Venieris IS (2022) ML-based radio resource management in 5G and beyond networks: a survey. IEEE Access 10:83507–83528
Bezerra GMG, Ferreira TN, Mattos DMF (2022) Assessing software-defined radio security and performance in virtualized environments for cloud radio access networks. In: Cyber security in networking conference (CSNet), pp 1–7
Bonati L, D’Oro S, Polese M, Basagni S, Melodia T (2021) Intelligence and learning in O-RAN for data-driven NextG cellular networks. IEEE Commun Mag 59(10):21–27
Bonati L, D’Oro S, Polese M, Basagni S, Melodia T (2022) Colosseum O-RAN COMMAG dataset. Available: https://github.com/wineslab/colosseum-oran-commag-dataset
Bonati L, Polese M, D’Oro S, Basagni S, Melodia T (2023) OpenRAN Gym: AI/ML development, data collection, and testing for O-RAN on PAWR platforms. Comput Netw 220:109502
Brik B, Boutiba K, Ksentini A (2022) Deep learning for B5G open radio access network: evolution, survey, case studies, and challenges. IEEE Open J Commun Soc 3:228–250
Cao Y, Lien SY, Liang YC, Chen KC, Shen X (2022) User access control in open radio access networks: a federated deep reinforcement learning approach. IEEE Trans Wirel Commun 21(6)
Couto RS, Cruz P, Campista MEM, Costa LHMK (2023) Using public datasets to train O-RAN deep learning models. In: 2nd international conference on 6G networking (6GNet), pp 1–8
De Bast S, Pollin S (2021) Ultra dense indoor MaMIMO CSI dataset. Available: https://doi.org/10.21227/nr6k-8r78
D’Oro S, Bonati L, Polese M, Melodia T (2022) OrchestRAN: network automation through orchestrated intelligence in the open RAN. In: IEEE conference on computer communications (INFOCOM), pp 270–279
He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778
Hochreiter S (1998) The vanishing gradient problem during learning recurrent neural nets and problem solutions. Int J Uncertain Fuzziness Knowl-Based Syst 6(02):107–116
Hojeij H, Sharara M, Hoteit S, Vèque V (2023) Dynamic placement of O-CU and O-DU functionalities in Open-RAN architecture. In: IEEE international conference on sensing, communication, and networking (SECON)
Huang CW, Chiang CT, Li Q (2017) A study of deep learning networks on mobile traffic forecasting. In: IEEE annual international symposium on personal, indoor, and mobile radio communications (PIMRC), pp 1–6
Lopez MA, Barbosa GNN, Mattos DMF (2022) New barriers on 6G networking: an exploratory study on the security, privacy and opportunities for aerial networks. In: 1st international conference on 6G networking (6GNet), pp 1–6
Ma Y, Zhou G, Wang S (2019) WiFi sensing with channel state information: a survey. ACM Comput Surv 52(3)
Moussaoui M, Bertin E, Crespi N (2022) Telecom business models for beyond 5G and 6G networks: towards disaggregation? In: 1st international conference on 6G networking (6GNet), pp 1–8
O-RAN Software Community (SC) (2022) Anomaly detection. Available: https://github.com/o-ran-sc/ric-app-ad
Oliveira NR, Moraes IM, de Medeiros DSV, Lopez MA, Mattos DMF (2023) An agile conflict-solving framework for intent-based management of service level agreement. In: 2nd international conference on 6G networking (6GNet), pp 1–8
Orhan O, Swamy VN, Tetzlaff T, Nassar M, Nikopour H, Talwar S (2021) Connection management xAPP for O-RAN RIC: a graph neural network and reinforcement learning approach. In: IEEE international conference on machine learning and applications (ICMLA), pp 936–941
Pacheco RG, Couto RS, Simeone O (2023) On the impact of deep neural network calibration on adaptive edge offloading for image classification. J Netw Comput Appl 103679
Pham VQ, Thieu HT, Kak A, Choi N (2023) HexRIC: building a better near-real time network controller for the Open RAN ecosystem. In: The international workshop on mobile computing systems and applications (HotMobile ’23), pp 15–21
Polese M, Bonati L, D’Oro S, Basagni S, Melodia T (2022) ColO-RAN: developing machine learning-based xApps for Open RAN closed-loop control on programmable experimental platforms. IEEE Trans Mobile Comput
Polese M, Bonati L, D’Oro S, Basagni S, Melodia T (2022) Colosseum O-RAN ColORAN dataset. Available: https://github.com/wineslab/colosseum-oran-coloran-dataset
Polese M, Bonati L, D’Oro S, Basagni S, Melodia T (2023) Understanding O-RAN: architecture, interfaces, algorithms, security, and research challenges. IEEE Commun Surv Tutor 25(2):1376–1411
Raca D, Zahran AH, Sreenan CJ, Sinha RK, Halepovic E, Jana R, Gopalakrishnan V (2020) On leveraging machine and deep learning for throughput prediction in cellular networks: design, performance, and challenges. IEEE Commun Mag 58(3):11–17
Ranjbar V, Girycki A, Rahman MA, Pollin S, Moonen M, Vinogradov E (2022) Cell-free mMIMO support in the O-RAN architecture: a PHY layer perspective for 5G and beyond networks. IEEE Commun Stand Mag 6(1):28–34
Rezazadeh F, Zanzi L, Devoti F, Chergui H, Costa-Pérez X, Verikoukis C (2023) On the specialization of FDRL agents for scalable and distributed 6G RAN slicing orchestration. IEEE Trans Veh Technol 72(3):3473–3487
Salvat Lozano JX, Ayala-Romero JA, Zanzi L, Garcia-Saavedra A, Costa-Perez X (2022) O-RAN experimental evaluation datasets. Available: https://doi.org/10.21227/64s5-q431
Salvat Lozano JX, Ayala-Romero JA, Zanzi L, Garcia-Saavedra A, Costa-Perez X (2023) Open radio access networks O-RAN experimentation platform: design and datasets. IEEE Commun Mag 1–7. Accepted for publication
Upadhyaya PS, Abdalla AS, Marojevic V, Reed JH, Shah VK (2022) Prototyping next-generation O-RAN research testbeds with SDRs. arXiv:2205.13178
Wu Z, Shen C, Van Den Hengel A (2019) Wider or deeper: revisiting the ResNet model for visual recognition. Pattern Recognit 90:119–133
Zhang C, Patras P, Haddadi H (2019) Deep learning in mobile and wireless networking: a survey. IEEE Commun Surv Tutor 21(3):2224–2287
Funding
This study was financed in part by Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001, CNPq, PR2/UFRJ, FAPERJ Grants E-26/010.002174/2019 and E-26/201.300/2021, and FAPESP Grant 15/24494-8.
Author information
Contributions
RSC and PC studied the datasets and wrote their descriptions; RGP conducted the experiments for the spectrum classification use case and wrote the corresponding section; VMSS conducted the experiments for the traffic classification use case and wrote the corresponding section; MEMC and LHMK defined the methodology applied in this work. All authors reviewed the manuscript.
Ethics declarations
Conflict of interest
The authors declare no competing interests.
About this article
Cite this article
Couto, R.S., Cruz, P., Pacheco, R.G. et al. A survey of public datasets for O-RAN: fostering the development of machine learning models. Ann. Telecommun. (2024). https://doi.org/10.1007/s12243-024-01029-1