Abstract
The lack of publicly available network traffic datasets for research purposes hinders the application of machine learning to wireless network analysis and design. In this work, a number of published temporal evolutions of traffic throughput are digitized and used for traffic anomaly and change point detection. The mean traffic evolutions are extracted by digitizing the published graphical curves and are organized into a time series dataset. A procedure for reusing such a dataset to test traffic analysis models is presented; it consists of three steps. First, \(\alpha\)-stable distributed traffic variations are added to the mean throughput values. Second, traffic anomalies are generated by analyzing how traffic is redistributed after base stations and sectors are switched on and off. Finally, traffic trend changes are introduced to model the impact of global events such as the COVID-19 pandemic. Several machine learning models are used to illustrate the applicability of the created dataset to detecting traffic anomalies and trend changes.
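The three steps of the procedure can be illustrated with a minimal sketch. It is not the authors' implementation. The function names (`symmetric_alpha_stable`, `synthetic_traffic`), the sinusoidal daily profile, and all parameter values are hypothetical. Two simplifications are assumed: the noise is symmetric \(\alpha\)-stable (skewness zero), drawn with the Chambers-Mallows-Stuck method and scaled in proportion to the mean, and the anomaly is a plain multiplicative drop on one day rather than the result of simulating traffic redistribution between base stations. The trend change is modeled as a level shift from a given day onward.

```python
import math
import random


def symmetric_alpha_stable(alpha, scale, rng):
    """Draw one symmetric alpha-stable sample (Chambers-Mallows-Stuck method)."""
    u = rng.uniform(-math.pi / 2, math.pi / 2)
    w = rng.expovariate(1.0)
    while w == 0.0:  # guard against a zero draw before dividing by w
        w = rng.expovariate(1.0)
    x = (math.sin(alpha * u) / math.cos(u) ** (1.0 / alpha)
         * (math.cos(u - alpha * u) / w) ** ((1.0 - alpha) / alpha))
    return scale * x


def synthetic_traffic(mean_profile, days, alpha=1.7, scale=0.05,
                      anomaly_day=None, anomaly_factor=0.3,
                      change_day=None, change_factor=1.2, seed=0):
    """Repeat a mean daily profile and add noise, one anomaly, and a level shift.

    All defaults are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    series = []
    for day in range(days):
        # Step 3 (assumed form): a level shift from change_day onward.
        shifted = change_day is not None and day >= change_day
        level = change_factor if shifted else 1.0
        for mean in mean_profile:
            value = level * mean
            # Step 2 (simplified): scale the traffic down on the anomalous day.
            if day == anomaly_day:
                value *= anomaly_factor
            # Step 1: heavy-tailed variation proportional to the mean value.
            value += mean * symmetric_alpha_stable(alpha, scale, rng)
            # Throughput cannot be negative, so clip heavy-tailed excursions.
            series.append(max(value, 0.0))
    return series


# Hypothetical hourly mean profile standing in for a digitized curve.
profile = [10 + 5 * math.sin(2 * math.pi * (h - 9) / 24) for h in range(24)]
series = synthetic_traffic(profile, days=14, anomaly_day=5, change_day=10)
print(len(series))  # 14 days x 24 hourly samples
```

Fixing `seed` makes the series reproducible, so the known positions of the injected anomaly and level shift can serve as ground truth when comparing detection models.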
Data Availability
The data used in this work are available under the GNU General Public License v3.0 on GitHub at the address indicated above.
Code Availability
The code used in this work is available under the GNU General Public License v3.0 on GitHub at the address indicated above.
Funding
No funding was received for conducting this study.
Author information
Contributions
RA: designed the methodology, prepared the synthetic traffic dataset, simulated the LTE network traffic distribution, and wrote the main part of the manuscript; DG: performed the numerical analysis of the traffic time series and contributed to the writing of the manuscript.
Ethics declarations
Conflict of interest
The authors have no conflicts of interest to declare that are relevant to the content of this article.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Aleksiejunas, R., Garuolis, D. Usage of Published Network Traffic Datasets for Anomaly and Change Point Detection. Wireless Pers Commun 133, 1281–1303 (2023). https://doi.org/10.1007/s11277-023-10816-3
DOI: https://doi.org/10.1007/s11277-023-10816-3