Usage of Published Network Traffic Datasets for Anomaly and Change Point Detection

Wireless Personal Communications

Abstract

The lack of publicly available network traffic datasets hinders the application of machine learning to wireless network analysis and design. In this work, a number of published temporal evolutions of traffic throughput are digitized and used for traffic anomaly and change point detection. The mean temporal evolutions are extracted by digitizing the published curves and organized into a time series dataset. A procedure for reusing such a dataset to test traffic analysis models is presented; it consists of three steps. First, \(\alpha\)-stable distributed traffic variations are added to the mean throughput values. Second, traffic anomalies are generated by analyzing how traffic is redistributed when base stations and sectors are switched on and off. Third, traffic trend changes are introduced to model the impact of global events such as the COVID-19 pandemic. Several machine learning models are used to illustrate the applicability of the created dataset to detecting traffic anomalies and trend changes.
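The three-step procedure in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the daily profile, the function names, and all parameter values are hypothetical. The anomaly is simplified to a multiplicative drop rather than derived from a simulation of traffic redistribution between base stations, and the trend change is a plain linear ramp. The symmetric \(\alpha\)-stable samples are drawn with the Chambers-Mallows-Stuck method using only the Python standard library.

```python
import math
import random


def symmetric_alpha_stable(alpha, scale, rng):
    """Draw one symmetric alpha-stable sample (beta = 0), 0 < alpha <= 2,
    using the Chambers-Mallows-Stuck method."""
    u = rng.uniform(-math.pi / 2, math.pi / 2)
    w = rng.expovariate(1.0)
    if alpha == 1.0:
        return scale * math.tan(u)  # Cauchy special case
    x = (math.sin(alpha * u) / math.cos(u) ** (1.0 / alpha)
         * (math.cos(u - alpha * u) / w) ** ((1.0 - alpha) / alpha))
    return scale * x


def synthesize_traffic(mean_profile, days, alpha=1.6, scale=0.05,
                       anomaly=None, trend_change=None, seed=0):
    """Repeat a daily mean profile and add noise, an anomaly, and a trend.

    anomaly:      (start, length, factor) scales the mean inside the window
    trend_change: (start, slope) adds slope * (t - start) for t >= start
    All of these parameters are illustrative assumptions.
    """
    rng = random.Random(seed)
    n = len(mean_profile)
    series = []
    for t in range(days * n):
        value = mean_profile[t % n]
        if anomaly is not None:
            start, length, factor = anomaly
            if start <= t < start + length:
                value *= factor
        if trend_change is not None:
            start, slope = trend_change
            if t >= start:
                value += slope * (t - start)
        value += symmetric_alpha_stable(alpha, scale, rng)
        series.append(max(value, 0.0))  # throughput cannot be negative
    return series


# Hypothetical normalized daily profile (24 hourly means), not from the paper.
daily_mean = [0.5 + 0.4 * math.sin(2 * math.pi * (h - 9) / 24) for h in range(24)]

traffic = synthesize_traffic(
    daily_mean, days=14,
    anomaly=(100, 6, 0.2),         # six-hour drop, e.g. a sector switched off
    trend_change=(7 * 24, 0.001),  # gradual increase starting in week two
)
```

Because the anomaly window and the trend start are known by construction, a series generated this way provides ground-truth labels against which an anomaly or change point detector can be scored.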

Data Availability

The data used in this work is available on GitHub under the GNU General Public License v3.0 at the address indicated above.

Code Availability

The code used in this work is available on GitHub under the GNU General Public License v3.0 at the address indicated above.

Funding

No funding was received for conducting this study.

Author information

Contributions

RA: designed the methodology, prepared the synthetic traffic dataset, simulated the LTE network traffic distribution, and wrote the main part of the manuscript; DG: performed the numerical analysis of the traffic time series and contributed to the writing of the manuscript.

Corresponding author

Correspondence to Rimvydas Aleksiejunas.

Ethics declarations

Conflict of interest

The authors have no conflicts of interest to declare that are relevant to the content of this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Aleksiejunas, R., Garuolis, D. Usage of Published Network Traffic Datasets for Anomaly and Change Point Detection. Wireless Pers Commun 133, 1281–1303 (2023). https://doi.org/10.1007/s11277-023-10816-3

  • DOI: https://doi.org/10.1007/s11277-023-10816-3
