Abstract
Over the last decades, the Internet has become a fundamental part of human culture. Because users go online for everyday tasks and other routines, their activity leaves human patterns in network traffic. These patterns also appear in DNS (Domain Name System) traffic, since DNS is critical to the Internet's operation. This work presents a procedure to detect and extract some of those human patterns by applying machine learning techniques to real DNS data. Network traffic retrieved from an authoritative DNS server of .cl, the ccTLD (country-code top-level domain) of Chile, was processed as multiple time series for pattern extraction, a data structure that requires specialized techniques. The procedure consists of two stages: first, a clustering analysis that groups domains by their activity, so that the groups' behavior over time can be analyzed and persistent patterns identified; second, an extraction of association rules that captures specific activity differences between the groups. Finding human patterns in the data could be of high interest to researchers who analyze human behavior regarding Internet usage. Applying the proposed procedure revealed trends and patterns in DNS traffic that proved consistent across different time portions of the data.
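The two stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the domain names, the six-slot query counts, the Euclidean distance, the first-k medoid initialization, and the "above-average activity" item definition are all assumptions chosen to keep the example small. Stage one clusters z-normalized per-domain time series with a simple alternating k-medoids loop; stage two measures the confidence of a rule linking a cluster to high activity in a time slot.

```python
import math
from statistics import mean, pstdev

def z_normalize(series):
    """Rescale to zero mean and unit variance so shape, not volume, drives clustering."""
    mu, sigma = mean(series), pstdev(series)
    if sigma == 0:
        return [0.0] * len(series)
    return [(x - mu) / sigma for x in series]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def k_medoids(series_list, k, max_iter=100):
    """Alternating k-medoids: assign each series to its nearest medoid, then re-pick medoids."""
    n = len(series_list)
    dist = [[euclidean(a, b) for b in series_list] for a in series_list]
    medoids = list(range(k))  # assumption: start from the first k series
    labels = [0] * n
    for _ in range(max_iter):
        labels = [min(range(k), key=lambda c: dist[i][medoids[c]]) for i in range(n)]
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:  # keep the old medoid if a cluster empties out
                new_medoids.append(medoids[c])
                continue
            # The medoid is the member with the smallest total distance to its cluster.
            new_medoids.append(min(members, key=lambda m: sum(dist[m][j] for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return labels, medoids

def rule_confidence(transactions, antecedent, consequent):
    """confidence(A -> B) = count(A and B) / count(A)."""
    with_a = [t for t in transactions if antecedent <= t]
    if not with_a:
        return 0.0
    return sum(1 for t in with_a if consequent <= t) / len(with_a)

# Hypothetical query counts per domain over six time slots of one day.
traffic = {
    "day-a.example.cl":   [5, 10, 80, 90, 70, 20],
    "night-a.example.cl": [90, 70, 10, 5, 20, 80],
    "day-b.example.cl":   [50, 100, 800, 950, 700, 150],
    "night-b.example.cl": [900, 800, 100, 50, 150, 700],
}

# Stage 1: cluster domains by the shape of their activity.
normalized = [z_normalize(s) for s in traffic.values()]
labels, medoids = k_medoids(normalized, k=2)

# Stage 2: one transaction per domain, holding its cluster and its above-average slots.
transactions = [
    {f"cluster={lab}"} | {f"slot{j}=high" for j, z in enumerate(norm) if z > 0}
    for lab, norm in zip(labels, normalized)
]
for lab in sorted(set(labels)):
    conf = rule_confidence(transactions, {f"cluster={lab}"}, {"slot3=high"})
    print(f"cluster={lab} -> slot3=high, confidence {conf:.2f}")
```

Z-normalization is what lets a low-volume and a high-volume domain with the same daily shape land in the same group. On real data, a time-series-aware distance and a more robust initialization than the first k series would likely be needed.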
Cite this article
Panza, M., Madariaga, D. & Bustos-Jiménez, J. Extracting human behavior patterns from DNS traffic. Ann. Telecommun. 77, 407–420 (2022). https://doi.org/10.1007/s12243-021-00888-2
DOI: https://doi.org/10.1007/s12243-021-00888-2