Abstract
Firewalls are primary components for ensuring network and information security. For this purpose, they are deployed in all commercial, governmental and military networks as well as in other large-scale networks. The security policies of an institution are implemented as firewall rules, and an anomaly in these rules may lead to serious security gaps. When the network is large and the policies are complicated, manual cross-checking may be insufficient to detect anomalies. In this paper, an automated model based on machine learning and high performance computing methods is proposed for the detection of anomalies in a firewall rule repository. To achieve this, firewall logs are analysed and the extracted features are fed to a set of machine learning classification algorithms including Naive Bayes, kNN, Decision Table and HyperPipes. F-measure, which combines precision and recall, is used for performance evaluation. In the experiments, kNN showed the best performance. A model based on the F-measure distribution was then devised, and 93 firewall rules were analysed with it. The model predicted that 6 of these firewall rules cause anomalies. These problematic rules were checked against the security reports prepared by experts, and each of them was verified to be an anomaly. This paper shows that anomalies in firewall rules can be detected by automatically analysing large-scale log files with machine learning methods, which helps avoid security breaches, saves a substantial amount of expert effort and enables timely intervention.
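As an illustration of the evaluation metric named above, the following is a minimal sketch of the F-measure, assuming the standard balanced form (the harmonic mean of precision and recall). The function name `f_measure` and the convention of returning 0 when both inputs are 0 are assumptions made for this example, not details taken from the paper.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall.

    Assumption for this sketch: when precision and recall are both 0,
    the score is defined as 0 instead of raising a division error.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision 0.888 and recall 0.886 are the values listed for rule 213
# in the appendix table.
print(round(f_measure(0.888, 0.886), 3))  # prints 0.887
```

The printed value agrees with the F-measure reported for rule 213 in the appendix, which is consistent with the balanced form assumed here.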
Appendix
F-measure values of the firewall rules according to the kNN classifier.
Rule ID | TP rate | FP rate | Precision | Recall | F-measure |
---|---|---|---|---|---|
5 | 1 | 0 | 1 | 1 | 1 |
186 | 1 | 0 | 1 | 1 | 1 |
6 | 1 | 0 | 0.999 | 1 | 1 |
45 | 0.994 | 0 | 0.994 | 0.994 | 0.994 |
50 | 1 | 0 | 1 | 1 | 1 |
236 | 0.996 | 0.001 | 0.995 | 0.996 | 0.996 |
53 | 1 | 0 | 1 | 1 | 1 |
220 | 1 | 0 | 1 | 1 | 1 |
46 | 1 | 0 | 0.993 | 1 | 0.997 |
213 | 0.886 | 0 | 0.888 | 0.886 | 0.887 |
47 | 1 | 0 | 1 | 1 | 1 |
273 | 1 | 0 | 1 | 1 | 1 |
10 | 0.993 | 0 | 1 | 0.993 | 0.996 |
32 | 0.996 | 0 | 0.999 | 0.996 | 0.997 |
231 | 1 | 0 | 1 | 1 | 1 |
49 | 0.996 | 0 | 0.999 | 0.996 | 0.998 |
14 | 0.984 | 0 | 0.994 | 0.984 | 0.989 |
84 | 0.998 | 0 | 0.999 | 0.998 | 0.999 |
207 | 0.998 | 0 | 0.998 | 0.998 | 0.998 |
250 | 0.999 | 0 | 1 | 0.999 | 0.999 |
15 | 0.992 | 0 | 0.995 | 0.992 | 0.994 |
190 | 0.997 | 0 | 0.972 | 0.997 | 0.984 |
48 | 0.971 | 0 | 0.985 | 0.971 | 0.978 |
241 | 0.995 | 0 | 0.999 | 0.995 | 0.997 |
51 | 1 | 0 | 1 | 1 | 1 |
52 | 1 | 0 | 1 | 1 | 1 |
237 | 0.974 | 0 | 0.979 | 0.974 | 0.976 |
31 | 0.993 | 0 | 0.987 | 0.993 | 0.99 |
18 | 0.989 | 0 | 0.996 | 0.989 | 0.993 |
88 | 1 | 0 | 1 | 1 | 1 |
191 | 0.941 | 0 | 0.889 | 0.941 | 0.914 |
238 | 0.997 | 0 | 0.997 | 0.997 | 0.997 |
0 | 0.986 | 0 | 1 | 0.986 | 0.993 |
300 | 0.972 | 0 | 0.988 | 0.972 | 0.98 |
272 | 0.997 | 0 | 0.997 | 0.997 | 0.997 |
215 | 0.996 | 0 | 0.998 | 0.996 | 0.997 |
253 | 0.998 | 0 | 0.998 | 0.998 | 0.998 |
112 | 0.957 | 0 | 0.978 | 0.957 | 0.967 |
301 | 0.959 | 0 | 0.97 | 0.959 | 0.964 |
287 | 0.99 | 0 | 0.986 | 0.99 | 0.988 |
116 | 1 | 0 | 1 | 1 | 1 |
2 | 1 | 0 | 1 | 1 | 1 |
233 | 1 | 0 | 1 | 1 | 1 |
169 | 1 | 0 | 0.983 | 1 | 0.992 |
271 | 0.996 | 0 | 1 | 0.996 | 0.998 |
109 | 0.609 | 0 | 0.56 | 0.609 | 0.583 |
258 | 0.97 | 0 | 1 | 0.97 | 0.985 |
266 | 1 | 0 | 1 | 1 | 1 |
289 | 0.996 | 0 | 0.995 | 0.996 | 0.996 |
282 | 0.999 | 0 | 0.999 | 0.999 | 0.999 |
129 | 0.474 | 0 | 0.5 | 0.474 | 0.486 |
72 | 1 | 0 | 0.933 | 1 | 0.966 |
68 | 0.984 | 0 | 1 | 0.984 | 0.992 |
280 | 0.997 | 0 | 0.987 | 0.997 | 0.992 |
111 | 0.96 | 0 | 0.98 | 0.96 | 0.97 |
13 | 0.625 | 0 | 0.476 | 0.625 | 0.541 |
41 | 0.667 | 0 | 0.5 | 0.667 | 0.571 |
139 | 0.836 | 0 | 0.918 | 0.836 | 0.875 |
17 | 1 | 0 | 1 | 1 | 1 |
247 | 0.94 | 0 | 1 | 0.94 | 0.969 |
27 | 0.636 | 0 | 0.897 | 0.636 | 0.745 |
286 | 0.999 | 0 | 0.999 | 0.999 | 0.999 |
44 | 0.984 | 0 | 0.968 | 0.984 | 0.976 |
104 | 0.992 | 0 | 0.936 | 0.992 | 0.963 |
132 | 0.364 | 0 | 0.571 | 0.364 | 0.444 |
130 | 0.333 | 0 | 0.364 | 0.333 | 0.348 |
19 | 0.833 | 0 | 0.714 | 0.833 | 0.769 |
119 | 1 | 0 | 1 | 1 | 1 |
90 | 0.667 | 0 | 1 | 0.667 | 0.8 |
278 | 0.84 | 0 | 0.808 | 0.84 | 0.824 |
81 | 0.75 | 0 | 0.75 | 0.75 | 0.75 |
1 | 1 | 0 | 0.935 | 1 | 0.967 |
16 | 0.833 | 0 | 0.833 | 0.833 | 0.833 |
76 | 0.8 | 0 | 1 | 0.8 | 0.889 |
21 | 0.444 | 0 | 0.364 | 0.444 | 0.4 |
28 | 0.999 | 0 | 0.999 | 0.999 | 0.999 |
257 | 1 | 0 | 0.8 | 1 | 0.889 |
59 | 0.949 | 0 | 1 | 0.949 | 0.974 |
42 | 1 | 0 | 1 | 1 | 1 |
299 | 0.833 | 0 | 0.714 | 0.833 | 0.769 |
251 | 0.667 | 0 | 0.667 | 0.667 | 0.667 |
269 | 1 | 0 | 1 | 1 | 1 |
99 | 0.667 | 0 | 0.667 | 0.667 | 0.667 |
228 | 0 | 0 | 0 | 0 | 0 |
230 | 0 | 0 | 0 | 0 | 0 |
192 | 1 | 0 | 1 | 1 | 1 |
3 | 0 | 0 | 0 | 0 | 0 |
156 | 0 | 0 | 0 | 0 | 0 |
291 | 1 | 0 | 1 | 1 | 1 |
188 | 0.994 | 0 | 0.988 | 0.994 | 0.991 |
267 | 1 | 0 | 1 | 1 | 1 |
297 | 1 | 0 | 1 | 1 | 1 |
274 | 0.999 | 0 | 0.999 | 0.999 | 0.999 |
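One simple way to read a table like this is to list the rules whose F-measure falls at the low end of the distribution. The sketch below does so for a handful of rows copied from the table. The cutoff value and the helper name `rules_below` are hypothetical and chosen only for illustration; the paper's actual model based on the F-measure distribution is not specified in this section, so the output should not be read as the set of six anomalies reported by the authors.

```python
# F-measure per rule ID, copied from a few rows of the table above.
F_BY_RULE = {5: 1.0, 213: 0.887, 109: 0.583, 129: 0.486, 228: 0.0, 274: 0.999}

# Hypothetical cutoff, NOT taken from the paper.
CUTOFF = 0.5


def rules_below(f_by_rule: dict[int, float], cutoff: float) -> list[int]:
    """Return rule IDs with F-measure strictly below cutoff, lowest score first."""
    low = [(score, rule) for rule, score in f_by_rule.items() if score < cutoff]
    return [rule for score, rule in sorted(low)]


print(rules_below(F_BY_RULE, CUTOFF))  # prints [228, 129]
```

A fixed cutoff is the simplest possible criterion; a distribution-based criterion would derive the boundary from the scores themselves instead of hard-coding it.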
Cite this article
Ucar, E., Ozhan, E. The Analysis of Firewall Policy Through Machine Learning and Data Mining. Wireless Pers Commun 96, 2891–2909 (2017). https://doi.org/10.1007/s11277-017-4330-0
DOI: https://doi.org/10.1007/s11277-017-4330-0