Abstract

Hidden Markov Models (HMMs) have applications in several areas of computer security. One drawback of HMMs is that selecting appropriate model parameters is often ad hoc or requires domain-specific knowledge. While algorithms exist to find local optima for some parameters, the number of states must always be specified in advance, and this choice directly affects the accuracy and generality of the model. In addition, domain knowledge is not always available, or it may rest on assumptions that prove incorrect or suboptimal.

We apply the ε-machine, a special type of HMM, to the task of constructing network protocol models solely from network traffic. Unlike previous approaches, ε-machine reconstruction infers the minimal HMM architecture directly from data and is well suited to applications such as anomaly detection. We contrast our approach with previous research and discuss the benefits and challenges of using ε-machines for protocol model inference.
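The abstract does not describe the reconstruction algorithm itself. As a rough illustration of why the number of states can come out of the data instead of being fixed beforehand, the sketch below groups fixed-length histories by their empirical next-symbol distributions, loosely in the spirit of causal-state splitting reconstruction (CSSR). It is a simplification, not the authors' method: the history length `L`, the tolerance `tol` (used in place of a proper statistical test), and the helper names `next_symbol_counts` and `group_histories` are hypothetical, and the determinization step of the full algorithm is omitted.

```python
from collections import Counter, defaultdict


def next_symbol_counts(seq, L):
    """For every length-L history in seq, count which symbol follows it."""
    counts = defaultdict(Counter)
    for i in range(L, len(seq)):
        counts[tuple(seq[i - L:i])][seq[i]] += 1
    return counts


def distribution(counter, alphabet):
    """Turn raw next-symbol counts into probabilities over a fixed alphabet."""
    total = sum(counter.values())
    return [counter[a] / total for a in alphabet]


def group_histories(seq, L, tol=0.1):
    """Greedily merge histories whose next-symbol distributions differ by at
    most tol (max absolute difference) from a group's first member.

    The number of groups returned is an output, not an input parameter.
    """
    alphabet = sorted(set(seq))
    counts = next_symbol_counts(seq, L)
    groups = []  # list of (representative distribution, member histories)
    for hist in sorted(counts):
        dist = distribution(counts[hist], alphabet)
        for rep, members in groups:
            if max(abs(p - q) for p, q in zip(rep, dist)) <= tol:
                members.append(hist)
                break
        else:
            groups.append((dist, [hist]))
    return groups
```

For a strictly alternating sequence such as `"AB" * 50` with `L=1`, the two histories predict different next symbols and end up in two separate groups, while a constant sequence yields a single group; in neither case is the number of states supplied by the caller.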

Keywords

Statistical Inference, Reverse Engineering, Network Protocols, Markov Models, Computational Mechanics

Copyright information

© ICST Institute for Computer Science, Social Informatics and Telecommunications Engineering 2010

Authors and Affiliations

  • Sean Whalen (1, 2)
  • Matt Bishop (1)
  • James P. Crutchfield (1, 2, 3)

  1. Department of Computer Science, University of California, Davis, USA
  2. Department of Physics, University of California, Davis, USA
  3. Santa Fe Institute, Santa Fe, USA