Estimating Asset Sensitivity by Profiling Users

  • Youngja Park
  • Christopher Gates
  • Stephen C. Gates
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8134)


We introduce algorithms that automatically score and rank information technology (IT) assets in an enterprise, such as computer systems or data files, by their business value and criticality to the organization. Typically, information assets are manually assigned classification labels with respect to confidentiality, integrity and availability. In this paper, we propose semi-automatic machine learning algorithms that estimate the sensitivity of assets by profiling their users. Our methods do not require direct access to the target assets or privileged knowledge about them, resulting in a more efficient, scalable and privacy-preserving approach than existing data security solutions that rely on data content classification. Instead, we rely on external information such as the attributes of the users, their access patterns and other data content published by the users. Validation with a set of 8,500 computers collected from a large company shows that all our algorithms perform significantly better than two baseline methods.
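
The idea of estimating an asset's sensitivity from its users, rather than from its content, can be illustrated with a small sketch. This is not the authors' algorithm: the abstract does not specify features or models, so the access-log layout, the attribute strings, the cosine similarity, the nearest-neighbor scoring rule and all names below are assumptions made purely for illustration.

```python
from collections import Counter
import math

def asset_profile(access_log, user_attrs, asset):
    """Profile an asset by the attributes of the users who accessed it.

    Each value is the access-weighted fraction of accesses made by users
    holding that attribute (hypothetical feature choice).
    """
    counts = Counter()
    total = 0
    for user, accessed_asset, n_accesses in access_log:
        if accessed_asset != asset:
            continue
        for attr in user_attrs[user]:
            counts[attr] += n_accesses
        total += n_accesses
    return {attr: c / total for attr, c in counts.items()} if total else {}

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def knn_sensitivity(profile, labeled, k=2):
    """Similarity-weighted mean score of the k most similar labeled assets."""
    nearest = sorted(((cosine(profile, p), s) for p, s in labeled), reverse=True)[:k]
    weight = sum(sim for sim, _ in nearest)
    return sum(sim * s for sim, s in nearest) / weight if weight else 0.0

# Hypothetical data: user attributes and (user, asset, access count) records.
user_attrs = {
    "alice": ["role:executive", "dept:finance"],
    "bob": ["role:engineer", "dept:it"],
    "carol": ["role:executive", "dept:legal"],
}
access_log = [
    ("alice", "srv-a", 10), ("carol", "srv-a", 5),
    ("bob", "srv-b", 20),
    ("alice", "srv-c", 3), ("bob", "srv-c", 1),
]

# A few assets carry manually assigned sensitivity scores (the semi-automatic part).
labeled = [
    (asset_profile(access_log, user_attrs, "srv-a"), 0.9),
    (asset_profile(access_log, user_attrs, "srv-b"), 0.2),
]

# Score an unlabeled asset without reading its content.
score = knn_sensitivity(asset_profile(access_log, user_attrs, "srv-c"), labeled)
print(f"estimated sensitivity of srv-c: {score:.2f}")
```

In this toy setup the unlabeled server is used mostly by a finance executive, so its estimate lands closer to the highly sensitive labeled server than to the engineering one. Only access records and user attributes are consulted, which is the property the abstract emphasizes.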


Keywords: Asset Sensitivity, Criticality, Data Security, Information Security



Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Youngja Park (1)
  • Christopher Gates (2)
  • Stephen C. Gates (1)

  1. IBM T.J. Watson Research Center, Yorktown Heights, USA
  2. Purdue University, Indiana, USA
