iCAT: An Interactive Customizable Anonymization Tool

  • Conference paper
Computer Security – ESORICS 2019 (ESORICS 2019)

Part of the book series: Lecture Notes in Computer Science (LNSC, volume 11735)

Abstract

Today’s data owners usually resort to data anonymization tools to ease their privacy and confidentiality concerns. However, those tools are typically ready-made and inflexible, leaving a gap both between the data owner’s and data users’ requirements, and between those requirements and a tool’s anonymization capabilities. In this paper, we propose an interactive customizable anonymization tool, namely iCAT, to bridge these gaps. To this end, we first define the novel concept of anonymization space to model all combinations of per-attribute anonymization primitives based on their levels of privacy and utility. Second, we leverage NLP and ontology modeling to provide an automated way to translate data owners’ and data users’ textual requirements into appropriate anonymization primitives. Finally, we implement iCAT and evaluate its efficiency and effectiveness with both real and synthetic network data, and we assess its usability through a user-based study involving participants from industry and research laboratories. Our experiments show an effectiveness of about 96.5% for data owners and 92.6% for data users.

Notes

  1. https://www.wsj.com/articles/google-exposed-user-data-feared-repercussions-of-disclosing-to-public-1539017194

  2. https://www.techworld.com/security/uks-most-infamous-data-breaches-3604586/

  3. This list is not meant to be exhaustive, and our model and methodology can be extended to include other anonymization primitives.

References

  1. Rieck, K.: Pseudonymizer for solaris audit trails (2018). http://www.mlsec.org/bsmpseu/bsmpseu.1

  2. Assila, A., Ezzedine, H., et al.: Standardized usability questionnaires: features and quality focus. Electron. J. Comput. Sci. Inf. Technol. eJCIST 6(1), 15–31 (2016)

  3. Bell, E.D., La Padula, J.L.: Secure computer system: unified exposition and multics interpretation (1976)

  4. Denning, D.E.: A lattice model of secure information flow. Commun. ACM 19(5), 236–243 (1976)

  5. Donnellan, T.: Lattice Theory. Pergamon Press, Oxford (1968)

  6. Kohler, E.: Ipsumdump tool (2015). https://read.seas.harvard.edu/~kohler/ipsumdump/

  7. Blanton, E.: Tcpurify tool (2019). https://web.archive.org/web/20140203210616/irg.cs.ohiou.edu/~eblanton/tcpurify/

  8. Foukarakis, M., Antoniades, D., Antonatos, S., Markatos, E.P.: Flexible and high-performance anonymization of NetFlow records using anontool. In: Third International Conference on Security and Privacy in Communications Networks and the Workshops, SecureComm 2007, pp. 33–38. IEEE (2007)

  9. Gringoli, F.: TCPanon tool (2019). http://netweb.ing.unibs.it/~ntw/tools/tcpanon/

  10. Google: Traces from requests processed by Google cluster management system (2019). https://github.com/google/cluster-data

  11. Greg Minshall of Ipsilon Networks: Tcpdpriv (2005). http://ita.ee.lbl.gov/html/contrib/tcpdpriv.html

  12. Haag, P.: Nfdump (2010). World Wide Web. http://nfdump.sourceforge.net

  13. Imperva: Camouflage data masking (2018). https://www.imperva.com/products/data-security/data-masking/

  14. Kayaalp, M., Sagan, P., Browne, A.C., McDonald, C.J.: NLM-scrubber (2018). https://scrubber.nlm.nih.gov/files/

  15. Li, Y., Slagell, A., Luo, K., Yurcik, W.: CANINE: a combined conversion and anonymization tool for processing netflows for security. In: International Conference on Telecommunication Systems Modeling and Analysis, vol. 21 (2005)

  16. Manning, C., Surdeanu, M., Bauer, J., Finkel, J., Bethard, S., McClosky, D.: The Stanford CoreNLP natural language processing toolkit. In: Proceedings of 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pp. 55–60 (2014)

  17. Moore, D., Keys, K., Koga, R., Lagache, E., Claffy, K.C.: The CoralReef software suite as a tool for system and network administrators. In: Proceedings of the 15th USENIX Conference on System Administration, pp. 133–144. USENIX Association (2001)

  18. Pang, R., Allman, M., Paxson, V., Lee, J.: The devil and packet trace anonymization. ACM SIGCOMM Comput. Commun. Rev. 36(1), 29–38 (2006)

  19. Rules for the protection of personal data inside and outside the EU. GDPR (2018). https://ec.europa.eu/info/law/law-topic/data-protection_en

  20. Sandhu, R.S.: Lattice-based access control models. Computer 26(11), 9–19 (1993)

  21. Slagell, A.J., Lakkaraju, K., Luo, K.: FLAIM: a multi-level anonymization framework for computer and network logs. LISA 6, 3–8 (2006)

  22. Sys4 Consults: A generic log anonymizer (2018). https://github.com/sys4/loganon

  23. UCIMLR: Burst Header Packet flooding attack on Optical Burst Switching Network Data Set (2019). https://archive.ics.uci.edu/ml/datasets/

  24. Yurcik, W., Woolam, C., Hellings, G., Khan, L., Thuraisingham, B.: SCRUB-tcpdump: a multi-level packet anonymizer demonstrating privacy/analysis tradeoffs. In: 2007 Third International Conference on Security and Privacy in Communications Networks and the Workshops-SecureComm 2007, pp. 49–56. IEEE (2007)

Acknowledgment

The authors thank the anonymous reviewers for their valuable comments. This work is partially supported by the Natural Sciences and Engineering Research Council of Canada and Ericsson Canada under CRD Grant N01823 and by PROMPT Quebec.

Author information

Correspondence to Momen Oqaily, Yosr Jarraya, Mengyuan Zhang, Lingyu Wang, Makan Pourzandi or Mourad Debbabi.

Appendix

The following details each module of iCAT as shown in Fig. 6B.

(A) Data Loading and Processing (DLP). This module loads the data and enables filtering and cleansing operations. It consists of the following sub-modules:

Data Processing: This sub-module enables performing data pre-processing and adjustment operations. It can also automatically detect all data attributes and their types, which are needed by the Anonymization Space Manager to build the anonymization space lattice.

Data Filtering: This sub-module deploys several algorithms that can be applied automatically or manually to filter and remove records from the data (e.g., column deletion, row deletion, searched deletion and frequency deletion).
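
The four filtering operations named above can be illustrated with a minimal Python sketch. It is not iCAT's implementation: the function names and the record layout (a list of dictionaries) are hypothetical, and "frequency deletion" is interpreted here, as an assumption, as removing records whose attribute value occurs fewer than a given number of times.

```python
from collections import Counter

def delete_columns(records, columns):
    """Column deletion: drop the named attributes from every record."""
    drop = set(columns)
    return [{k: v for k, v in r.items() if k not in drop} for r in records]

def delete_rows(records, indices):
    """Row deletion: drop the records at the given positions."""
    drop = set(indices)
    return [r for i, r in enumerate(records) if i not in drop]

def delete_matching(records, attribute, value):
    """Searched deletion: drop records whose attribute equals the searched value."""
    return [r for r in records if r.get(attribute) != value]

def delete_rare(records, attribute, min_count):
    """Frequency deletion (assumed meaning): drop records whose attribute
    value appears fewer than min_count times in the data."""
    counts = Counter(r.get(attribute) for r in records)
    return [r for r in records if counts[r.get(attribute)] >= min_count]

# Hypothetical sample data for illustration only.
records = [
    {"src_ip": "10.0.0.1", "port": 22, "user": "alice"},
    {"src_ip": "10.0.0.1", "port": 80, "user": "bob"},
    {"src_ip": "10.0.0.2", "port": 80, "user": "carol"},
]

without_users = delete_columns(records, ["user"])
frequent_sources = delete_rare(records, "src_ip", 2)
```

Each function returns a new list and leaves the input untouched, so the operations can be chained or undone during interactive cleansing.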

(B) Requirements Interpreter (RI). This module translates the data owner’s and data user’s requirements into data attribute types and anonymization primitives. It consists of the following three sub-modules:

Requirements Parser: This sub-module takes the English statements and transforms them into a set of requirements using Stanford CoreNLP. It then processes and filters those requirements using the part-of-speech (POS) tool.

Requirements Mapper: This sub-module takes the parsed requirements and communicates with the Method-Ontology and the Type-Ontology databases in order to map each requirement into the related attribute type and then the corresponding anonymization primitives.

Ambiguity Solver: This sub-module is mainly responsible for communicating with the user (i.e., the data owner or data user) through the Interactive Communicator (IC) sub-module in order to resolve any ambiguity that occurs at the Requirements Mapper sub-module.
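
The parse, map and disambiguate flow described above can be sketched as follows. This is a toy illustration only: a regular-expression tokenizer stands in for Stanford CoreNLP, two small dictionaries stand in for the Type-Ontology and Method-Ontology databases, and all keywords and primitive names other than hashing are assumptions made for the example.

```python
import re

# Hypothetical stand-ins for the Type-Ontology and Method-Ontology databases.
TYPE_ONTOLOGY = {
    "ip": "ip_address",
    "port": "port_number",
    "time": "timestamp",
    "timestamp": "timestamp",
}
METHOD_ONTOLOGY = {
    "hash": "hashing",
    "hashed": "hashing",
    "remove": "suppression",
    "suppress": "suppression",
    "truncate": "truncation",
}

def parse_requirement(statement):
    """Tokenize one English requirement (crude stand-in for CoreNLP parsing)."""
    return re.findall(r"[a-z]+", statement.lower())

def map_requirement(statement):
    """Return (attribute_type, primitive, ambiguous) for one statement."""
    tokens = parse_requirement(statement)
    types = {TYPE_ONTOLOGY[t] for t in tokens if t in TYPE_ONTOLOGY}
    methods = {METHOD_ONTOLOGY[t] for t in tokens if t in METHOD_ONTOLOGY}
    # Zero or several candidates on either side is an ambiguity that
    # would be handed back to the user for clarification.
    ambiguous = len(types) != 1 or len(methods) != 1
    attribute_type = next(iter(types)) if len(types) == 1 else None
    primitive = next(iter(methods)) if len(methods) == 1 else None
    return attribute_type, primitive, ambiguous

clear = map_requirement("The IP addresses must be hashed")
unclear = map_requirement("Please protect the port")
```

In this sketch a statement is flagged as ambiguous whenever it does not resolve to exactly one attribute type and exactly one primitive, which is the point at which an interactive prompt would be issued.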

(C) iCAT Manager.

Identity Access Management and Permission Granter (IPG): This module associates the data user’s identity with the privacy-level specified by the data owner, which is needed to determine the anonymization sub-space assigned to that user based on the privacy-up principle.

Interactive Communicator: This sub-module is mainly responsible for interacting with the data owner or data user and handles the communications between them and the RI module.

I/O Manager: This module is responsible for configuring the data source from where the data is fetched (e.g. from a file system or a database) and the loading of the actual data to be anonymized.

(D) Anonymization Space Manager. This module is mainly responsible for generating the anonymization space and implementing the access control mechanism over the anonymization space for the data user. This module consists of the following sub-modules:

Anonymization Space Builder (ASB): This sub-module automatically builds the entire anonymization space, which consists of all available combinations of anonymization primitives for each data attribute based on its type. Building the anonymization space lattice is detailed in Sect. 2.3. The resulting anonymization-space lattice is stored in the Access Control database.
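
As a rough illustration of "all available combinations", the following Python sketch enumerates one primitive per attribute using a Cartesian product. The catalogue of primitives per attribute type is hypothetical, and the sketch only lists the points of the space; it does not reproduce the lattice ordering by privacy and utility that Sect. 2.3 defines.

```python
from itertools import product

# Hypothetical catalogue: primitives applicable to each attribute type.
PRIMITIVES_BY_TYPE = {
    "ip_address": ["none", "truncation", "hashing", "suppression"],
    "timestamp": ["none", "shifting", "suppression"],
    "port_number": ["none", "hashing"],
}

def build_anonymization_space(attribute_types):
    """Enumerate every combination that picks one primitive per attribute.

    attribute_types maps an attribute name to its detected type.
    """
    names = list(attribute_types)
    choices = [PRIMITIVES_BY_TYPE[attribute_types[name]] for name in names]
    return [dict(zip(names, combo)) for combo in product(*choices)]

# Hypothetical attributes as they might be detected by the DLP module.
attributes = {"src_ip": "ip_address", "dst_ip": "ip_address", "ts": "timestamp"}
space = build_anonymization_space(attributes)
```

The size of the space is the product of the number of primitives available for each attribute, so it grows quickly with the number of attributes.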

Anonymization Controller: This sub-module implements the access control mechanism over the anonymization space for the data user. It receives the utility-level from the data user and performs an intersection/masking operation between the privacy level and the utility level in order to determine the allowed combinations of anonymization primitives. It also ensures that the Data Anonymizer only accesses the anonymization primitives allowed for the user.
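
The intersection step can be pictured with a small Python sketch. The encoding is an assumption made for illustration: the privacy level is represented as the set of primitives the data owner permits per attribute, and the utility level as the set of primitives the data user can work with. The paper's actual lattice-based encoding is not shown in this section, and the function names are hypothetical.

```python
def allowed_primitives(privacy_allowed, utility_acceptable):
    """Intersect, per attribute, what the owner permits with what the user accepts."""
    return {
        attr: privacy_allowed[attr] & utility_acceptable.get(attr, set())
        for attr in privacy_allowed
    }

def is_permitted(combination, allowed):
    """Check that a combination uses only allowed primitives for every attribute."""
    return all(
        primitive in allowed.get(attr, set())
        for attr, primitive in combination.items()
    )

# Hypothetical levels for two attributes.
privacy_allowed = {
    "src_ip": {"hashing", "suppression"},
    "ts": {"none", "shifting", "suppression"},
}
utility_acceptable = {
    "src_ip": {"hashing", "truncation"},
    "ts": {"none", "shifting"},
}

allowed = allowed_primitives(privacy_allowed, utility_acceptable)
```

A check such as `is_permitted` is the kind of gate that would sit in front of the Data Anonymizer so that only allowed primitives are ever invoked for a given user.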

(E) Data Anonymizer. This module is mainly responsible for anonymizing the data with respect to the trust-level assigned to the users. It is designed in a building-block manner so that new or more efficient anonymization primitives can be easily integrated into iCAT. This module holds the following sub-modules:

Anonymization Primitives: This sub-module holds the implementation of all existing anonymization algorithms corresponding to the 12 anonymization primitives discussed in Sect. 2.

Anonymization Mapper: This sub-module is responsible for creating a mapping file that maps the plain-text data to their anonymized values for later recognition purposes (e.g., if hashing is used to anonymize IP addresses, a file containing the original IP addresses and their hashes is created).
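
The IP-hashing example above can be sketched in Python as follows. The choice of SHA-256, the optional salt, the CSV layout and the function names are all assumptions for illustration; the section does not state which hash function or file format iCAT uses.

```python
import csv
import hashlib
import io

def hash_value(value, salt=""):
    """Hash one plain-text value (SHA-256 is an assumed choice)."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()

def anonymize_with_mapping(values, salt=""):
    """Return the anonymized values plus the original-to-hash mapping."""
    mapping = {}
    anonymized = []
    for value in values:
        if value not in mapping:
            mapping[value] = hash_value(value, salt)
        anonymized.append(mapping[value])
    return anonymized, mapping

def write_mapping(mapping, handle):
    """Write the mapping as CSV rows of (original, anonymized)."""
    writer = csv.writer(handle)
    writer.writerow(["original", "anonymized"])
    for original, hashed in mapping.items():
        writer.writerow([original, hashed])

# Hypothetical input; an in-memory buffer stands in for the mapping file.
ips = ["10.0.0.1", "10.0.0.2", "10.0.0.1"]
anonymized, mapping = anonymize_with_mapping(ips, salt="demo-salt")
buffer = io.StringIO()
write_mapping(mapping, buffer)
```

Because the mapping links every original value to its anonymized form, anyone holding the file can reverse the anonymization, so it would need to stay with the data owner.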

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Oqaily, M., Jarraya, Y., Zhang, M., Wang, L., Pourzandi, M., Debbabi, M. (2019). iCAT: An Interactive Customizable Anonymization Tool. In: Sako, K., Schneider, S., Ryan, P. (eds) Computer Security – ESORICS 2019. ESORICS 2019. Lecture Notes in Computer Science, vol 11735. Springer, Cham. https://doi.org/10.1007/978-3-030-29959-0_32

  • DOI: https://doi.org/10.1007/978-3-030-29959-0_32

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-29958-3

  • Online ISBN: 978-3-030-29959-0

  • eBook Packages: Computer Science, Computer Science (R0)
