iCAT: An Interactive Customizable Anonymization Tool

Oqaily, Momen; Jarraya, Yosr; Zhang, Mengyuan; Wang, Lingyu; Pourzandi, Makan; Debbabi, Mourad

doi:10.1007/978-3-030-29959-0_32


Part of the book series: Lecture Notes in Computer Science (LNCS, volume 11735)

Included in the following conference series:

European Symposium on Research in Computer Security


Abstract

Today’s data owners usually resort to data anonymization tools to ease their privacy and confidentiality concerns. However, such tools are typically ready-made and inflexible, leaving a gap both between the data owners’ and data users’ requirements, and between those requirements and a tool’s anonymization capabilities. In this paper, we propose an interactive customizable anonymization tool, namely iCAT, to bridge these gaps. To this end, we first define the novel concept of an anonymization space, which models all combinations of per-attribute anonymization primitives based on their levels of privacy and utility. Second, we leverage natural language processing (NLP) and ontology modeling to automatically translate the textual requirements of data owners and data users into appropriate anonymization primitives. Finally, we implement iCAT, evaluate its efficiency and effectiveness on both real and synthetic network data, and assess its usability through a user study involving participants from industry and research laboratories. Our experiments show an effectiveness of about 96.5% for data owners and 92.6% for data users.


Notes

1. https://www.wsj.com/articles/google-exposed-user-data-feared-repercussions-of-disclosing-to-public-1539017194.
2. https://www.techworld.com/security/uks-most-infamous-data-breaches-3604586/.
3. This list is not meant to be exhaustive; our model and methodology can be extended to include other anonymization primitives.


Acknowledgment

The authors thank the anonymous reviewers for their valuable comments. This work is partially supported by the Natural Sciences and Engineering Research Council of Canada and Ericsson Canada under CRD Grant N01823 and by PROMPT Quebec.

Author information

Authors and Affiliations

Concordia Institute for Information Systems Engineering, Concordia University, Montreal, QC, Canada
Momen Oqaily, Lingyu Wang & Mourad Debbabi
Ericsson Security Research, Ericsson Canada, Montreal, QC, Canada
Yosr Jarraya, Mengyuan Zhang & Makan Pourzandi


Corresponding authors

Correspondence to Momen Oqaily, Yosr Jarraya, Mengyuan Zhang, Lingyu Wang, Makan Pourzandi or Mourad Debbabi.

Editor information

Editors and Affiliations

NEC Corporation, Kawasaki, Japan
Kazue Sako
University of Surrey, Guildford, UK
Steve Schneider
University of Luxembourg, Esch-sur-Alzette, Luxembourg
Peter Y. A. Ryan

Appendix

The following describes each module of iCAT, as shown in Fig. 6B.

(A) Data Loading and Processing (DLP). This module loads the data and enables filtering and cleansing operations. It consists of the following sub-modules:

Data Processing: This sub-module performs data pre-processing and adjustment operations. It can also automatically detect all data attributes and their types, which the Anonymization Space Manager needs in order to build the anonymization space lattice.
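
As an illustration of automatic type detection, the following Python sketch guesses a type for each column of a small table. It is a hedged heuristic, not iCAT's detector: the type labels (`ip_address`, `integer`, `timestamp`, `string`) and the functions `detect_type` and `detect_schema` are hypothetical names chosen for this example.

```python
import ipaddress
from datetime import datetime


def _is_ip(v):
    try:
        ipaddress.ip_address(v)
        return True
    except ValueError:
        return False


def _is_int(v):
    try:
        int(v)
        return True
    except ValueError:
        return False


def _is_timestamp(v):
    try:
        datetime.fromisoformat(v)  # ISO 8601 strings, Python 3.7+
        return True
    except ValueError:
        return False


def detect_type(values):
    """Guess a (hypothetical) attribute type from a column of string values.
    Checks run from most to least specific; the first one that accepts
    every value wins."""
    if all(_is_ip(v) for v in values):
        return "ip_address"
    if all(_is_int(v) for v in values):
        return "integer"
    if all(_is_timestamp(v) for v in values):
        return "timestamp"
    return "string"


def detect_schema(rows):
    """Map each column name to its guessed type; rows is a non-empty list of dicts."""
    return {col: detect_type([r[col] for r in rows]) for col in rows[0]}
```

For example, a column holding `10.0.0.1` and `192.168.1.7` is reported as `ip_address`, while `tcp` and `udp` fall through to `string`.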

Data Filtering: This sub-module provides several algorithms, which can be applied automatically or manually, to filter the data and remove unwanted columns or records (e.g., column deletion, row deletion, searched deletion and frequency deletion).
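
The four filtering operations named above can be sketched as plain list operations over records stored as dicts. The semantics given here to searched deletion (drop records matching a searched value) and frequency deletion (drop records whose value is rare in a column) are assumptions, and all function names are hypothetical.

```python
from collections import Counter


def delete_column(rows, column):
    """Column deletion: drop one attribute from every record."""
    return [{k: v for k, v in r.items() if k != column} for r in rows]


def delete_rows(rows, indices):
    """Row deletion: drop the records at the given positions."""
    drop = set(indices)
    return [r for i, r in enumerate(rows) if i not in drop]


def searched_delete(rows, column, value):
    """Searched deletion (assumed meaning): drop records whose column equals value."""
    return [r for r in rows if r.get(column) != value]


def frequency_delete(rows, column, min_count):
    """Frequency deletion (assumed meaning): drop records whose value in
    `column` occurs fewer than `min_count` times, since rare values can
    single out an individual."""
    counts = Counter(r[column] for r in rows)
    return [r for r in rows if counts[r[column]] >= min_count]
```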

(B) Requirements Interpreter (RI). This module translates the data owner’s and data user’s requirements into data attribute types and anonymization primitives. It consists of the following three sub-modules:

Requirements Parser: This sub-module takes the English statements and transforms them into a set of requirements using Stanford CoreNLP. It then processes and filters those requirements using the part-of-speech (POS) tagging tool.

Requirements Mapper: This sub-module takes the parsed requirements and queries the Method-Ontology and Type-Ontology databases in order to map each requirement to the related attribute type and then to the corresponding anonymization primitives.
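
A toy version of the parse-then-map pipeline is sketched below. It does not call Stanford CoreNLP: a regular-expression tokenizer and a small stop-word list stand in for parsing and POS filtering, and the two dictionaries are hypothetical miniatures of the Type-Ontology and Method-Ontology databases. The primitive names are illustrative and need not match the 12 primitives used by iCAT.

```python
import re

# Hypothetical miniature Type-Ontology: keyword -> attribute type.
TYPE_ONTOLOGY = {
    "ip": "ip_address",
    "address": "ip_address",
    "port": "port_number",
    "timestamp": "timestamp",
    "time": "timestamp",
}

# Hypothetical miniature Method-Ontology: attribute type -> candidate primitives.
METHOD_ONTOLOGY = {
    "ip_address": ["prefix_preserving", "hashing", "truncation", "suppression"],
    "port_number": ["generalization", "suppression"],
    "timestamp": ["shifting", "generalization", "suppression"],
}

# Crude stand-in for POS-based filtering: discard common function words.
STOPWORDS = {"the", "a", "an", "of", "must", "be", "should", "i", "need", "to", "and"}


def parse(statement):
    """Tokenize a requirement and keep only content words."""
    tokens = re.findall(r"[a-z]+", statement.lower())
    return [t for t in tokens if t not in STOPWORDS]


def map_requirement(statement):
    """Map a textual requirement to {attribute_type: candidate primitives}."""
    result = {}
    for token in parse(statement):
        attr_type = TYPE_ONTOLOGY.get(token)
        if attr_type is not None:
            result[attr_type] = METHOD_ONTOLOGY[attr_type]
    return result
```

For instance, "The IP address must be hidden" reduces to the tokens `ip`, `address` and `hidden`, and maps to the `ip_address` type with its four candidate primitives.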

Ambiguity Solver: This sub-module is responsible for communicating with the user (i.e., the data owner or data user) through the Interactive Communicator (IC) sub-module in order to resolve any ambiguity that arises in the Requirements Mapper sub-module.
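
One way to picture the Ambiguity Solver is as a function that returns the single attribute type for a term when the mapping is unambiguous and otherwise defers to a callback standing in for the Interactive Communicator. The candidate table `TERM_CANDIDATES` and the callback signature are assumptions made for this sketch.

```python
from typing import Callable, Dict, List

# Hypothetical table: term -> candidate attribute types it may refer to.
TERM_CANDIDATES: Dict[str, List[str]] = {
    "port": ["port_number"],
    "address": ["ip_address", "mac_address"],
    "time": ["timestamp", "duration"],
}

# Callback that shows the options to the user and returns the chosen one.
AskUser = Callable[[str, List[str]], str]


def resolve(term: str, ask_user: AskUser) -> str:
    """Return one attribute type for `term`, asking the user only if needed."""
    candidates = TERM_CANDIDATES.get(term)
    if not candidates:
        raise KeyError(f"unknown term: {term!r}")
    if len(candidates) == 1:
        return candidates[0]  # unambiguous: no interaction required
    choice = ask_user(term, candidates)  # stands in for the Interactive Communicator
    if choice not in candidates:
        raise ValueError(f"{choice!r} is not one of {candidates}")
    return choice
```

Passing the interaction in as a callback keeps the resolution logic testable without a user interface.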

(C) iCAT Manager. This module consists of the following sub-modules:

Identity Access Management and Permission Granter (IPG): This sub-module associates the data user’s identity with the privacy level specified by the data owner; this association is needed to determine the anonymization sub-space assigned to that user based on the privacy-up principle.
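
The sketch below models one possible reading of the privacy-up principle, which is an assumption: each point of the anonymization space is a tuple of per-attribute privacy ranks (higher means stronger anonymization), the data owner assigns each data user a minimum rank vector, and the user may only reach points at or above that vector. The registry `USER_PRIVACY` and the user names are hypothetical.

```python
from itertools import product

# Hypothetical registry filled by the data owner: for each data user, the
# minimum privacy rank required on each attribute (higher = more private).
USER_PRIVACY = {
    "analyst_a": (1, 0),
    "external_b": (2, 2),
}


def privacy_up_subspace(user, space):
    """Return the up-set of the user's privacy vector: every point of the
    space that is at least as private on every attribute."""
    floor = USER_PRIVACY[user]  # KeyError for users the owner never registered
    return [x for x in space if all(a >= b for a, b in zip(x, floor))]


# Example space: two attributes with three privacy ranks each (0, 1, 2).
SPACE = list(product(range(3), range(3)))
```

Here `external_b` is confined to the single most private point `(2, 2)`, whereas `analyst_a` may use any of six points.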

Interactive Communicator: This sub-module interacts with the data owner or data user and handles the communication between them and the RI module.

I/O Manager: This sub-module configures the data source from which the data is fetched (e.g., a file system or a database) and loads the actual data to be anonymized.
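
A minimal loader covering the two kinds of sources mentioned above, using only Python's standard `csv` and `sqlite3` modules, might look as follows. SQLite is chosen purely for illustration, the function names are hypothetical, and the table name is interpolated into the query, so it must come from trusted configuration.

```python
import csv
import io
import sqlite3
from typing import Dict, List, TextIO

Record = Dict[str, object]


def load_from_csv(stream: TextIO) -> List[Record]:
    """Load every row of a CSV text stream (file or in-memory) as a dict."""
    return list(csv.DictReader(stream))


def load_from_sqlite(conn: sqlite3.Connection, table: str) -> List[Record]:
    """Load every row of `table` as a dict keyed by column name.
    The table name is quoted but not validated; it must be trusted."""
    cur = conn.execute(f'SELECT * FROM "{table}"')
    columns = [d[0] for d in cur.description]
    return [dict(zip(columns, row)) for row in cur.fetchall()]


if __name__ == "__main__":
    # File-system-like source simulated with an in-memory text stream.
    print(load_from_csv(io.StringIO("ip,port\n10.0.0.1,80\n")))
```

Note that CSV values arrive as strings while SQLite preserves column types, which is one reason a type-detection step after loading is useful.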

(D) Anonymization Space Manager. This module is responsible for generating the anonymization space and for enforcing the access control mechanism over the anonymization space for the data user. It consists of the following sub-modules:

Anonymization Space Builder (ASB): This sub-module automatically builds the entire anonymization space, which consists of all available combinations of anonymization primitives for each data attribute based on its type. Building the anonymization space lattice is detailed in Sect. 2.3. The resulting anonymization space lattice is stored in the Access Control database.
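
Assuming each attribute type comes with a chain of primitives ordered from least to most private, the anonymization space can be sketched as the Cartesian product of these chains ordered componentwise. Such a product is a lattice whose join and meet are the componentwise maximum and minimum. The chains in `PRIMITIVES_BY_TYPE` are hypothetical, and the paper's own construction is the one given in Sect. 2.3.

```python
from itertools import product

# Hypothetical primitive chains per attribute type, least to most private.
PRIMITIVES_BY_TYPE = {
    "ip_address": ["none", "prefix_preserving", "truncation", "suppression"],
    "port_number": ["none", "generalization", "suppression"],
}


def build_space(schema):
    """schema maps attribute -> type. Returns the attribute order and every
    point of the space as a tuple of per-attribute privacy ranks."""
    attrs = sorted(schema)
    chains = [range(len(PRIMITIVES_BY_TYPE[schema[a]])) for a in attrs]
    return attrs, list(product(*chains))


def leq(x, y):
    """Componentwise order: x is nowhere more private than y."""
    return all(a <= b for a, b in zip(x, y))


def join(x, y):
    """Least upper bound: the least private point at least as private as both."""
    return tuple(max(a, b) for a, b in zip(x, y))


def meet(x, y):
    """Greatest lower bound: the most private point no more private than either."""
    return tuple(min(a, b) for a, b in zip(x, y))


def cover_edges(space):
    """Hasse-diagram edges: y covers x iff y raises exactly one rank of x by 1."""
    points = set(space)
    edges = []
    for x in space:
        for i in range(len(x)):
            y = x[:i] + (x[i] + 1,) + x[i + 1:]
            if y in points:
                edges.append((x, y))
    return edges


def describe(schema, attrs, point):
    """Translate a point back into named primitives per attribute."""
    return {a: PRIMITIVES_BY_TYPE[schema[a]][r] for a, r in zip(attrs, point)}
```

With one IP attribute (4 options) and one port attribute (3 options), the space has 4 × 3 = 12 points.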

Anonymization Controller: This sub-module implements the access control mechanism over the anonymization space for the data user. It receives the utility level from the data user and performs an intersection (masking) operation between the privacy level and the utility level in order to determine the allowed combinations of anonymization primitives. It also ensures that the Data Anonymizer accesses only the anonymization primitives allowed for that user.
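
Encoding each point as a tuple of per-attribute privacy ranks, the intersection performed by the Anonymization Controller can be sketched as keeping the points between the owner's privacy floor and the user's utility ceiling (the most private rank the user can still work with). The sketch also includes a guard the Data Anonymizer could call before applying a combination. Both the encoding of the utility level as a ceiling and the function names are assumptions.

```python
from itertools import product


def allowed_combinations(space, privacy_floor, utility_ceiling):
    """Intersect the owner's privacy up-set with the user's utility down-set.

    privacy_floor[i]: least private rank the owner permits on attribute i.
    utility_ceiling[i]: most private rank the user can still work with.
    """
    return [
        x for x in space
        if all(p <= r <= u for r, p, u in zip(x, privacy_floor, utility_ceiling))
    ]


def check_access(point, allowed):
    """Guard for the Data Anonymizer: refuse combinations outside the mask."""
    if point not in allowed:
        raise PermissionError(f"combination {point} is not allowed for this user")
    return point


# Example space: attribute 0 has ranks 0..2, attribute 1 has ranks 0..3.
SPACE = list(product(range(3), range(4)))
```

In this sketch an empty result means that the owner's privacy requirement and the user's utility requirement cannot both be satisfied.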

(E) Data Anonymizer. This module anonymizes the data with respect to the trust level assigned to each user. It is designed as a set of building blocks, so that new or more efficient anonymization primitives can easily be integrated into iCAT. It consists of the following sub-modules:

Anonymization Primitives: This sub-module holds the implementation of all existing anonymization algorithms corresponding to the 12 anonymization primitives discussed in Sect. 2.
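
The building-block design can be illustrated with a registry to which primitives are added by a decorator, so a new primitive plugs in without changing the anonymization loop. The four primitives below (hashing, IPv4 truncation, suppression and port generalization using the IANA port ranges) are illustrative stand-ins chosen for this sketch, not iCAT's implementations.

```python
import hashlib
import ipaddress
from typing import Callable, Dict

# Registry of primitives: name -> function from a plain value to its anonymized form.
PRIMITIVES: Dict[str, Callable[[str], str]] = {}


def primitive(name: str):
    """Decorator that registers a new anonymization primitive under `name`."""
    def register(fn):
        PRIMITIVES[name] = fn
        return fn
    return register


@primitive("hashing")
def hash_value(value: str) -> str:
    return hashlib.sha256(value.encode()).hexdigest()


@primitive("truncation")
def truncate_ipv4(value: str, keep_bits: int = 24) -> str:
    """Zero the host bits, keeping only the first `keep_bits` bits."""
    net = ipaddress.ip_network(f"{value}/{keep_bits}", strict=False)
    return str(net.network_address)


@primitive("suppression")
def suppress(value: str) -> str:
    return "*"


@primitive("generalization")
def generalize_port(value: str) -> str:
    """Replace a port number by its IANA range."""
    port = int(value)
    if port < 1024:
        return "well-known"
    if port < 49152:
        return "registered"
    return "dynamic"


def anonymize(record, plan):
    """Apply plan {attribute: primitive name}; unlisted attributes pass through."""
    return {k: PRIMITIVES[plan[k]](v) if k in plan else v for k, v in record.items()}
```

Adding a thirteenth primitive under this design only requires another decorated function.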

Anonymization Mapper: This sub-module is responsible for creating a mapping file that maps the plain-text data to their anonymized values for later recognition purposes (e.g., if hashing is used to anonymize IP addresses, a file containing the original IP addresses and their hashes is created).
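
A sketch of such a mapping file for hashed IP addresses is shown below. It uses a keyed hash (HMAC-SHA-256) rather than a plain hash, which is a design choice of this sketch: an unkeyed hash of an IPv4 address can be reversed by hashing all 2^32 addresses. The CSV layout and the function names are hypothetical.

```python
import csv
import hashlib
import hmac
import io


def build_mapping(values, key: bytes):
    """Return {original: anonymized} for each distinct value (first-seen order)."""
    return {
        v: hmac.new(key, v.encode(), hashlib.sha256).hexdigest()
        for v in dict.fromkeys(values)
    }


def write_mapping(mapping, stream):
    """Write the mapping as a two-column CSV file."""
    writer = csv.writer(stream)
    writer.writerow(["original", "anonymized"])
    for original, anonymized in mapping.items():
        writer.writerow([original, anonymized])


def read_mapping_reverse(stream):
    """Load the file back as {anonymized: original} for later recognition."""
    return {row["anonymized"]: row["original"] for row in csv.DictReader(stream)}


if __name__ == "__main__":
    buf = io.StringIO()
    write_mapping(build_mapping(["10.0.0.1", "10.0.0.2"], b"owner-secret"), buf)
    print(buf.getvalue())
```

The mapping file and the key are as sensitive as the original data, so both should stay with the data owner.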


About this paper

Cite this paper

Oqaily, M., Jarraya, Y., Zhang, M., Wang, L., Pourzandi, M., Debbabi, M. (2019). iCAT: An Interactive Customizable Anonymization Tool. In: Sako, K., Schneider, S., Ryan, P. (eds) Computer Security – ESORICS 2019. ESORICS 2019. Lecture Notes in Computer Science(), vol 11735. Springer, Cham. https://doi.org/10.1007/978-3-030-29959-0_32


DOI: https://doi.org/10.1007/978-3-030-29959-0_32
Published: 15 September 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-29958-3
Online ISBN: 978-3-030-29959-0
eBook Packages: Computer Science (R0)
