Learning quasi-identifiers for privacy-preserving exchanges: a rough set theory approach

Wafo Soh, C.; Njilla, L. L.; Kwiat, K. K.; Kamhoua, C. A.

doi:10.1007/s41066-018-0127-0

Original Paper
Published: 20 August 2018

Granular Computing, volume 5, pages 71–84 (2020)

C. Wafo Soh¹,
L. L. Njilla²,
K. K. Kwiat² &
…
C. A. Kamhoua³

Abstract

A challenging and pervasive issue in information exchange is inferential disclosure. It arises in three situations: (1) the exchanged data correlate with publicly available information, (2) the exchanged data contain patterns similar to those in a sharing partner’s data, and (3) the attributes of the shared data are interdependent. In this work, we provide and implement new algorithms that impede the third type of inferential attack. They rely on rough set theory to undermine the deductive route from nonsensitive to sensitive features. Our approach comprises three steps: learning quasi-identifiers, computing a granulation of the underlying information system that maximizes the distribution of sensitive attributes in each granule, and masking the deductive route from nonsensitive to sensitive features. Our routine for learning quasi-identifiers achieves both the largest distinction and the largest separation without an exhaustive search among tuples of features. The learned quasi-identifiers are then used to find a granulation of the information system that balances the anonymity of quasi-identifiers against the diversity of sensitive attributes, without solving a difficult optimization problem. We employ this granulation in a strategy similar to that used in k-anonymity to de-identify private information systems.
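To make the vocabulary of the abstract concrete, the following is a minimal Python sketch of the underlying notions, not the authors' algorithms. It groups the rows of a toy table into granules (rows that are indiscernible on a chosen attribute set), then scores a candidate quasi-identifier by a distinct ratio and a separation ratio, and checks k-anonymity and the diversity of a sensitive attribute per granule. The table, the attribute names (`zip`, `age`, `diagnosis`), the function names, and the exact form of the two ratios are illustrative assumptions, since the abstract does not define them.

```python
from collections import defaultdict

def granulate(rows, attrs):
    """Group row indices into granules: rows that agree on every attribute in attrs."""
    classes = defaultdict(list)
    for i, row in enumerate(rows):
        classes[tuple(row[a] for a in attrs)].append(i)
    return list(classes.values())

def distinct_ratio(rows, attrs):
    """Assumed measure: number of distinct projections on attrs divided by number of rows."""
    return len(granulate(rows, attrs)) / len(rows)

def separation_ratio(rows, attrs):
    """Assumed measure: fraction of row pairs that differ on at least one attribute in attrs."""
    n = len(rows)
    total_pairs = n * (n - 1) // 2
    if total_pairs == 0:
        return 0.0
    unseparated = sum(len(g) * (len(g) - 1) // 2 for g in granulate(rows, attrs))
    return (total_pairs - unseparated) / total_pairs

def is_k_anonymous(rows, attrs, k):
    """True when every granule induced by attrs holds at least k rows."""
    return all(len(g) >= k for g in granulate(rows, attrs))

def min_sensitive_diversity(rows, attrs, sensitive):
    """Smallest number of distinct sensitive values found in any granule."""
    return min(len({rows[i][sensitive] for i in g}) for g in granulate(rows, attrs))

# Hypothetical toy information system; "diagnosis" plays the sensitive attribute.
rows = [
    {"zip": "10001", "age": "30-39", "diagnosis": "flu"},
    {"zip": "10001", "age": "30-39", "diagnosis": "asthma"},
    {"zip": "10001", "age": "40-49", "diagnosis": "flu"},
    {"zip": "20002", "age": "40-49", "diagnosis": "diabetes"},
    {"zip": "20002", "age": "40-49", "diagnosis": "flu"},
    {"zip": "20002", "age": "30-39", "diagnosis": "asthma"},
]

for candidate in (["zip"], ["zip", "age"]):
    print(
        candidate,
        round(distinct_ratio(rows, candidate), 3),
        round(separation_ratio(rows, candidate), 3),
        is_k_anonymous(rows, candidate, 2),
        min_sensitive_diversity(rows, candidate, "diagnosis"),
    )
```

On this toy table, the pair (`zip`, `age`) separates more row pairs than `zip` alone but leaves singleton granules, so it identifies rows better and is no longer 2-anonymous. This is the kind of tension between identifying power, anonymity, and sensitive-value diversity that the abstract describes.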

Acknowledgements

This work was completed during the Summer Faculty Fellow Program (SFFP) at the Cyber Assurance Branch of the Air Force Lab. We would like to thank all the members of the Host Lab for their assistance.

Author information

Authors and Affiliations

Department of Mathematical and Statistical Sciences, Jackson State University, 1400 J R Lynch Street, Jackson, MS, 39257, USA
C. Wafo Soh
Air Force Research Laboratory, Cyber Assurance Branch, Rome, NY, 13441, USA
L. L. Njilla & K. K. Kwiat
Army Research Laboratory, Adelphi, MD, 20783, USA
C. A. Kamhoua

Corresponding author

Correspondence to C. Wafo Soh.

About this article

Cite this article

Wafo Soh, C., Njilla, L.L., Kwiat, K.K. et al. Learning quasi-identifiers for privacy-preserving exchanges: a rough set theory approach. Granul. Comput. 5, 71–84 (2020). https://doi.org/10.1007/s41066-018-0127-0

Received: 16 February 2018
Accepted: 08 August 2018
Published: 20 August 2018
Issue Date: January 2020
DOI: https://doi.org/10.1007/s41066-018-0127-0
