Abstract
A challenging and pervasive issue in information exchange is inferential disclosure. It occurs in three situations: (1) the exchanged data correlate with publicly available information, (2) the exchanged data contain patterns similar to those in a sharing partner's data, and (3) the attributes of the shared data are interdependent. In this work, we propose and implement new algorithms that impede the third type of inferential attack. They rely on rough set theory to undermine the deductive route from nonsensitive to sensitive features. Our approach comprises three steps: learning quasi-identifiers; computing a granulation of the underlying information system that maximizes the diversity of sensitive-attribute values within each granule; and masking the deductive route from nonsensitive to sensitive features. Our routine for learning quasi-identifiers achieves both the largest distinction and the largest separation without an exhaustive search over tuples of features. The learned quasi-identifiers are then used to find a granulation of the information system that balances the anonymity of quasi-identifiers against the diversity of sensitive attributes, without solving a difficult optimization problem. Finally, we use this granulation in a strategy similar to that of k-anonymity to de-identify private information systems.
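As an illustration only, the first two steps (learning a quasi-identifier, then granulating by it and checking anonymity and diversity) can be sketched in Python. This is not the authors' rough-set routine. It assumes that "distinction" and "separation" correspond to the distinct ratio and separation ratio of Motwani and Xu (2007), swaps in a plain greedy search, and uses a hypothetical toy table and hypothetical helper names (`granulate`, `greedy_quasi_identifier`, `anonymity_and_diversity`) with an assumed stopping threshold `target`. Granules are the indiscernibility classes of the chosen attributes; k is the smallest granule size and l is the fewest distinct sensitive values found in any granule.

```python
from collections import defaultdict
from math import comb

# Hypothetical toy information system; "disease" is the sensitive attribute.
rows = [
    {"zip": "130", "age": "20s", "sex": "F", "disease": "flu"},
    {"zip": "130", "age": "20s", "sex": "M", "disease": "cold"},
    {"zip": "130", "age": "30s", "sex": "F", "disease": "flu"},
    {"zip": "148", "age": "30s", "sex": "M", "disease": "asthma"},
    {"zip": "148", "age": "20s", "sex": "F", "disease": "cold"},
    {"zip": "148", "age": "30s", "sex": "F", "disease": "flu"},
]

def granulate(rows, attrs):
    """Indiscernibility classes: row indices that agree on every attribute in attrs."""
    classes = defaultdict(list)
    for i, row in enumerate(rows):
        classes[tuple(row[a] for a in attrs)].append(i)
    return list(classes.values())

def distinct_ratio(rows, attrs):
    """Number of distinct projections onto attrs divided by the number of rows."""
    return len(granulate(rows, attrs)) / len(rows)

def separation_ratio(rows, attrs):
    """Fraction of row pairs that differ on at least one attribute in attrs."""
    unseparated = sum(comb(len(g), 2) for g in granulate(rows, attrs))
    return 1 - unseparated / comb(len(rows), 2)

def greedy_quasi_identifier(rows, candidates, target=1.0):
    """Assumed heuristic: repeatedly add the attribute that maximizes separation,
    stopping once the separation ratio reaches target (no exhaustive search)."""
    chosen, remaining = [], list(candidates)
    while remaining and separation_ratio(rows, chosen) < target:
        best = max(remaining, key=lambda a: separation_ratio(rows, chosen + [a]))
        chosen.append(best)
        remaining.remove(best)
    return chosen

def anonymity_and_diversity(rows, qi, sensitive):
    """Return (k, l): smallest granule size and fewest distinct sensitive values."""
    granules = granulate(rows, qi)
    k = min(len(g) for g in granules)
    l = min(len({rows[i][sensitive] for i in g}) for g in granules)
    return k, l

qi = greedy_quasi_identifier(rows, ["zip", "age", "sex"], target=0.8)
print(qi)                                                # ['zip', 'age']
print(anonymity_and_diversity(rows, qi, "disease"))      # (1, 1)
print(anonymity_and_diversity(rows, ["zip"], "disease")) # (3, 2)
```

In this toy table the greedy quasi-identifier {zip, age} separates most record pairs but leaves singleton granules (k = 1), whereas granulating on the coarser {zip} yields k = 3 and l = 2. That is the kind of anonymity-versus-diversity trade-off the abstract refers to; the paper's own granulation procedure is not reproduced here.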
Acknowledgements
This work was completed during the Summer Faculty Fellow Program (SFFP) at the Cyber Assurance Branch of the Air Force Lab. We would like to thank all the members of the Host Lab for their assistance.
About this article
Cite this article
Wafo Soh, C., Njilla, L.L., Kwiat, K.K. et al. Learning quasi-identifiers for privacy-preserving exchanges: a rough set theory approach. Granul. Comput. 5, 71–84 (2020). https://doi.org/10.1007/s41066-018-0127-0
DOI: https://doi.org/10.1007/s41066-018-0127-0