Learning quasi-identifiers for privacy-preserving exchanges: a rough set theory approach

  • Original Paper
Granular Computing

Abstract

Inferential disclosure is a challenging and pervasive issue in information exchange. It occurs in three situations: (1) the exchanged data correlate with publicly available information, (2) the exchanged data contain patterns similar to those in a sharing partner’s data, and (3) the attributes of the shared data are interdependent. In this work, we provide and implement new algorithms that impede the third type of inferential attack. They rely on rough set theory to undermine the deductive route from nonsensitive to sensitive features. Our approach comprises three steps: learning quasi-identifiers, computing a granulation of the underlying information system that maximizes the distribution of sensitive attributes in each granule, and masking the deductive route from nonsensitive to sensitive features. Our routine for learning quasi-identifiers achieves both the largest distinction and the largest separation without an exhaustive search among tuples of features. The learned quasi-identifiers are then used to find a granulation of the information system that balances the anonymity of quasi-identifiers against the diversity of sensitive attributes, without solving a difficult optimization problem. We employ this granulation in a strategy similar to that used in k-anonymity to de-identify private information systems.
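To make the notions of granulation, anonymity, and diversity concrete, the following is a minimal sketch and not the algorithms of the paper. It assumes a toy information system stored as a list of dictionaries; the attribute names (`zip`, `age`, `disease`) and the helper functions `granules` and `anonymity_and_diversity` are hypothetical. Rows that agree on the chosen quasi-identifiers are grouped into indiscernibility classes (granules), and the sketch reports the smallest granule size (the k of k-anonymity) together with the smallest number of distinct sensitive values found in any granule.

```python
from collections import defaultdict


def granules(rows, attrs):
    """Partition row indices into indiscernibility classes with respect to attrs."""
    classes = defaultdict(list)
    for i, row in enumerate(rows):
        # Rows with identical values on attrs fall into the same granule.
        classes[tuple(row[a] for a in attrs)].append(i)
    return list(classes.values())


def anonymity_and_diversity(rows, quasi_ids, sensitive):
    """Return (k, l): the smallest granule size and the smallest number of
    distinct sensitive values occurring in any granule."""
    parts = granules(rows, quasi_ids)
    k = min(len(g) for g in parts)
    l = min(len({rows[i][sensitive] for i in g}) for g in parts)
    return k, l


# Hypothetical, already generalized toy table.
rows = [
    {"zip": "130**", "age": "20-29", "disease": "flu"},
    {"zip": "130**", "age": "20-29", "disease": "cancer"},
    {"zip": "130**", "age": "30-39", "disease": "flu"},
    {"zip": "130**", "age": "30-39", "disease": "flu"},
]

print(anonymity_and_diversity(rows, ["zip", "age"], "disease"))  # (2, 1)
```

In this toy table every granule holds two rows, yet the second granule contains a single sensitive value, so anyone who knows the quasi-identifier values of a person in that granule can deduce the sensitive attribute. This is the kind of deductive route from nonsensitive to sensitive features that the abstract describes masking.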

Acknowledgements

This work was completed during the Summer Faculty Fellow Program (SFFP) at the Cyber Assurance Branch of the Air Force Lab. We would like to thank all the members of the Host Lab for their assistance.

Author information

Corresponding author

Correspondence to C. Wafo Soh.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Wafo Soh, C., Njilla, L.L., Kwiat, K.K. et al. Learning quasi-identifiers for privacy-preserving exchanges: a rough set theory approach. Granul. Comput. 5, 71–84 (2020). https://doi.org/10.1007/s41066-018-0127-0

  • DOI: https://doi.org/10.1007/s41066-018-0127-0
