The VLDB Journal, Volume 19, Issue 3, pp 385–410

Suppressing microdata to prevent classification based inference

Regular Paper


The Internet revolution, together with advances in computer technology, makes it easy for institutions to collect an unprecedented amount of personal data. This pervasive data collection, coupled with the increasing need to disseminate and share non-aggregated data, i.e., microdata, has raised many privacy concerns. One way to ensure privacy is to selectively hide the confidential, i.e., sensitive, information before disclosure. However, with data mining techniques, an adversary can now predict the hidden confidential information from the disclosed data sets. In this paper, we concentrate on one such data mining technique: classification. We extend our previous work on microdata suppression to prevent both probabilistic and decision tree classification based inference. We also provide experimental results on real-life data sets showing the effectiveness not only of the proposed methods but also of hybrid methods, i.e., methods that suppress microdata against both classification models.
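To make the inference threat concrete, here is a minimal Python sketch, under stated assumptions, of probabilistic (naive Bayes) classification based inference and of cell suppression against it. It is not the authors' algorithm: the toy data set, the `?` suppression marker, the add-one smoothing, and every function name (`train`, `scores`, `predict`, `margin`, `suppress_until_safe`) are hypothetical illustrations. The adversary fits a categorical naive Bayes classifier on the released records and predicts a record's hidden sensitive value, with suppressed cells contributing no evidence. The data holder then greedily suppresses cells of that one record until the true value is no longer the classifier's unique best guess.

```python
from collections import Counter, defaultdict

SUPPRESSED = "?"  # hypothetical marker for a suppressed (hidden) cell


def train(rows, target):
    """Fit a categorical naive Bayes model; suppressed cells are skipped."""
    priors = Counter(r[target] for r in rows)
    counts = defaultdict(Counter)  # (attribute, class) -> value frequencies
    domain = defaultdict(set)      # attribute -> distinct observed values
    for r in rows:
        for attr, value in r.items():
            if attr != target and value != SUPPRESSED:
                counts[(attr, r[target])][value] += 1
                domain[attr].add(value)
    return priors, counts, domain


def scores(record, model, target):
    """Unnormalised scores P(c) * prod_a P(a = v | c), with add-one smoothing."""
    priors, counts, domain = model
    total = sum(priors.values())
    result = {}
    for c, n_c in priors.items():
        s = n_c / total
        for attr, value in record.items():
            if attr == target or value == SUPPRESSED:
                continue  # a suppressed cell contributes no evidence
            seen = counts[(attr, c)]
            vocab = len(domain[attr] | {value})  # include the queried value
            s *= (seen[value] + 1) / (sum(seen.values()) + vocab)
        result[c] = s
    return result


def predict(record, model, target):
    sc = scores(record, model, target)
    return max(sc, key=sc.get)


def margin(record, model, target, true_value):
    """Score of the true class minus the best competing class score."""
    sc = scores(record, model, target)
    others = [s for c, s in sc.items() if c != true_value]
    return sc[true_value] - max(others, default=0.0)


def suppress_until_safe(record, model, target, true_value):
    """Hypothetical greedy rule: hide the cell that most lowers the margin,
    until the true class is no longer the unique winner."""
    record = dict(record)
    suppressed = []
    while margin(record, model, target, true_value) > 0:
        candidates = [a for a, v in record.items()
                      if a != target and v != SUPPRESSED]
        if not candidates:
            break
        best = min(candidates, key=lambda a: margin(
            {**record, a: SUPPRESSED}, model, target, true_value))
        record[best] = SUPPRESSED
        suppressed.append(best)
    return record, suppressed


# Toy released microdata (hypothetical); "disease" is the sensitive attribute.
released = [
    {"age": "young", "zip": "A", "smoker": "y", "disease": "flu"},
    {"age": "young", "zip": "A", "smoker": "n", "disease": "flu"},
    {"age": "old",   "zip": "A", "smoker": "y", "disease": "flu"},
    {"age": "old",   "zip": "B", "smoker": "n", "disease": "none"},
    {"age": "old",   "zip": "B", "smoker": "n", "disease": "none"},
    {"age": "young", "zip": "B", "smoker": "n", "disease": "none"},
    {"age": "old",   "zip": "A", "smoker": "n", "disease": "none"},
]

model = train(released, "disease")
target = {"age": "young", "zip": "A", "smoker": "n"}  # true disease "flu" is hidden
print(predict(target, model, "disease"))  # flu
safe, cells = suppress_until_safe(target, model, "disease", "flu")
print(cells, predict(safe, model, "disease"))  # ['zip'] none
```

Even with the sensitive value hidden, the released rows let the adversary guess `flu` for the target; suppressing the single `zip` cell flips the guess to `none`. This is a deliberate simplification: the model is held fixed and only the target record's own cells are suppressed, whereas suppressing cells in released records also changes the model an adversary learns from them.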


Keywords: Privacy, Disclosure protection, Data suppression, Data perturbation, Data mining


Copyright information

© Springer-Verlag 2009

Authors and Affiliations

  1. Sabancı University, Istanbul, Turkey
