Data Mining and Knowledge Discovery

, Volume 28, Issue 1, pp 65-91

First online:

The effect of homogeneity on the computational complexity of combinatorial data anonymization

  • Robert BredereckAffiliated withInstitut für Softwaretechnik und Theoretische Informatik, TU Berlin
  • , André NichterleinAffiliated withInstitut für Softwaretechnik und Theoretische Informatik, TU Berlin Email author 
  • , Rolf NiedermeierAffiliated withInstitut für Softwaretechnik und Theoretische Informatik, TU Berlin
  • , Geevarghese PhilipAffiliated withMax-Planck-Institut für Informatik

Rent the article at a discount

Rent now

* Final gross prices may vary according to local VAT.

Get Access


A matrix M is said to be k-anonymous if for each row r in M there are at least k − 1 other rows in M which are identical to r. The NP-hard k-Anonymity problem asks, given an n × m-matrix M over a fixed alphabet and an integer s > 0, whether M can be made k-anonymous by suppressing (blanking out) at most s entries. Complementing previous work, we introduce two new “data-driven” parameterizations for k-Anonymity—the number t in of different input rows and the number t out of different output rows—both modeling aspects of data homogeneity. We show that k-Anonymity is fixed-parameter tractable for the parameter t in , and that it is NP-hard even for t out = 2 and alphabet size four. Notably, our fixed-parameter tractability result implies that k-Anonymity can be solved in linear time when t in is a constant. Our computational hardness results also extend to the related privacy problems p-Sensitivity and -Diversity, while our fixed-parameter tractability results extend to p-Sensitivity and the usage of domain generalization hierarchies, where the entries are replaced by more general data instead of being completely suppressed.


k-Anonymity p-Sensitivity -Diversity Domain generalization hierarchies Matrix modification problems Parameterized algorithmics Fixed-parameter tractability NP-hardness