Implementation of Bitmap Based Incognito and Performance Evaluation
In the era of the Internet, more and more privacy-sensitive data is published online. Even though such data is published with sensitive attributes such as name and social security number removed, private information can still be revealed by joining the data with other external data. This technique is called a joining attack. Among the many techniques developed against the joining attack, k-anonymization generalizes and/or suppresses portions of the released microdata so that no individual can be uniquely distinguished from a group of size k. Incognito is one of the most efficient k-anonymization algorithms. However, Incognito requires many repeated sorts over large volumes of data. In this paper, we propose a bitmap-based Incognito algorithm. Using the bitmap technique, we can completely eliminate the expensive sort operations and can even prune some steps of the traditional Incognito algorithm. As a result, the new algorithm can improve performance by an order of magnitude. From the perspective of implementation, the key issue in bitmap-based Incognito is the speed of the bitwise AND/OR and bit-count operations. For this purpose, we designed and implemented a bitmap package that exploits the Single Instruction Multiple Data (SIMD) technique. Our experimental results show that bitmap-based Incognito outperforms the traditional Incognito by an order of magnitude.
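The role of the bitwise AND/OR and bit-count operations can be illustrated with a minimal sketch. It is not the paper's bitmap package: it uses plain Python integers as bitmaps instead of SIMD instructions, and the function names (`build_bitmaps`, `generalize`, `is_k_anonymous`), the sample rows, and the age hierarchy are all hypothetical. Each attribute value gets a bitmap with one bit per row; generalizing values is a bitwise OR, a group of rows sharing a value combination is a bitwise AND, and the group size is a bit count, so no sorting is needed.

```python
from itertools import product


def build_bitmaps(rows, column):
    """Map each distinct value of `column` to an int whose bit i is set
    when row i holds that value."""
    bitmaps = {}
    for i, row in enumerate(rows):
        bitmaps[row[column]] = bitmaps.get(row[column], 0) | (1 << i)
    return bitmaps


def generalize(bitmaps, hierarchy):
    """OR together the bitmaps of all values that share a generalized value.
    `hierarchy` is an assumed value -> generalized-value mapping."""
    generalized = {}
    for value, bits in bitmaps.items():
        parent = hierarchy[value]
        generalized[parent] = generalized.get(parent, 0) | bits
    return generalized


def popcount(bits):
    """Count the set bits, i.e. the number of rows in the group."""
    return bin(bits).count("1")


def is_k_anonymous(bitmaps_per_attribute, k):
    """Check that every non-empty value combination covers at least k rows.
    Expects at least one attribute."""
    for combo in product(*(b.values() for b in bitmaps_per_attribute)):
        group = combo[0]
        for bits in combo[1:]:
            group &= bits  # rows matching every value in the combination
        size = popcount(group)
        if 0 < size < k:
            return False
    return True


# Hypothetical microdata with two quasi-identifier attributes.
rows = [
    {"zip": "13053", "age": 28},
    {"zip": "13053", "age": 29},
    {"zip": "13068", "age": 21},
    {"zip": "13068", "age": 23},
]
zip_bitmaps = build_bitmaps(rows, "zip")
age_bitmaps = build_bitmaps(rows, "age")

# Raw ages make every row unique, so the table is not 2-anonymous.
raw_ok = is_k_anonymous([zip_bitmaps, age_bitmaps], 2)

# Generalizing all ages to one range merges their bitmaps with OR.
age_ranges = generalize(age_bitmaps, {28: "20-29", 29: "20-29", 21: "20-29", 23: "20-29"})
generalized_ok = is_k_anonymous([zip_bitmaps, age_ranges], 2)
```

In this sketch the check on the raw ages fails and the check on the generalized ages passes. The inner loop consists only of AND and bit-count operations, which is why their speed dominates the cost of this kind of check.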
Keywords: Binary Search, Single Instruction Multiple Data, Social Security Number, Test Node, Generalization Lattice