Gkoulalas–Divanis & Verykios in  introduce the first exact methodology to strategically perform sensitive frequent itemset hiding based on a new notion of hybrid database generation. This approach broadens the regular process of data sanitization (as introduced in  and adopted by the overwhelming majority of researchers [10, 47, 55]) by applying an extension to the original database instead of either modifying existing transactions (directly or through the application of transformations), or rebuilding the dataset from scratch to accommodate sensitive knowledge hiding. The extended portion of the database contains a set of carefully crafted transactions that achieve to lower the importance of the sensitive patterns to a degree that they become uninteresting from the perspective of the data mining algorithm, while minimally affecting the importance of the nonsensitive ones. The hiding process is guided by the need to maximize the data utility of the sanitized database by introducing the least possible amount of side-effects. The released database, which consists of the initial part (original database) and the extended part (database extension), can guarantee the protection of the sensitive knowledge, when mined at the same or higher support as the one used in the original database.
KeywordsSafety Margin Hybrid Algorithm Frequent Itemsets Original Database Database Extension
Unable to display preview. Download preview PDF.