Abstract
Classification is the problem of identifying which of a set of categories new data belong to, on the basis of a set of training data whose category membership is known. It is widely applied, for example in medical science. Protecting classification knowledge has received increasing attention in recent years because of the popularity of cloud environments. In this paper, we propose a Shaking Sorted-Sampling (triple-S) algorithm for protecting the classification knowledge of a dataset. The triple-S algorithm sorts the data of an original dataset according to the projection results of principal component analysis, so that adjacent data have similar features. We then generate noise data with incorrect classes and add them to the original dataset. In addition, we develop an effective positioning strategy that determines where the noise data are inserted in the original dataset, which ensures that the original dataset can be restored by removing the noise data. The experimental results show that the disturbance effect of the triple-S algorithm on the CLC, MySVM, and LibSVM classifiers increases as the noise data ratio increases. Moreover, compared with existing methods, the disturbance effect of the triple-S algorithm on MySVM and LibSVM is more significant once the amount of noise data added to the original dataset reaches a certain level.
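The pipeline summarized in the abstract (sort by PCA projection, add mislabeled noise data, record their positions, restore by removal) can be illustrated with a minimal sketch. This is not the triple-S algorithm itself, whose details the abstract does not give. The function names, the binary {0, 1} labels, the midpoint rule for generating noise features, and the use of a seeded random generator as the positioning strategy are all assumptions made only for illustration.

```python
import numpy as np

def pca_sort_order(X):
    """Return row indices ordered by projection onto the first principal component."""
    Xc = X - X.mean(axis=0)
    # Right singular vectors of the centered data are the principal axes.
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return np.argsort(Xc @ vt[0], kind="stable")

def add_noise(X, y, ratio, seed):
    """Sort by the first principal component, then interleave mislabeled noise rows.

    ASSUMPTIONS (not from the paper): labels are binary in {0, 1}; each noise
    row is the midpoint of two adjacent sorted rows and carries the opposite
    of its left neighbour's label; insertion positions come from a seeded RNG.
    Requires round(ratio * n) <= n - 1.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    order = pca_sort_order(X)
    Xs, ys = X[order], y[order]
    n = len(Xs)
    k = int(round(ratio * n))
    rng = np.random.default_rng(seed)
    # Pick k distinct adjacent pairs (i, i + 1) in the sorted data.
    anchors = np.sort(rng.choice(n - 1, size=k, replace=False))
    noise_X = (Xs[anchors] + Xs[anchors + 1]) / 2.0
    noise_y = 1 - ys[anchors]
    # Insert each noise row between the two rows it was built from.
    Xp = np.insert(Xs, anchors + 1, noise_X, axis=0)
    yp = np.insert(ys, anchors + 1, noise_y)
    # Index of each noise row in the protected dataset.
    noise_pos = anchors + 1 + np.arange(k)
    return Xp, yp, noise_pos, order

def restore(Xp, yp, noise_pos, order):
    """Drop the noise rows and undo the sort to recover the original dataset."""
    keep = np.ones(len(Xp), dtype=bool)
    keep[noise_pos] = False
    Xs, ys = Xp[keep], yp[keep]
    X = np.empty_like(Xs)
    y = np.empty_like(ys)
    X[order] = Xs
    y[order] = ys
    return X, y
```

In this sketch the pair `(noise_pos, order)` plays the role of the secret needed for restoration: whoever holds it can strip the noise rows and recover the original data exactly, while a classifier trained on the protected data also sees the mislabeled rows.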
Acknowledgments
This work was partially supported by the Ministry of Science and Technology of the Republic of China under grants MOST 103-2221-E-025-007 and 102-2218-E-025-001.
Additional information
This article is part of the Topical Collection on Transactional Processing Systems
About this article
Cite this article
Lin, CY., Chen, TS., Tsai, HF. et al. A Novel Anti-classification Approach for Knowledge Protection. J Med Syst 39, 113 (2015). https://doi.org/10.1007/s10916-015-0305-4
DOI: https://doi.org/10.1007/s10916-015-0305-4