Journal of Intelligent Information Systems, Volume 44, Issue 1, pp 107–132

A distributed decision support algorithm that preserves personal privacy

  • George Mathew
  • Zoran Obradovic


Assuring confidentiality of personal information and preserving privacy are vital when data is harvested from multiple institutions for business decision-making. An algorithm that builds knowledge using statistics based on subject data from distributed sites that satisfy specified selection criteria is presented here. The algorithm maintains complete fidelity of information structures in the distributed data compared to the centralized equivalent. Heterogeneous data schemas across sites can be accommodated, and thresholds can be set on the global minimum saturation an attribute must reach to participate in building the prediction model. Policies for inclusion and exclusion of non-exhaustive attributes among sites are introduced. Unification of attributes is introduced for homogenizing attribute values globally. Results of experiments using data from medical, higher education, and social domains elucidate the value of our algorithm in regulated industries, where shipping raw data outside the parent institution is not practical.
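The abstract does not spell out the algorithm, so the following is only an illustrative sketch of one idea it mentions: each site shares aggregate counts rather than raw records, and an attribute participates in model building only if its global saturation meets a threshold. Everything here is an assumption made for illustration, not the authors' method: the function names (`site_summary`, `global_saturation`, `eligible_attributes`) are hypothetical, saturation is assumed to mean the fraction of all records with a non-missing value, and an attribute absent from a site's schema is assumed to count as unfilled for that site's records.

```python
from collections import Counter
from typing import Dict, List, Optional

Record = Dict[str, Optional[str]]


def site_summary(records: List[Record]) -> dict:
    """Run locally at one site; only counts leave the site, never rows."""
    filled = Counter()
    for row in records:
        for attr, value in row.items():
            if value is not None:
                filled[attr] += 1
    return {"n": len(records), "filled": dict(filled)}


def global_saturation(summaries: List[dict]) -> Dict[str, float]:
    """Combine per-site counts into a global fill rate per attribute.

    Assumption: an attribute missing from a site's schema contributes
    zero filled values but the site's records still count in the total.
    """
    total = sum(s["n"] for s in summaries)
    if total == 0:
        return {}
    filled = Counter()
    for s in summaries:
        filled.update(s["filled"])  # Counter.update adds counts
    return {attr: count / total for attr, count in filled.items()}


def eligible_attributes(summaries: List[dict], threshold: float) -> List[str]:
    """Attributes whose assumed global saturation meets the threshold."""
    saturation = global_saturation(summaries)
    return sorted(a for a, v in saturation.items() if v >= threshold)


# Two sites with different (heterogeneous) schemas; values are made up.
site_a = [{"age": "40", "bp": "high"}, {"age": "55", "bp": None}]
site_b = [{"age": "61", "smoker": "yes"}, {"age": None, "smoker": "no"}]
summaries = [site_summary(site_a), site_summary(site_b)]
selected = eligible_attributes(summaries, threshold=0.5)
```

In this toy setup the coordinator sees only per-site record counts and per-attribute fill counts. Whether such aggregates are sufficient for a given privacy requirement depends on the protocol details, which the abstract does not give.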


Data privacy · Privacy-preserving framework · Distributed decision support systems



This research was supported in part by the National Science Foundation through major research instrumentation grant number CNS-09-58854.



Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  1. Center for Data Analytics and Biomedical Informatics, Temple University, Philadelphia, USA
