Protecting business intelligence and customer privacy while outsourcing data mining tasks

  • Regular Paper
  • Knowledge and Information Systems

Abstract

Data mining plays an important role in decision making. Because many organizations lack in-house data mining expertise, it is beneficial for them to outsource data mining tasks to external service providers. However, most organizations hesitate to do so out of concern over losing business intelligence and customer privacy. In this paper, we present a Bloom filter based solution that enables organizations to outsource the mining of association rules while protecting their business intelligence and customer privacy. Our approach can achieve high precision in data mining at the cost of a larger storage requirement.
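The trade-off between precision and storage mentioned in the abstract can be illustrated with a generic Bloom filter. The following is a minimal sketch, not the authors' scheme: the class `BloomFilter`, the helper `false_positive_rate`, the salted SHA-256 hashing, and all parameter values are assumptions made for illustration only. It shows that, for a fixed number of inserted items, a larger bit array yields fewer false positives.

```python
import hashlib


class BloomFilter:
    """Generic Bloom filter with m bits and k hash functions (illustrative)."""

    def __init__(self, m, k):
        self.m = m
        self.k = k
        self.bits = bytearray(m)  # one byte per bit, chosen for readability

    def _positions(self, item):
        # Assumption: derive k positions by salting SHA-256 with the hash index.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        for p in self._positions(item):
            self.bits[p] = 1

    def __contains__(self, item):
        # True means "possibly present"; False means "definitely absent".
        return all(self.bits[p] for p in self._positions(item))


def false_positive_rate(m, k, n_items=200, n_probes=2000):
    """Insert n_items, then measure how often absent items appear present."""
    bf = BloomFilter(m, k)
    for i in range(n_items):
        bf.add(f"item-{i}")
    hits = sum(f"absent-{i}" in bf for i in range(n_probes))
    return hits / n_probes


if __name__ == "__main__":
    for m in (400, 1000, 4000):
        print(m, false_positive_rate(m, k=3))
```

Running the script prints the measured false positive rate for three filter sizes. Under these assumptions the rate falls as `m` grows, which is the sense in which storage can be traded for precision.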

Author information

Corresponding author

Correspondence to Ling Qiu.

Additional information

This research was supported by the USA National Science Foundation Grants CCR-0310974 and IIS-0546027.

About this article

Cite this article

Qiu, L., Li, Y. & Wu, X. Protecting business intelligence and customer privacy while outsourcing data mining tasks. Knowl Inf Syst 17, 99–120 (2008). https://doi.org/10.1007/s10115-007-0113-3

  • DOI: https://doi.org/10.1007/s10115-007-0113-3
