AUDIO: An Integrity Auditing Framework of Outlier-Mining-as-a-Service Systems

  • Ruilin Liu
  • Hui (Wendy) Wang
  • Anna Monreale
  • Dino Pedreschi
  • Fosca Giannotti
  • Wenge Guo
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7524)

Abstract

Spurred by developments such as cloud computing, there has been considerable recent interest in the data-mining-as-a-service paradigm. Users lacking expertise or computational resources can outsource their data and mining needs to a third-party service provider (server). Outsourcing, however, raises issues about result integrity: how can the data owner verify that the mining results returned by the server are correct? In this paper, we present AUDIO, an integrity auditing framework for the specific task of outsourced distance-based outlier mining. It provides efficient and practical verification approaches to check both the completeness and the correctness of the mining results. The key idea of our approach is to insert a small number of artificial tuples into the outsourced data; the artificial tuples produce artificial outliers and non-outliers that do not exist in the original dataset. The server’s answer is verified by analyzing the presence of artificial outliers/non-outliers, which yields a probabilistic guarantee of correctness and completeness of the mining result. Our empirical results show the effectiveness and efficiency of our method.
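
The verification idea described in the abstract can be illustrated with a toy sketch. This is not the paper's construction: the outlier definition (a tuple is an outlier if at least a fraction `p` of the other tuples lie farther than distance `d` from it), the hand-picked placement of artificial tuples, the function names, and the simple escape-probability formula (which assumes the server drops each outlier independently with a fixed rate) are all assumptions made for illustration.

```python
import math

def mine_outliers(data, p, d):
    """Indices of tuples for which at least a fraction p of the
    other tuples lie farther than distance d (assumed definition)."""
    result = set()
    for i, x in enumerate(data):
        far = sum(1 for j, y in enumerate(data)
                  if j != i and math.dist(x, y) > d)
        if far >= p * (len(data) - 1):
            result.add(i)
    return result

def audit(returned, artificial_outliers, artificial_non_outliers):
    """Owner-side check: every artificial outlier must be returned
    (completeness evidence) and no artificial non-outlier may be
    returned (correctness evidence)."""
    missing = artificial_outliers - returned
    spurious = artificial_non_outliers & returned
    return not missing and not spurious

def escape_probability(drop_rate, k):
    """Chance that a server dropping each outlier independently with
    probability drop_rate keeps all k artificial outliers (assumed model)."""
    return (1.0 - drop_rate) ** k

# Original data: a dense cluster plus one genuine outlier at (9, 9).
real = [(0.0, 0.0), (0.5, 0.2), (0.3, 0.4), (0.1, 0.6),
        (0.7, 0.1), (0.2, 0.9), (0.6, 0.6), (0.9, 0.3), (9.0, 9.0)]
# Hand-placed artificial tuples (hypothetical placement): two far from
# everything, two inside the dense cluster.
fake_out = [(50.0, 50.0), (-40.0, 60.0)]
fake_in = [(0.4, 0.5), (0.5, 0.5)]

# A real deployment would shuffle the tuples so the server cannot tell
# which are artificial; the order is kept fixed here for readability.
outsourced = real + fake_out + fake_in
fake_out_ids = {len(real), len(real) + 1}
fake_in_ids = {len(real) + 2, len(real) + 3}

honest = mine_outliers(outsourced, p=0.9, d=2.0)
lazy = honest - {len(real)}           # server silently drops one outlier

honest_ok = audit(honest, fake_out_ids, fake_in_ids)
lazy_ok = audit(lazy, fake_out_ids, fake_in_ids)
real_outliers = honest - fake_out_ids  # owner strips artificial answers
```

In this toy setup the honest answer passes the audit and the answer with a dropped artificial outlier fails it. The inserted tuples were chosen so that they do not change the status of the original tuples; a general scheme has to guarantee that property, which this sketch does not attempt.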

Keywords

Cloud Computing; Association Rule Mining; Data Owner; Mining Result; Frequent Itemset Mining
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Ruilin Liu (1)
  • Hui (Wendy) Wang (1)
  • Anna Monreale (2)
  • Dino Pedreschi (2)
  • Fosca Giannotti (3)
  • Wenge Guo (4)
  1. Stevens Institute of Technology, USA
  2. University of Pisa, Pisa, Italy
  3. ISTI-CNR, Pisa, Italy
  4. New Jersey Institute of Technology, USA
