DCUBE: CUBE on Dirty Databases

  • Guohua Jiang
  • Hongzhi Wang
  • Shouxu Jiang
  • Jianzhong Li
  • Hong Gao
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6184)

Abstract

In real-world databases, dirty data such as inconsistent and duplicate data reduce the effectiveness of database applications and make efficient OLAP processing on such databases a new challenge. CUBE is an important OLAP operator. This paper proposes a CUBE operation based on overlapping clustering, together with an effective and efficient method for storing and computing CUBE on databases with dirty data. Building on this CUBE, the paper presents efficient algorithms for answering aggregation queries, as well as processing methods for the other major OLAP operators on databases with dirty data. Experimental results demonstrate the efficiency of the proposed algorithms.
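For readers less familiar with the operator itself: the classic CUBE operator (Gray et al.) computes an aggregate for every subset of the grouping attributes, i.e. 2^d group-bys for d dimensions, and marks each rolled-up attribute with a special ALL value. The following is a minimal Python sketch of that standard, clean-data CUBE, assuming SUM as the aggregate and a small hypothetical `sales` table. It does not reproduce the paper's overlapping-clustering method for dirty data; the function name `cube` and the sample rows are illustrative only.

```python
from collections import defaultdict
from itertools import combinations

ALL = "ALL"  # placeholder for a rolled-up dimension (assumes no real value equals "ALL")


def cube(rows, dims, measure):
    """Return {cell key: SUM(measure)} for every subset of `dims` (classic CUBE)."""
    result = {}
    for k in range(len(dims) + 1):
        for subset in combinations(dims, k):  # one GROUP BY per subset of dimensions
            totals = defaultdict(int)
            for row in rows:
                # Keep grouped dimensions, replace the others with ALL.
                key = tuple(row[d] if d in subset else ALL for d in dims)
                totals[key] += row[measure]
            result.update(totals)
    return result


# Hypothetical sample data, for illustration only.
sales = [
    {"region": "North", "product": "A", "amount": 10},
    {"region": "North", "product": "B", "amount": 5},
    {"region": "South", "product": "A", "amount": 7},
]

cells = cube(sales, ["region", "product"], "amount")
for key in sorted(cells):
    print(key, cells[key])
```

On this three-row sample the sketch yields 8 cells, for example (ALL, ALL) = 22 and (North, ALL) = 15. If the table contained a duplicate of one of these tuples, SUM would count it twice in every cell that covers it, which illustrates why dirty data undermines CUBE results.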

Keywords

dirty data, CUBE, OLAP

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Guohua Jiang (1)
  • Hongzhi Wang (1)
  • Shouxu Jiang (1)
  • Jianzhong Li (1)
  • Hong Gao (1)

  1. Institute of Computer Science and Technology, Harbin Institute of Technology, Harbin, China