Knowledge Management for Model Driven Data Cleaning of Very Large Database

  • Conference paper

Part of the book series: Studies in Computational Intelligence (SCI, volume 443)

Abstract

From a knowledge management perspective, we explore data cleaning of very large databases, with a focus on semantically rich data and linked data. We identify four aspects of complexity which, if not explicitly addressed and fully managed, will hinder both recognizing and attaining the best result: (a) inconsistency of solution knowledge, due to its partial applicability across multiple concerns; (b) side effects introduced when solution knowledge is applied in pursuit of precision where multiple semantics exist; (c) unconscious neglect of the implicit weights of some parameters in value computation; (d) holism-based reasoning, which in some situations cannot be replaced by simplification. After analyzing the state of the art, we propose an ongoing Model Driven Engineering (MDE) based knowledge management platform for identifying, refining, organizing and evaluating related variants and solutions with mitigated complexity.



Author information

Corresponding author

Correspondence to Yucong Duan.

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Duan, Y., Lee, R. (2013). Knowledge Management for Model Driven Data Cleaning of Very Large Database. In: Lee, R. (eds) Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing 2012. Studies in Computational Intelligence, vol 443. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32172-6_12

  • DOI: https://doi.org/10.1007/978-3-642-32172-6_12

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-32171-9

  • Online ISBN: 978-3-642-32172-6

  • eBook Packages: Engineering, Engineering (R0)
