Abstract
From a knowledge management perspective, we explore data cleaning of very large databases, with a focus on semantically rich data and linked data. We identify four aspects of complexity that, if not explicitly addressed and fully managed, will hinder both recognizing and attaining the best result: (a) inconsistency of solution knowledge, caused by its only partial applicability across multiple concerns; (b) side effects introduced when solution knowledge is applied in pursuit of precision where multiple semantics exist; (c) unconscious neglect of the implicit weights of some parameters in value computation; and (d) holism-based reasoning, which in some situations cannot be replaced by simplification. After analyzing the state of the art, we propose a Model Driven Engineering (MDE) based knowledge management platform, still a work in progress, for identifying, refining, organizing, and evaluating related variants and solutions with mitigated complexity.
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Duan, Y., Lee, R. (2013). Knowledge Management for Model Driven Data Cleaning of Very Large Database. In: Lee, R. (eds) Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing 2012. Studies in Computational Intelligence, vol 443. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32172-6_12
DOI: https://doi.org/10.1007/978-3-642-32172-6_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-32171-9
Online ISBN: 978-3-642-32172-6
eBook Packages: Engineering, Engineering (R0)