An Approach to Identify n-wMVD for Eliminating Data Redundancy
Data cleaning is the process of determining whether two or more records that are defined differently in a database represent the same real-world object. It is a vital function in data warehouse preprocessing. Duplication (redundancy) is encountered frequently when large amounts of data collected from different sources are loaded into the warehouse. Eliminating redundancy in the data warehouse helps avoid conflicts that lead to wrong decisions, and it also reduces wasted storage space. One way of eliminating redundancy is to retrieve similar records using tokens formed on prominent attributes. Another approach is to use Conditional Functional Dependencies (CFDs) to capture the consistency of data by combining semantically related data. Existing work on data cleaning does not deal with the case of multi-valued attributes. This paper deals with nesting based weak multi-valued dependencies (n-wMVD), which can handle multi-valued attributes and redundancy removal. Our contributions are twofold: (i) an approach to convert the given database to wMVD, and (ii) an implementation of n-wMVD to eliminate redundancy. We tested the applicability of our approach; the results, which are presented in the paper, are encouraging.
Keywords: Conditional Functional Dependencies (CFD), weak Multi-valued Dependencies (wMVD), nesting based weak Multi-valued Dependencies (n-wMVD)
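To make the idea of nesting on a multi-valued attribute concrete, here is a minimal sketch of a generic nest operation in Python. It is not the paper's n-wMVD algorithm, which the abstract does not spell out. The function name `nest`, the attribute names `emp` and `skill`, and the sample rows are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def nest(rows, attr):
    """Nest a flat relation on `attr` (illustrative helper, not from the paper).

    Rows that agree on every attribute other than `attr` are merged into one
    row, and their `attr` values are collected into a set, so a repeated
    value is stored only once.
    """
    groups = defaultdict(set)
    for row in rows:
        # The grouping key is every attribute except the one being nested.
        key = tuple(sorted((k, v) for k, v in row.items() if k != attr))
        groups[key].add(row[attr])
    return [{**dict(key), attr: frozenset(values)} for key, values in groups.items()]


# Hypothetical flat relation with a duplicated record for Ann.
rows = [
    {"emp": "Ann", "skill": "SQL"},
    {"emp": "Ann", "skill": "Java"},
    {"emp": "Ann", "skill": "SQL"},
    {"emp": "Bob", "skill": "SQL"},
]

nested = nest(rows, "skill")
for row in nested:
    print(row["emp"], sorted(row["skill"]))
```

In this sketch the four flat rows collapse into two nested rows, one per employee, and the duplicated Ann/SQL record disappears because the nested values are held in a set.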