This chapter considers a record-grouping problem known as the transitive closure problem. The problem arises in efficient information processing aimed at improving data quality and information quality. To provide context for the problem, this introduction considers the importance of data quality and information quality and discusses samples of related work.
Notes
1. A relation that is reflexive, symmetric, and transitive is said to be an equivalence relation (Hopcroft and Ullman 2001).
2. Let R be a relation. The transitive closure of R, denoted R+, is defined by: 1) if (a, b) is in R, then (a, b) is in R+; 2) if (a, b) is in R+ and (b, c) is in R, then (a, c) is in R+; and 3) nothing is in R+ unless it so follows from 1) and 2) [10].
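The definition of R+ in note 2 can be read as a fixed-point computation. The following is a minimal Python sketch of that reading, not the chapter's own algorithm: the function name `transitive_closure` and the sample pairs are hypothetical, and the relation is assumed to be a small in-memory set of pairs.

```python
def transitive_closure(relation):
    """Return R+ for a relation given as a set of (a, b) pairs."""
    # Rule 1: every pair of R is in R+.
    closure = set(relation)
    while True:
        # Rule 2: if (a, b) is in R+ and (b, c) is in R, then (a, c) is in R+.
        derived = {
            (a, c)
            for (a, b) in closure
            for (b2, c) in relation
            if b == b2
        }
        # Rule 3: stop once nothing new follows from rules 1 and 2.
        if derived <= closure:
            return closure
        closure |= derived


# Hypothetical example: record 1 matches 2, 2 matches 3, 3 matches 4.
pairs = {(1, 2), (2, 3), (3, 4)}
result = transitive_closure(pairs)
print(sorted(result))
```

This naive loop re-scans all pairs on every pass, so it is only meant to make the three rules concrete; it is not suited to large data sets.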
References
Agostino D (2004) Getting Clean. CIOINSIGHT.
Ballou D (1999) Enhancing data quality in Data Warehousing Environment. Comm. ACM (42:1), pp. 73-78.
Ballou D, Wang H, Pazer G (1998) Modeling Information Manufacturing Systems to Determine Information Product Quality. Management Science (44:4), pp. 462-484.
Baxter R, Christen P, Churches T (2003) A Comparison of Fast Blocking Methods for Record Linkage. ACM SIGKDD '03 Workshop on Data Cleaning, Record Linkage, and Object Consolidation, Washington, DC, pp 25-27.
Bheemavaram R (2006) Parallel and Distributed Grouping Algorithms for Finding Related Records of Huge Data Sets on Cluster Grid. MS thesis, University of Arkansas.
Cormen T, Leiserson C, Rivest R, Stein C (2001) Introduction to Algorithms Second Edition. McGraw-Hill Higher Education.
Delone W, Mclean E (1992) Information Systems Success: The Quest for the Independent Variable. Information Systems Research (3:1), pp. 60-95.
Depompa B (1996) Scrub Data Clean. InformationWeek (610), pp. 88-92.
Faden M (2000) Data Cleansing Helps E-Business Run More Efficiently. InformationWeek (781).
Goiser K, Christen P (2006) Towards Automated Record Linkage. Proceedings of the 5th Australasian Data Mining Conference, pp. 23-31.
Hall P, Dowling G (1980) Approximate String Matching. ACM Computing Surveys, 12(4), pp. 381-402.
Hernandez M, Stolfo S (1995) The merge/purge problem for large databases. In Proceedings of the 1995 ACM SIGMOD International Conference on Management of Data, pp. 127-138.
Hernandez M, Stolfo S (1998) Real-world Data is Dirty: Data Cleansing and the Merge/Purge Problem. Journal of Data Mining and Knowledge Discovery, 1(2).
Hopcroft J, Ullman J (2001) Introduction to Automata Theory, Languages, and Computation. Addison-Wesley Publishing Company.
Jaro M (1989) Advances in Record Linkage Methodology as Applied to Matching the 1985 Census of Tampa, Florida. Journal of the American Statistical Association, 84(406), pp. 414-420.
Jokinen P, Tarhio J, Ukkonen E (1996) A comparison of approximate string matching algorithms. Software Practice and Experience, 26(12), pp. 1439-1458.
Li W (2007) A Parallel and Distributed Approach For Finding Transitive Closure of Data Records: A proposal. Submitted manuscript.
Li W (2008) Private Communication With Project Sponsor.
Li W, Schweiger T (2007) Distributed Data Structures and Algorithms for Disjoint Sets in Computing Connected Components of Huge Network. Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications, Volume II, 905-909.
Li W, Hayes D, Zhang J, Bheemavaram R, Portor C, Schweiger T (2007) Parallel and Distributed Grouping Algorithms for Finding Related Records of Huge Data Sets on Cluster Grids. Proceedings of the Acxiom Laboratory for Applied Research (ALAR) 2007 conference on Applied Research in Information Technology.
Li W, Zhang J, Bheemavaram R (2006) Efficient Algorithms for Grouping Data to Improve Data Quality. Proceedings of the 2006 International Conference on Information and Knowledge Engineering, pp. 149-154.
McCallum A, Nigam K, Ungar L (2000) Efficient clustering of High-Dimensional data Sets with Application to Reference Matching. Proceedings of the 6th ACM SIGKDD int. Conf. on KDD, pp. 169-178.
Navarro G, Baeza-Yates R, Sutinen E, Tarhio J (2001) Indexing methods for approximate string matching. IEEE Data Engineering Bulletin 24(4), pp. 19-27.
Redman T (1996) Data Quality for the Information Age. Artech House, Norwood, MA.
Redman T (1998) The Impact of poor data quality on the typical enterprise. Comm. ACM (41:2), pp. 79-82.
Snir M, Otto S, Huss S, Walker D, Dongarra J (1995) MPI : The Complete Reference. MIT Press.
Zhang J, Bheemavaram R, Li W (2006) Transitive Closure of Data Records: Application and Computation. Proceedings of the Acxiom Laboratory for Applied Research (ALAR) 2006 conference on Applied Research in Information Technology, pp. 71-81.
Acknowledgments
This research was supported in part by Acxiom Corporation through the Acxiom Laboratory for Applied Research.
Copyright information
© 2009 Springer Science+Business Media, LLC
About this chapter
Li, W.N., Bheemavaram, R., Zhang, X. (2009). Transitive Closure of Data Records: Application and Computation. In: Chan, Y., Talburt, J., Talley, T. (eds) Data Engineering. International Series in Operations Research & Management Science, vol 132. Springer, Boston, MA. https://doi.org/10.1007/978-1-4419-0176-7_3
DOI: https://doi.org/10.1007/978-1-4419-0176-7_3
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4419-0175-0
Online ISBN: 978-1-4419-0176-7
eBook Packages: Computer Science, Computer Science (R0)