Transitive Closure of Data Records: Application and Computation

This chapter considers a record-grouping problem called the transitive closure problem. The problem arises in efficient information processing aimed at improving data quality and information quality. To provide context for the problem, this introduction considers the importance of data quality and information quality and discusses samples of related work.

Notes

  1. A relation that is reflexive, symmetric, and transitive is said to be an equivalence relation (Hopcroft and Ullman 2001).

  2. Let R be a relation. The transitive closure of R, denoted R+, is defined as follows: 1) if (a, b) is in R, then (a, b) is in R+; 2) if (a, b) is in R+ and (b, c) is in R, then (a, c) is in R+; and 3) nothing is in R+ unless it so follows from 1) and 2) [10].
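
The inductive definition in Note 2 can be read as a fixed-point computation. The following is a minimal Python sketch of that reading, not the algorithm developed in the chapter: the function name `transitive_closure` and the representation of a finite relation as a set of pairs are assumptions made here for illustration only.

```python
def transitive_closure(relation):
    """Return R+ for a finite relation given as an iterable of (a, b) pairs.

    Hypothetical helper for illustration; follows rules 1) to 3) of Note 2.
    """
    base = set(relation)
    closure = set(base)  # rule 1: every pair in R is in R+
    while True:
        # rule 2: (a, b) in R+ and (b, c) in R imply (a, c) in R+
        new_pairs = {
            (a, c)
            for (a, b) in closure
            for (b2, c) in base
            if b == b2
        }
        # rule 3: stop as soon as rules 1 and 2 yield nothing new
        if new_pairs <= closure:
            return closure
        closure |= new_pairs


# Example: records 1, 2, 3 are chained; records 4 and 5 are related separately.
pairs = {(1, 2), (2, 3), (4, 5)}
closure = transitive_closure(pairs)
```

For the sample relation, the closure contains the original three pairs plus the derived pair (1, 3). This naive loop rescans every pair on each pass, so it is suitable only for small inputs.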

References

  • Agostino D (2004) Getting Clean. CIOINSIGHT.

  • Ballou D (1999) Enhancing data quality in Data Warehousing Environment. Comm. ACM (42:1), pp. 73-78.

  • Ballou D, Wang H, Pazer G (1998) Modeling Information Manufacturing Systems to Determine Information Product Quality. Management Science (44:4), pp. 462-484.

  • Baxter R, Christen P, Churches T (2003) A Comparison of Fast Blocking Methods for Record Linkage. ACM SIGKDD '03 Workshop on Data Cleaning, Record Linkage, and Object Consolidation, Washington, DC, pp 25-27.

  • Bheemavaram R (2006) Parallel and Distributed Grouping Algorithms for Finding Related Records of Huge Data Sets on Cluster Grid. MS thesis, University of Arkansas.

  • Cormen T, Leiserson C, Rivest R, Stein C (2001) Introduction to Algorithms Second Edition. McGraw-Hill Higher Education.

  • Delone W, Mclean E (1992) Information Systems Success: The Quest for the Independent Variable. Information Systems Research (3:1), pp. 60-95.

  • Depompa B (1996) Scrub Data Clean. InformationWeek (610), pp. 88-92.

  • Faden M (2000) Data Cleansing Helps E-Business Run More Efficiently. InformationWeek (781).

  • Goiser K, Christen P (2006) Towards Automated Record Linkage. Proceedings of the 5th Australasian Data Mining Conference, pp. 23-31.

  • Hall P, Dowling G (1980) Approximate String Matching. ACM Computing Surveys, 12(4), pp. 381-402.

  • Hernandez M, Stolfo S (1995) The merge/purge problem for large databases. In Proceedings of the 1995 ACM SIGMOD International Conference on Management of Data, pp. 127-138.

  • Hernandez M, Stolfo S (1998) Real-world Data is Dirty: Data Cleansing and the Merge/Purge Problem. Journal of Data Mining and Knowledge Discovery, 1(2).

  • Hopcroft J, Ullman J (2001) Introduction to Automata Theory, Languages, and Computation. Addison-Wesley Publishing Company.

  • Jaro M (1989) Advances in Record Linkage Methodology as Applied to Matching the 1985 Census of Tampa, Florida. Journal of the American Statistical Association, 84(406), pp. 414-420.

  • Jokinen P, Tarhio J, Ukkonen E (1996) A comparison of approximate string matching algorithms. Software Practice and Experience, 26(12), pp. 1439-1458.

  • Li W (2007) A Parallel and Distributed Approach For Finding Transitive Closure of Data Records: A proposal. Submitted manuscript.

  • Li W (2008) Private Communication With Project Sponsor.

  • Li W, Schweiger T (2007) Distributed Data Structures and Algorithms for Disjoint Sets in Computing Connected Components of Huge Network. Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications, Volume II, 905-909.

  • Li W, Hayes D, Zhang J, Bheemavaram R, Portor C, Schweiger T (2007) Parallel and Distributed Grouping Algorithms for Finding Related Records of Huge Data Sets on Cluster Grids. Proceedings of the Acxiom Laboratory for Applied Research (ALAR) 2007 conference on Applied Research in Information Technology.

  • Li W, Zhang J, Bheemavaram R (2006) Efficient Algorithms for Grouping Data to Improve Data Quality. Proceedings of the 2006 International Conference on Information and Knowledge Engineering, pp. 149-154.

  • McCallum A, Nigam K, Ungar L (2000) Efficient clustering of High-Dimensional data Sets with Application to Reference Matching. Proceedings of the 6th ACM SIGKDD int. Conf. on KDD, pp. 169-178.

  • Navarro G, Baeza-Yates R, Sutinen E, Tarhio J (2001) Indexing methods for approximate string matching. IEEE Data Engineering Bulletin 24(4), pp. 19-27.

  • Redman T (1996) Data Quality for the Information Age. Artech House, Norwood, MA.

  • Redman T (1998) The Impact of poor data quality on the typical enterprise. Comm. ACM (41:2), pp. 79-82.

  • Snir M, Otto S, Huss S, Walker D, Dongarra J (1995) MPI: The Complete Reference. MIT Press.

  • Zhang J, Bheemavaram R, Li W (2006) Transitive Closure of Data Records: Application and Computation. Proceedings of the Acxiom Laboratory for Applied Research (ALAR) 2006 conference on Applied Research in Information Technology, pp. 71-81.

Acknowledgments

This research was supported in part by Acxiom Corporation through the Acxiom Laboratory for Applied Research.

Copyright information

© 2009 Springer Science+Business Media, LLC

About this chapter

Cite this chapter

Li, W.N., Bheemavaram, R., Zhang, X. (2009). Transitive Closure of Data Records: Application and Computation. In: Chan, Y., Talburt, J., Talley, T. (eds) Data Engineering. International Series in Operations Research & Management Science, vol 132. Springer, Boston, MA. https://doi.org/10.1007/978-1-4419-0176-7_3

  • DOI: https://doi.org/10.1007/978-1-4419-0176-7_3

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-1-4419-0175-0

  • Online ISBN: 978-1-4419-0176-7

  • eBook Packages: Computer Science, Computer Science (R0)
