Semantic Data Matching: Principles and Performance

Abstract

Automated, real-time management of customer relationships requires robust and intelligent data matching across widespread and diverse data sources. Simple string matching algorithms, such as those based on dynamic programming, can handle typographical errors in the data but are less able to match records that require contextual and experiential knowledge. Latent Semantic Indexing (LSI) (Berry et al. 1995; Deerwester et al. 1990) is a machine intelligence technique that can match data based upon higher-order structure and is able to handle difficult problems, such as words that have the same spelling but different meanings, words that are synonymous, or words that have multiple meanings. Essentially, the technique matches records based upon context, that is, by mathematically quantifying when terms occur in the same record.
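
As a rough illustration of matching on higher-order structure, the sketch below builds a small term-by-record count matrix, projects the records onto the two leading singular directions, and compares records by cosine similarity in that reduced space. It is a minimal sketch under stated assumptions, not the chapter's implementation: the five toy records, the function names, the choice of raw counts without weighting, and the use of power iteration on the Gram matrix in place of a library SVD routine are all illustrative choices.

```python
import math

def term_record_matrix(records):
    """Rows are terms, columns are records; entries are raw term counts."""
    vocab = sorted({t for r in records for t in r.split()})
    return [[r.split().count(t) for r in records] for t in vocab]

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def top_eigenpairs(G, k, iters=300):
    """Leading k eigenpairs of a symmetric positive semidefinite matrix,
    found by power iteration with deflation (a stand-in for a real SVD)."""
    n = len(G)
    G = [row[:] for row in G]
    pairs = []
    for _ in range(k):
        v = [1.0 + 0.1 * i for i in range(n)]  # fixed start keeps runs deterministic
        for _ in range(iters):
            w = [sum(G[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        lam = sum(v[i] * G[i][j] * v[j] for i in range(n) for j in range(n))
        pairs.append((lam, v))
        # Remove the component just found so the next pass finds the next one.
        G = [[G[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return pairs

def lsi_record_vectors(records, k):
    """Coordinates of each record in a k-dimensional latent space."""
    A = term_record_matrix(records)
    n = len(records)
    # Gram matrix A^T A: its eigenvectors are the right singular vectors of A
    # and its eigenvalues are the squared singular values.
    G = [[sum(row[i] * row[j] for row in A) for j in range(n)] for i in range(n)]
    pairs = top_eigenpairs(G, k)
    return [[math.sqrt(max(lam, 0.0)) * v[j] for lam, v in pairs] for j in range(n)]

# Hypothetical toy records: 0 and 1 share no terms, but both co-occur with record 2.
records = [
    "doctor surgery",
    "physician clinic",
    "doctor physician",
    "bank loan",
    "loan credit",
]

raw = list(zip(*term_record_matrix(records)))  # one raw count vector per record
latent = lsi_record_vectors(records, k=2)

print("raw cosine(0, 1):   ", cosine(raw[0], raw[1]))
print("latent cosine(0, 1):", cosine(latent[0], latent[1]))
print("latent cosine(0, 3):", cosine(latent[0], latent[3]))
```

Records 0 and 1 have no term in common, so their raw count vectors are orthogonal; in the reduced space they should land close together because "doctor" and "physician" co-occur in record 2, while the banking records stay apart from them.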

References

  • Baeza-Yates R, Ribeiro-Neto B (1999) Modern Information Retrieval. ACM Press, New York.

  • Berry MW, Dumais ST, O’Brien GW (1995) Using Linear Algebra for Intelligent Information Retrieval. SIAM Review 37 pp 573–595.

  • Deerwester S, Dumais ST, Landauer TK, Furnas GW, Harshman RA (1990) Indexing by Latent Semantic Analysis. Journal of the American Society for Information Science 41 pp 391–407.

  • Fellbaum C (1998) WordNet. MIT Press, Cambridge, MA.

  • Gibbons A (1985) Algorithmic Graph Theory. Cambridge University Press, Cambridge, England.

  • Hwa T, Lässig M (1996) Similarity Detection and Localization. Phys. Rev. Lett. 76 pp 2591–2594.

  • Manning CD, Schütze H (1999) Foundations of Statistical Natural Language Processing. MIT Press, Cambridge, MA.

  • Telcordia (2007) Latent Semantic Indexing. Retrieved from http://lsi.research.telcordia.com.

  • Watts DJ, Strogatz SH (1998) Collective Dynamics of Small World Networks. Nature 393 p 440.

  • Wild F, Stahl C, Stermsek G, Neumann G (2005) Parameters Driving Effectiveness of Automated Essay Scoring with LSA. In Danson, M., ed.: Proceedings of the 9th CAA, Loughborough, Professional Development pp 485–494.

Acknowledgments

This work was supported by a grant from the Acxiom Corporation. We also thank Ms. Ameera Jaradat for contributions to the small world model.

Copyright information

© 2009 Springer Science+Business Media, LLC

About this chapter

Cite this chapter

Deaton, R., Doan, T., Schweiger, T. (2009). Semantic Data Matching: Principles and Performance. In: Chan, Y., Talburt, J., Talley, T. (eds) Data Engineering. International Series in Operations Research & Management Science, vol 132. Springer, Boston, MA. https://doi.org/10.1007/978-1-4419-0176-7_4

  • DOI: https://doi.org/10.1007/978-1-4419-0176-7_4

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-1-4419-0175-0

  • Online ISBN: 978-1-4419-0176-7

  • eBook Packages: Computer Science, Computer Science (R0)
