Programming and Computer Software, Volume 39, Issue 3, pp 115–123

Automation of data normalization for implementing master data management systems

  • Ya. R. Nedumov
  • D. Yu. Turdakov
  • V. D. Maiorov
  • P. E. Ovchinnikov

Abstract

Data normalization is a laborious and costly part of developing master data management software for enterprises. We analyze the subtasks of normalization and propose an approach to automating the most laborious of them. We also describe a software system that implements the proposed approach and automatically learns the experts' skills.

Copyright information

© Pleiades Publishing, Ltd. 2013

Authors and Affiliations

  • Ya. R. Nedumov (1)
  • D. Yu. Turdakov (1)
  • V. D. Maiorov (1)
  • P. E. Ovchinnikov (2)
  1. Institute for System Programming, Moscow, Russia
  2. Moscow Institute of Physics and Technology, Moscow, Russia
