Automation of data normalization for implementing master data management systems

Programming and Computer Software

Abstract

Data normalization is a laborious and costly process in the development of master data management software in enterprises. We analyze the subtasks of normalization and propose an approach to automating the most laborious of them. We also describe a software system that implements the proposed approach and automatically learns expert skills.

Author information

Corresponding author

Correspondence to D. Yu. Turdakov.

Additional information

Original Russian Text © Ya.R. Nedumov, D.Yu. Turdakov, V.D. Maiorov, P.E. Ovchinnikov, 2013, published in Programmirovanie, 2013, Vol. 39, No. 3.

This is a joint study by the 1C company and the MIPT Innovation Lab, carried out within a project approved by Decree no. 218 (April 9, 2010) of the Russian Government.

About this article

Cite this article

Nedumov, Y.R., Turdakov, D.Y., Maiorov, V.D. et al. Automation of data normalization for implementing master data management systems. Program Comput Soft 39, 115–123 (2013). https://doi.org/10.1134/S0361768813030055

  • DOI: https://doi.org/10.1134/S0361768813030055
