Detecting incorrect product names in online sources for product master data

Abstract

The Global Trade Item Number (GTIN) is traditionally used to identify trade items and look up corresponding information within industrial supply chains. Recently, consumers have also started using GTINs to access additional product information through mobile barcode scanning applications. Providers of these applications draw on different sources to supply product names for scanned GTINs. In this paper, we analyze data from eight publicly available sources for a set of GTINs scanned by users of a mobile barcode scanning application. Our aim is to measure the correctness of product names in online sources and to quantify the problem of product data quality. We use a combination of string matching and supervised learning to estimate the number of incorrect product names. Our results show that approximately 2% of all product names are incorrect. The applied method helps brand owners monitor the data quality of their products and enables efficient data integration for application providers.
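To make the string-matching idea concrete, the following minimal Python sketch compares the product names that several sources report for one GTIN and flags names that disagree with the rest. It is an illustration only, not the method used in the paper: the source labels, the example names, the choice of `difflib.SequenceMatcher` as similarity measure, and the threshold of 0.4 are all assumptions, and the supervised learning step is not shown.

```python
from __future__ import annotations

from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not count."""
    return " ".join(name.lower().split())


def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two normalized product names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def flag_outliers(names_by_source: dict[str, str], threshold: float = 0.4) -> list[str]:
    """Return the sources whose name has a low mean similarity to all other names.

    The threshold is a hypothetical value chosen for this toy example.
    """
    flagged = []
    for source, name in names_by_source.items():
        others = [n for s, n in names_by_source.items() if s != source]
        if not others:
            continue  # a single source cannot be compared with anything
        mean_sim = sum(similarity(name, o) for o in others) / len(others)
        if mean_sim < threshold:
            flagged.append(source)
    return flagged


# Hypothetical names returned by four sources for the same GTIN.
names = {
    "source_a": "Coca-Cola Zero 500ml",
    "source_b": "Coca Cola Zero 0.5 l",
    "source_c": "coca-cola zero 500 ml",
    "source_d": "Garden hose adapter",
}
print(flag_outliers(names))
```

In this toy input only `source_d` is flagged, because its name shares almost no characters with the other three. In practice, similarity scores like these would more plausibly serve as input features for a trained classifier than as a fixed cut-off.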

Author information

Corresponding author

Correspondence to Stephan Karpischek.

Additional information

Responsible editor: Hans-Dieter Zimmermann

About this article

Cite this article

Karpischek, S., Michahelles, F. & Fleisch, E. Detecting incorrect product names in online sources for product master data. Electron Markets 24, 151–160 (2014). https://doi.org/10.1007/s12525-013-0136-4

Keywords

  • Correctness
  • Data quality
  • GTIN
  • Product master data
  • Product names
  • Quality assessment

JEL classification

  • L15