Detecting incorrect product names in online sources for product master data

Karpischek, Stephan; Michahelles, Florian; Fleisch, Elgar

doi:10.1007/s12525-013-0136-4

General Research
Published: 04 August 2013

Volume 24, pages 151–160 (2014)

Electronic Markets

Abstract

The global trade item number (GTIN) is traditionally used to identify trade items and look up corresponding information within industrial supply chains. Recently, consumers have also started using GTINs to access additional product information with mobile barcode scanning applications. Providers of these applications use different sources to provide product names for scanned GTINs. In this paper, we analyze data from eight publicly available sources for a set of GTINs scanned by users of a mobile barcode scanning application. Our aim is to measure the correctness of product names in online sources and to quantify the problem of product data quality. We use a combination of string matching and supervised learning to estimate the number of incorrect product names. Our results show that approximately 2% of all product names are incorrect. The applied method is useful for brand owners to monitor the data quality for their products and enables efficient data integration for application providers.
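The string-matching idea in the abstract can be illustrated with a minimal sketch. It is not the authors' method: the abstract does not state which similarity measure, features, or classifier were used. The sketch assumes a generic character-based similarity from Python's standard library, an arbitrary cutoff of 0.4, and hypothetical source names and product names. It flags a name for one GTIN when it disagrees with the names that the other sources report for the same GTIN.

```python
from difflib import SequenceMatcher
from statistics import mean


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio in [0, 1] between two names."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def flag_suspect_names(names_by_source: dict[str, str],
                       threshold: float = 0.4) -> list[str]:
    """Return the sources whose name for one GTIN disagrees with the rest.

    A name is flagged when its mean similarity to the names from all other
    sources falls below `threshold` (an arbitrary, illustrative cutoff).
    """
    suspects = []
    for source, name in names_by_source.items():
        others = [n for s, n in names_by_source.items() if s != source]
        if not others:
            continue  # a single source cannot be cross-checked
        if mean(similarity(name, other) for other in others) < threshold:
            suspects.append(source)
    return suspects


# Hypothetical names reported for one GTIN by four hypothetical sources.
names = {
    "source_a": "Coca-Cola Zero 0.5l",
    "source_b": "Coca Cola Zero 500ml",
    "source_c": "coca-cola zero 0,5 l",
    "source_d": "Garden Hose 25m",
}
print(flag_suspect_names(names))
```

In a supervised setting such as the one the abstract mentions, similarity scores like these would more plausibly serve as input features for a trained classifier than be compared against a fixed cutoff.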

Author information

Authors and Affiliations

ETH Zürich, WEV G 222.2, Weinbergstrasse 56/58, 8092, Zürich, Switzerland
Stephan Karpischek, Florian Michahelles & Elgar Fleisch

Corresponding author

Correspondence to Stephan Karpischek.

Additional information

Responsible editor: Hans-Dieter Zimmermann

About this article

Cite this article

Karpischek, S., Michahelles, F. & Fleisch, E. Detecting incorrect product names in online sources for product master data. Electron Markets 24, 151–160 (2014). https://doi.org/10.1007/s12525-013-0136-4

Received: 04 December 2012
Accepted: 20 June 2013
Published: 04 August 2013
Issue Date: June 2014
DOI: https://doi.org/10.1007/s12525-013-0136-4

Keywords

JEL classification

L15
