Abstract
The Global Trade Item Number (GTIN) is traditionally used to identify trade items and look up corresponding information within industrial supply chains. Recently, consumers have also started using GTINs to access additional product information through mobile barcode scanning applications. Providers of these applications draw on different sources to supply product names for scanned GTINs. In this paper, we analyze data from eight publicly available sources for a set of GTINs scanned by users of a mobile barcode scanning application. Our aim is to measure the correctness of product names in online sources and to quantify the problem of product data quality. We use a combination of string matching and supervised learning to estimate the number of incorrect product names. Our results show that approximately 2% of all product names are incorrect. The applied method helps brand owners monitor the data quality of their products and enables efficient data integration for application providers.
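The string-matching idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' method: the source names, product names, similarity measure (Python's `difflib.SequenceMatcher`), and the `threshold` value are all assumptions chosen for illustration. The supervised learning step is not shown; the fixed threshold merely stands in for a trained classifier.

```python
from difflib import SequenceMatcher
from statistics import mean


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio in [0, 1] between two names."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def flag_suspicious_names(names_by_source: dict[str, str],
                          threshold: float = 0.4) -> list[str]:
    """Return the sources whose name for one GTIN disagrees with the others.

    A name is flagged when its mean similarity to the names from all other
    sources falls below `threshold` (a hypothetical cut-off, not a value
    taken from the paper).
    """
    flagged = []
    for source, name in names_by_source.items():
        others = [n for s, n in names_by_source.items() if s != source]
        if not others:
            continue  # a single source cannot be cross-checked
        if mean(similarity(name, other) for other in others) < threshold:
            flagged.append(source)
    return flagged


# Hypothetical names returned by three sources for the same GTIN.
names = {
    "source_a": "Coca-Cola Zero 0.5l",
    "source_b": "Coca Cola Zero 500ml",
    "source_c": "Garden Hose 20m",
}
print(flag_suspicious_names(names))
```

In this toy input, the two beverage names agree closely with each other while the third name does not, so only the third source is flagged. In practice, the per-name similarity scores would more plausibly serve as features for a trained classifier than be compared against a hand-picked threshold.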
Additional information
Responsible editor: Hans-Dieter Zimmermann
About this article
Cite this article
Karpischek, S., Michahelles, F. & Fleisch, E. Detecting incorrect product names in online sources for product master data. Electron Markets 24, 151–160 (2014). https://doi.org/10.1007/s12525-013-0136-4
DOI: https://doi.org/10.1007/s12525-013-0136-4