Text-Based Product Matching with Incomplete and Inconsistent Items Descriptions

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12743)

Abstract

In recent years, Machine Learning and Artificial Intelligence have been reshaping the landscape of e-commerce and retail. Using advanced analytics, behavioral modeling, and inference, representatives of these industries can leverage collected data and increase their market performance. To perform assortment optimization – one of the most fundamental problems in retail – one has to identify products that are present in competitors’ portfolios, which is not possible without effective product matching. The paper deals with finding identical products in the offers of different retailers. The task is performed using a text-mining approach, assuming that the data may contain incomplete information. Besides a description of the algorithm, results for real-world data fetched from the offers of two consumer electronics retailers are presented.
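To make the idea of text-based matching under incomplete data concrete, here is a minimal sketch. It is not the algorithm from the paper, which this abstract does not specify. It assumes that each offer is a dictionary of text fields, that similarity is a plain token-level Jaccard score, and that a field missing from either offer is skipped rather than counted as a mismatch. The field names, the `threshold` value, and all function names are hypothetical.

```python
import re


def tokens(text):
    """Lowercase a description and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two token sets; defined as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity(offer_a, offer_b, fields=("name", "brand", "model")):
    """Average per-field Jaccard similarity over fields present in BOTH offers.

    Assumption: a missing or empty field carries no evidence either way,
    so it is skipped instead of being scored as a mismatch.
    """
    scores = []
    for field in fields:
        value_a, value_b = offer_a.get(field), offer_b.get(field)
        if value_a and value_b:
            scores.append(jaccard(tokens(value_a), tokens(value_b)))
    return sum(scores) / len(scores) if scores else 0.0


def best_match(offer, candidates, threshold=0.5):
    """Return the most similar candidate, or None if none reaches the threshold."""
    scored = [(similarity(offer, c), c) for c in candidates]
    score, candidate = max(scored, key=lambda pair: pair[0], default=(0.0, None))
    return candidate if score >= threshold else None


# Hypothetical offers from two retailers; the "model" field is incomplete.
offer_a = {"name": "Samsung Galaxy S10 128GB Black", "brand": "Samsung", "model": None}
offer_b = {"name": "Galaxy S10 128 GB black Samsung", "brand": "samsung"}
offer_c = {"name": "Apple iPhone 11 64GB", "brand": "Apple"}

match = best_match(offer_a, [offer_c, offer_b])
```

In this sketch the inconsistent spelling of "128GB" versus "128 GB" lowers the name score without preventing the match, because the brand field still agrees and the absent model field is ignored.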

Acknowledgment

The work was supported by the Faculty of Physics and Applied Computer Science AGH UST statutory tasks within the subsidy of MEiN.

Author information

Corresponding author

Correspondence to Szymon Łukasik.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Łukasik, S., Michałowski, A., Kowalski, P.A., Gandomi, A.H. (2021). Text-Based Product Matching with Incomplete and Inconsistent Items Descriptions. In: Paszynski, M., Kranzlmüller, D., Krzhizhanovskaya, V.V., Dongarra, J.J., Sloot, P.M.A. (eds) Computational Science – ICCS 2021. ICCS 2021. Lecture Notes in Computer Science, vol 12743. Springer, Cham. https://doi.org/10.1007/978-3-030-77964-1_8

  • DOI: https://doi.org/10.1007/978-3-030-77964-1_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-77963-4

  • Online ISBN: 978-3-030-77964-1

  • eBook Packages: Computer Science, Computer Science (R0)
