A Comparative Study of Machine Learning Techniques for Automatic Product Categorisation

  • Chanawee Chavaltada
  • Kitsuchart Pasupa
  • David R. Hardoon
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10261)


The digital age has given rise to e-commerce, which makes shopping easier and more flexible: consumers can enquire about product availability and get an instant response, and can search for products using specific keywords. Easy, precise search combined with proper keyword-based product categorisation therefore improves the overall shopping experience. This paper compares the performance of different machine learning techniques for product categorisation within our proposed framework. We measured the performance of each algorithm by the area under the receiver operating characteristic curve (AUROC), and applied analysis of variance (ANOVA) to the results to determine whether the differences were statistically significant. Naïve Bayes was found to be the most effective algorithm in this investigation.
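The evaluation summarised above (AUROC per algorithm, then ANOVA across algorithms) can be illustrated with a minimal sketch. This is not the authors' code: the function names `auroc` and `one_way_anova_f` and all numbers in the usage lines are hypothetical, chosen only to show the two computations. It assumes binary labels (1 = positive, 0 = negative) and a one-way ANOVA over groups of per-fold scores.

```python
from statistics import mean


def auroc(labels, scores):
    """AUROC as the probability that a randomly chosen positive example
    receives a higher score than a randomly chosen negative one
    (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def one_way_anova_f(groups):
    """F statistic of a one-way ANOVA: between-group mean square divided by
    within-group mean square. Raises ZeroDivisionError if every group has
    zero internal variance."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean(x for g in groups for x in g)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


# Hypothetical usage: made-up labels and classifier scores.
print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75

# Hypothetical per-fold AUROC values for three unnamed algorithms.
folds = [[0.91, 0.93, 0.92], [0.88, 0.87, 0.89], [0.90, 0.92, 0.91]]
print(one_way_anova_f(folds))
```

A large F statistic suggests that at least one algorithm's mean AUROC differs from the others. Converting it to a p-value requires the F distribution, which a statistics library such as SciPy (`scipy.stats.f_oneway`) provides.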


Product classification · Product categorisation · Machine learning



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Chanawee Chavaltada (1)
  • Kitsuchart Pasupa (1), email author
  • David R. Hardoon (2)

  1. Faculty of Information Technology, King Mongkut’s Institute of Technology Ladkrabang, Bangkok, Thailand
  2. PriceTrolley Pte. Ltd., Singapore, Singapore
