Subsumption Prediction for E-Commerce Taxonomies

Part of the Lecture Notes in Computer Science book series (LNCS, volume 13870)


Taxonomy plays a key role in e-commerce, categorising items and facilitating both search and inventory management. Concept subsumption prediction is critical for taxonomy curation and has been the subject of several studies, but existing methods do not fully utilise the categorical information available in e-commerce settings. In this paper, we study the characteristics of e-commerce taxonomies and propose a new subsumption prediction method, based on the pre-trained language model BERT, that is well adapted to the e-commerce setting. The proposed model utilises textual and structural semantics in a taxonomy, as well as the rich and noisy instance (item) information. We show through extensive evaluation on two large-scale e-commerce taxonomies, from eBay and AliOpenKG, that our method offers substantial improvement over strong baselines.


  • Subsumption Prediction
  • E-Commerce Taxonomy
  • Pre-trained Language Model
  • BERT

  1. Browse this category and the taxonomy around it at

  4. Code and data available at

  5. The difficulty of such a task is illustrated by labels featuring a mixture of conjunction and disjunction, e.g., Suit Jackets & Blazers, which means “Suit Jackets” \(\vee \) “Blazers”, and North & Central America, which means “North America” \(\vee \) “Central America”.

  6. For the Chinese AliOpenKG dataset (see Sect. 5.1), the template is “
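The footnote on labels such as Suit Jackets & Blazers can be illustrated with a minimal sketch. It assumes a deliberately naive splitter, here given the hypothetical name `naive_disjuncts`, which is not part of the paper's method. Splitting on "&" recovers the intended disjuncts for the first label but not for the second, where the shared head word "America" is lost from the first part.

```python
import re


def naive_disjuncts(label: str) -> list[str]:
    """Split a category label on '&' and return the stripped parts.

    Hypothetical helper for illustration only; it does not try to
    restore a head word shared between the parts.
    """
    return [part.strip() for part in re.split(r"\s*&\s*", label) if part.strip()]


for label in ("Suit Jackets & Blazers", "North & Central America"):
    print(label, "->", naive_disjuncts(label))
# Suit Jackets & Blazers -> ['Suit Jackets', 'Blazers']
# North & Central America -> ['North', 'Central America']
```

The second result shows why such labels are hard to interpret by surface rules alone: the intended reading is “North America” \(\vee \) “Central America”, which requires knowing that "America" distributes over both parts.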



We would like to thank Mingjian Lu and Canran Xu for their work and constructive ideas. This work was supported by eBay and the EPSRC projects OASIS (EP/S032347/1), UK FIRES (EP/S019111/1) and ConCur (EP/V050869/1).

Corresponding author

Correspondence to Jingchuan Shi.

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

Cite this paper

Shi, J. et al. (2023). Subsumption Prediction for E-Commerce Taxonomies. In: Pesquita, C., et al. The Semantic Web. ESWC 2023. Lecture Notes in Computer Science, vol 13870. Springer, Cham.

  • DOI:

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-33454-2

  • Online ISBN: 978-3-031-33455-9

  • eBook Packages: Computer Science, Computer Science (R0)