Abstract
Taxonomy plays a key role in e-commerce, categorising items and facilitating both search and inventory management. Concept subsumption prediction is critical for taxonomy curation and has been the subject of several studies, but existing approaches do not fully utilise the categorical information available in e-commerce settings. In this paper, we study the characteristics of e-commerce taxonomies and propose a new subsumption prediction method, based on the pre-trained language model BERT, that is well adapted to the e-commerce setting. The proposed model utilises the textual and structural semantics of a taxonomy, as well as its rich but noisy instance (item) information. We show through extensive evaluation on two large-scale e-commerce taxonomies, from eBay and AliOpenKG, that our method offers substantial improvement over strong baselines.
Keywords
- Subsumption Prediction
- E-Commerce Taxonomy
- Pre-trained Language Model
- BERT
Notes
- 1.
Browse this category and the taxonomy around it at https://www.ebay.com/b/175781.
- 2.
- 3.
- 4.
Code and data available at https://github.com/jingcshi/bert_subsumption.
- 5.
The difficulty of such a task is illustrated by labels that mix conjunction and disjunction, e.g., Suit Jackets & Blazers, which means “Suit Jackets” \(\vee \) “Blazers”, and North & Central America, which means “North America” \(\vee \) “Central America”.
- 6.
For the Chinese AliOpenKG dataset (see Sect. 5.1), the template is “
- 7.
- 8.
Acknowledgment
We would like to thank Mingjian Lu and Canran Xu for their work and constructive ideas. This work was supported by eBay and the EPSRC projects OASIS (EP/S032347/1), UK FIRES (EP/S019111/1) and ConCur (EP/V050869/1).
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Shi, J. et al. (2023). Subsumption Prediction for E-Commerce Taxonomies. In: Pesquita, C., et al. The Semantic Web. ESWC 2023. Lecture Notes in Computer Science, vol 13870. Springer, Cham. https://doi.org/10.1007/978-3-031-33455-9_15
DOI: https://doi.org/10.1007/978-3-031-33455-9_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-33454-2
Online ISBN: 978-3-031-33455-9
eBook Packages: Computer Science, Computer Science (R0)