Extracting Semantic Information for e-Commerce

  • Bruno Charron
  • Yu Hirate
  • David Purcell
  • Martin Rezk
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9982)


Rakuten Ichiba uses a taxonomy to organize the items it sells. Currently, the taxonomy classes that matter most in terms of profit generation and difficulty of exploration are manually extended with data properties deemed helpful for creating pages that improve the user search experience and, ultimately, the conversion rate. In this paper we present a scalable approach that aims to automate this process. It selects the relevant and semantically homogeneous subtrees of the taxonomy, extracts a core set of properties and a popular subset of their ranges from semi-structured text in item descriptions, and then extends the covered range using relational similarities in free text. Additionally, our process automatically tags the items with the new semantic information and exposes them as RDF triples. We present a set of experiments showing the effectiveness of our approach in this business context.
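To make the extraction and tagging steps more concrete, the following is a minimal sketch, not the authors' implementation. It assumes that semi-structured text takes the form of "key: value" lines, that a property is "core" when it appears in a given share of descriptions, and that triples are serialized as N-Triples. The `http://example.org/` namespace, the function names, and the `min_share` threshold are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical namespace; the paper's actual vocabulary is not given here.
BASE = "http://example.org/"

# Assumed pattern for semi-structured lines such as "Color: red"
# (accepts an ASCII or a full-width colon).
PAIR = re.compile(r"^\s*([^:：]{1,40}?)\s*[:：]\s*(.+?)\s*$")


def extract_pairs(description):
    """Return (property, value) pairs found in 'key: value' lines."""
    pairs = []
    for line in description.splitlines():
        m = PAIR.match(line)
        if m:
            pairs.append((m.group(1).strip().lower(), m.group(2).strip()))
    return pairs


def core_properties(descriptions, min_share=0.5):
    """Keep properties that occur in at least min_share of the descriptions."""
    counts = Counter()
    for d in descriptions:
        # Count each property at most once per description.
        counts.update({p for p, _ in extract_pairs(d)})
    return {p for p, c in counts.items() if c / len(descriptions) >= min_share}


def to_ntriples(item_id, pairs, allowed):
    """Serialize the allowed pairs of one item as N-Triples lines."""
    lines = []
    for prop, value in pairs:
        if prop in allowed:
            # Escape backslashes and quotes inside the literal.
            literal = value.replace("\\", "\\\\").replace('"', '\\"')
            # Simplification: only spaces are normalized in the property IRI.
            prop_iri = BASE + "prop/" + prop.replace(" ", "_")
            lines.append(f'<{BASE}item/{item_id}> <{prop_iri}> "{literal}" .')
    return lines
```

Given the descriptions of a subtree, `core_properties` would be called once per subtree and `to_ntriples` once per item. Extending the value ranges through relational similarities in free text is a separate step that this sketch does not cover.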



Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  • Bruno Charron (1)
  • Yu Hirate (1)
  • David Purcell (1)
  • Martin Rezk (1)
  1. Rakuten Inc., Tokyo, Japan
