Content-Based Recommendations in an E-Commerce Platform

  • Łukasz Dragan
  • Anna Wróblewska
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 945)


Recommendation systems play an important role in modern e-commerce services. The more relevant the items presented to a user, the more likely the user is to stay on a website and eventually make a transaction. In this paper, we adapt several state-of-the-art methods for determining similarities between text documents to the problem of content-based recommendation. The goal is to investigate a variety of recommendation methods based on semantic text analysis techniques and to compare them with querying a search engine index of documents. We conclude that there is no significant difference between the examined methods. However, query-based recommendations require more precise metadata prepared by content creators. We compare these algorithms on a database of product articles from Allegro, the biggest e-commerce marketplace platform in Eastern Europe. (The primary version of this paper was presented at the 3rd Conference on Information Technology, Systems Research and Computational Physics, 2–5 July 2018, Cracow, Poland [4].)
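To make the idea of recommending by text similarity concrete, the following is a minimal sketch in Python. It uses plain TF-IDF weighting with cosine similarity as a simple stand-in and should not be read as the authors' method or as any of the methods compared in the paper. The article texts, the identifiers `a1` to `a3`, and the function names are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive lowercase whitespace tokenizer (an assumption for brevity);
    # a production system would likely add lemmatization and stop-word removal.
    return text.lower().split()


def tfidf_vectors(token_lists):
    """Return one sparse TF-IDF vector (term -> weight dict) per document."""
    n_docs = len(token_lists)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(t for tokens in token_lists for t in set(tokens))
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        vectors.append({
            # Raw term count times a smoothed inverse document frequency.
            t: c * (math.log((1 + n_docs) / (1 + doc_freq[t])) + 1.0)
            for t, c in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def recommend(articles, query_id, k=3):
    """Rank all other articles by similarity to the article `query_id`."""
    ids = list(articles)
    vectors = dict(zip(ids, tfidf_vectors([tokenize(articles[i]) for i in ids])))
    scored = [(i, cosine(vectors[query_id], vectors[i]))
              for i in ids if i != query_id]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical product articles, invented for this example.
ARTICLES = {
    "a1": "how to choose a mountain bike frame and wheels",
    "a2": "mountain bike maintenance guide for wheels and brakes",
    "a3": "best espresso coffee machines for home",
}

if __name__ == "__main__":
    for article_id, score in recommend(ARTICLES, "a1", k=2):
        print(article_id, round(score, 3))
```

Here the article being read acts as the query, and the articles sharing the most distinctive vocabulary with it are ranked first. Semantic methods such as those cited in the references replace the TF-IDF vectors with other document representations, while the ranking step stays conceptually the same.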


Content-based recommendations, Natural language processing, Distributional semantics, Word embeddings



This paper describes the graduate work of Łukasz Dragan, which was supervised by Anna Wróblewska. The work was carried out in cooperation with the Allegro team, which provided the business case and a valuable dataset of 20 thousand product articles available through the platform.

The work was conducted while Anna Wróblewska was an employee of Allegro and later while she cooperated as a research advisor from Warsaw University of Technology.

The work was also partially supported by the RENOIR Project under the European Union’s Horizon 2020 Research and Innovation Programme, Marie Skłodowska-Curie grant agreement No. 691152, and by the Ministry of Science and Higher Education (Poland), grant No. W34/H2020/2016.


  1. Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent Dirichlet allocation. J. Mach. Learn. Res. 3, 4–5 (2003)
  2. Deerwester, S., Dumais, S.T., Furnas, G.W., Landauer, T.K., Harshman, R.: Indexing by latent semantic analysis. J. Am. Soc. Inf. Sci. 41(6), 391–407 (1990)
  3. Joulin, A., Grave, E., Bojanowski, P., Mikolov, T.: Bag of tricks for efficient text classification. Facebook AI Research (2016)
  4. Dragan, Ł., Wróblewska, A.: Contemporary computational science. In: Kulczycki, P., Kowalski, P.A., Łukasik, S. (eds.), p. 22. AGH-UST Press, Cracow (2018)
  5. Kusner, M.J., Sun, Y., Kolkin, N.I., Weinberger, K.Q.: From word embeddings to document distances. In: International Conference on Machine Learning (ICML) (2015)
  6. Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space. In: International Conference on Machine Learning (ICML) (2013)
  8. Pennington, J., Socher, R., Manning, C.D.: GloVe: Global vectors for word representation. Computer Science Department, Stanford University, Stanford, CA 94305 (2014)
  9. Kedzia, P., Czachor, G., Piasecki, M., Kocoń, J.: Vector representations of Polish words (Word2Vec method). Wrocław University of Technology (2016)
  12. Przepiórkowski, A., Bańko, M., Górski, R., Lewandowska-Tomaszczyk, B. (eds.): Narodowy Korpus Języka Polskiego/National Corpus of Polish Language. Publisher PWN, Warsaw (2012). (in Polish)

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Faculty of Mathematics and Information Science, Warsaw University of Technology, Warsaw, Poland
