Enriching Scientific Publications from LOD Repositories Through Word Embeddings Approach

  • Arben Hajra
  • Klaus Tochtermann
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 672)


The era of digitalization increasingly emphasizes the role of Digital Libraries (DLs) and raises the requirements and expectations for the services they provide. Interoperability among repositories and other resources remains a subject of research in the field. Retrieving publications related to a particular topic from different DLs, especially from diverse domains, requires several clicks and visits to many different points of access. Achieving interoperability by cross-linking publications, authors and other related data would facilitate scholarly communication in general. Starting from a single point, a scholar would be able to find resources, i.e., publications and authors, already enriched with additional information from different repositories. This study focuses on repositories available as semantic web content, such as bibliographic Linked Open Data (LOD) datasets. Primarily, we consider existing alignments among concepts between repositories. The semantic measurement of relatedness between resources can be improved by applying text-mining techniques. The paper introduces preliminary experiments with vector space models, applying TF-IDF and Cosine Similarity (CS). Additionally, the paper discusses experiments with a word embedding approach, which focuses mainly on context through distributed word representations instead of word frequency, weighting and string matching. We apply the contemporary Word2Vec model, an approach similar to deep learning, to model semantic word representations.
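The contrast the abstract draws between frequency-based and embedding-based relatedness can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the smoothed IDF formula, and the idea of averaging word vectors into a text vector are all assumptions made for illustration, and the tiny two-dimensional `TOY_EMBEDDINGS` table is hand-made and only stands in for vectors that a trained Word2Vec model would supply.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace, keeping alphabetic tokens only."""
    return [t for t in text.lower().split() if t.isalpha()]


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed IDF (an assumed variant; other formulas are equally common).
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({t: (c / len(toks)) * idf[t] for t, c in tf.items()})
    return vectors


def sparse_cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def mean_embedding(text, embeddings):
    """Average the vectors of all known tokens; None if no token is known."""
    known = [embeddings[t] for t in tokenize(text) if t in embeddings]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def dense_cosine(u, v):
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical hand-made vectors, NOT output of a trained Word2Vec model.
TOY_EMBEDDINGS = {
    "scholarly": [0.80, 0.20],
    "scientific": [0.82, 0.18],
    "publication": [0.90, 0.10],
    "article": [0.88, 0.12],
    "protein": [0.10, 0.90],
    "folding": [0.05, 0.95],
}

DOCS = ["scholarly publication", "scientific article", "protein folding"]

tfidf = tfidf_vectors(DOCS)
tfidf_ab = sparse_cosine(tfidf[0], tfidf[1])

dense = [mean_embedding(d, TOY_EMBEDDINGS) for d in DOCS]
emb_ab = dense_cosine(dense[0], dense[1])
emb_ac = dense_cosine(dense[0], dense[2])

print("TF-IDF cosine, doc 0 vs doc 1:", tfidf_ab)
print("Embedding cosine, doc 0 vs doc 1:", emb_ab)
print("Embedding cosine, doc 0 vs doc 2:", emb_ac)
```

In this toy setup the first two texts share no terms, so their TF-IDF cosine is zero, while their averaged embeddings are close because the toy vectors place related words near each other. With real data, the embedding table would come from a model trained on a suitable corpus, and the quality of the result would depend on that training.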


Digital Libraries, Linked Open Data, Semantic web, Word embeddings, Data mining, Recommended systems



Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  1. South East European University (SEEU), Tetovo/Skopje, Republic of Macedonia
  2. Leibniz Information Centre for Economics (ZBW), Kiel/Hamburg, Germany
