Towards Automatic Cataloging of Image and Textual Collections with Wikipedia

  • Tokinori Suzuki
  • Daisuke Ikeda
  • Petra Galuščáková
  • Douglas Oard
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11853)


In recent years, libraries have generated a large amount of multimedia data, consisting of images and text, by digitizing physical materials for preservation. When these materials are archived, librarians assign appropriate cataloging metadata to them. Automatic annotation can reduce the cost of this manual work. To this end, we propose a mapping system that links images and their associated text to Wikipedia entries as a replacement for manual annotation, targeting images and associated text from photo-sharing sites. Images uploaded to these sites are accompanied by descriptive labels of their contents, which can be indexed for the catalogue. However, because users tag images freely, these user-assigned labels are often ambiguous. The label “albatross”, for example, may refer to a type of bird or to an aircraft. If the ambiguities are resolved, Wikipedia entries can be used for cataloging as an alternative to ontologies. To formalize this, we propose a task called image label disambiguation in which, given an image and its assigned target labels to be disambiguated, an appropriate Wikipedia page is selected for the given labels. We propose a hybrid approach to this task that uses both the user tags, as textual information, and image features generated through image recognition. To evaluate the proposed task, we develop a freely available test collection containing 450 images and 2,280 ambiguous labels. The proposed method outperformed prevalent text-based approaches in terms of mean reciprocal rank, attaining a value of over 0.6 on both our collection and the ImageCLEF collection.
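The mean reciprocal rank used for evaluation can be illustrated with a minimal sketch. It assumes that each ambiguous label has exactly one correct Wikipedia page and that a system returns a ranked list of candidate pages per label. The function names and the page titles in the example are hypothetical and are not taken from the paper's test collection.

```python
from typing import Sequence


def reciprocal_rank(ranked: Sequence[str], correct: str) -> float:
    """Return 1/rank of the correct page, or 0.0 if it is not in the ranking."""
    for position, candidate in enumerate(ranked, start=1):
        if candidate == correct:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(rankings: Sequence[Sequence[str]],
                         answers: Sequence[str]) -> float:
    """Average the reciprocal rank over all labels (assumes one answer per label)."""
    if len(rankings) != len(answers):
        raise ValueError("rankings and answers must have the same length")
    if not rankings:
        return 0.0
    total = sum(reciprocal_rank(r, a) for r, a in zip(rankings, answers))
    return total / len(rankings)


# Hypothetical candidate rankings for three ambiguous labels.
rankings = [
    ["Albatross", "Albatross (aircraft)"],  # correct page ranked 1st
    ["Jaguar Cars", "Jaguar"],              # correct page ranked 2nd
    ["Apple Inc."],                         # correct page not returned
]
answers = ["Albatross", "Jaguar", "Apple"]

mrr = mean_reciprocal_rank(rankings, answers)  # (1 + 1/2 + 0) / 3
```

Under this definition, a value above 0.6 means that the correct page tends to appear at or near the top of the ranking, although the average alone does not reveal how the ranks are distributed across labels.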


Keywords: Entity linking, Multimedia, Wikipedia, Test collection



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Kyushu University, Fukuoka, Japan
  2. University of Maryland, College Park, USA
