Towards Automatic Cataloging of Image and Textual Collections with Wikipedia

Suzuki, Tokinori; Ikeda, Daisuke; Galuščáková, Petra; Oard, Douglas

doi:10.1007/978-3-030-34058-2_16

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 11853))

Included in the following conference series:

International Conference on Asian Digital Libraries

727 Accesses
1 Citations

Abstract

In recent years, a large amount of multimedia data consisting of images and text have been generated in libraries through the digitization of physical materials into data for their preservation. When they are archived, appropriate cataloging metadata are assigned to them by librarians. Automatic annotations are helpful for reducing the cost of manual annotations. To this end, we propose a mapping system that links images and the associated text to entries on Wikipedia as a replacement for annotation by targeting images and associated text from photo-sharing sites. The uploaded images are accompanied by descriptive labels of contents of the sites that can be indexed for the catalogue. However, because users freely tag images with labels, these user-assigned labels are often ambiguous. The label “albatross”, for example, may refer to a type of bird or aircraft. If the ambiguities are resolved, we can use Wikipedia entries for cataloging as an alternative to ontologies. To formalize this, we propose a task called image label disambiguation where, given an image and assigned target labels to be disambiguated, an appropriate Wikipedia page is selected for the given labels. We propose a hybrid approach for this task that makes use of both user tags as textual information and features of images generated through image recognition. To evaluate the proposed task, we develop a freely available test collection containing 450 images and 2,280 ambiguous labels. The proposed method outperformed prevalent text-based approaches in terms of the mean reciprocal rank, attaining a value of over 0.6 on both our collection and the ImageCLEF collection.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

References

Corrado, E.M.: Digital Preservation for Libraries, Archives, and Museums, 2nd edn. Rowman & Littlefield Publishers, Cambridge (2017)
Google Scholar
Page, K.R., Bechhofer, S., Fazekas, G., Weigl, D.M., Wilmering, T.: Realizing a layered digital library: exploration and analysis of the live music archive through linked data. In: Proceedings of the 17th ACM/IEEE Joint Conference on Digital Libraries, pp. 89–98 (2017)
Google Scholar
Kroegen, A.: The road to BBIBFRAME: the evolution of the idea of bibliographic transition into a post-MARC Future. Cataloging Classif. Q. 51(8), 873–890 (2013)
Article Google Scholar
Nurmikko-Fuller, T., Jett, J., Cole, T., Maden, C., Page, K.R., Downie, J.S.: A comparative analysis of bibliographic ontologies: implications for digital humanities. In: Digital Humanities 2016: Conference Abstracts, pp. 639–642 (2016)
Google Scholar
Sawant, N., Li, J., Wang, J.Z.: Automatic image semantic interpretation using social action and tagging data. Multimedia Tools Appl. 51(1), 213–246 (2011)
Article Google Scholar
Cucerzan, S.: Large-scale named entity disambiguation based on Wikipedia data. In: Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pp. 708–716 (2007)
Google Scholar
Ratinov, L., Roth, D., Downey, D., Anderson, M.: Local and global algorithms for disambiguation to Wikipedia. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 1375–1384 (2011)
Google Scholar
Cheng, X., Roth, D.: Relational Inference for wikification. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pp. 1787–1796 (2013)
Google Scholar
Ganea, O.E., Ganea, M., Lucchi, A., Eickhoff, C., Hofmann, T.: Probabilistic bag-of-hyperlinks model for entity linking. In: Proceedings of the 25th International Conference on World Wide Web, pp. 927–938 (2016)
Google Scholar
Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Proceedings of the 25th International Conference on Neural Information Processing Systems, pp. 1097–1105 (2012)
Google Scholar
Simonyan, K., Zisserman, A.: Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv preprint arXiv:1409.1556 (2014)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the 29th IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
Google Scholar
Szegedy, C., Loffe, S., Vanhoucke, V., Alemi, A.A.: Inception-v4, Inception-ResNet and the impact of residual connections on learning. In: Proceedings of the 31st AAAI Conference on Artificial Intelligence, pp. 4278–4284 (2015)
Google Scholar
Weegar, R., Hammarlund, L., Tegen, A., Oskarsson, M., Åström, K., Nugues, P.: Visual entity linking: a preliminary study. In: Proceedings of Cognitive Computing for Augmented Human Intelligence Workshop at the 28th AAAI Conference on Artificial Intelligence, pp. 46–49 (2014)
Google Scholar
Tilak, N., Gandhi, S., Oates, T.: Visual entity linking. In: Proceedings of the 2017 International Joint Conference on Neural Networks, pp. 665–672 (2017)
Google Scholar
Hodosh, M., Young, P., Hockenmaier, J.: Framing image description as a ranking task: data, models and evaluation metrics. J. Artif. Intell. Res. 47(1), 853–899 (2013)
Article MathSciNet Google Scholar
Everingham, M., Eslami, S.M.A., Gool, L.V., Williams, C.K.I., Winn, J., Zisserman, A.: The Pascal visual object classes challenge: a retrospective. Int. J. Comput. Vis. 111(1), 98–136 (2014)
Article Google Scholar
Lin, T.-Y., et al.: Microsoft COCO: common objects in context. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8693, pp. 740–755. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10602-1_48
Chapter Google Scholar
Russakovsky, O., et al.: Imagenet large scale visual recognition challenge. Int. J. Comput. Vis. 115(3), 211–252 (2015)
Article MathSciNet Google Scholar
Deng, J., Dong, W., Socher, R., Li, L., Li, K., Fei-Fei L.: ImageNet: a large-scale hierarchical image database. In: Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255 (2009)
Google Scholar
Chua, T.S., Tang, J., Hong, R., Li, H., Luo, Z., Zheng, Y.: NUS-WIDE: a real-world web image database from National University of Singapore. In: Proceedings of the 8th ACM International Conference on Image and Video Retrieval (2009)
Google Scholar
Gilbert, A., et al.: Overview of the ImageCLEF 2016 scalable web image annotation task. In: Proceedings of the Seventh International Conference of the CLEF Association (2016)
Google Scholar
Fellbaum, C.: WordNet: An Electronic Lexical Database. MIT Press, Cambridge (1998)
Book Google Scholar
Carletta, J.: Assessing agreement on classification tasks: the kappa statistic. Comput. Linguist. 22(2), 249–254 (1996)
Google Scholar
Mikolov, T., Sutskever, I., Chen, K., Corrado, G., Dean, J.: Distributed representations of words and phrases and their compositionality. In: Proceedings of the 26th International Conference on Neural Information Processing Systems, pp. 3111–3119 (2013)
Google Scholar
Mitra, B., Nalisnick, E., Craswell, N., Caruana, R.: A Dual Embedding Space Model for Document Ranking. arXiv preprint arXiv:1602.01137 (2016)
Ye, X., Shen, H., Ma, X., Bunescu, R., Liu, C.: From word embeddings to document similarities for improved information retrieval in software engineering. In: Proceedings of the 38th IEEE International Conference on Software Engineering, pp. 404–415 (2016)
Google Scholar
Robertson, S.E., Walker, S., Jones S., Hancock-Beaulieu, M.M., Gatford, M.: Okapi at TREC-3. In: Proceedings of the third Text REtrieval Conference, pp. 109–126 (1996)
Google Scholar
Craswell, N.: Mean reciprocal rank. In: Liu, L., Özsu, M.T. (eds.) Encyclopedia of Database Systems. Springer, Boston (2009). https://doi.org/10.1007/978-0-387-39940-9_488
Google Scholar

Download references

Author information

Authors and Affiliations

Kyushu University, Fukuoka, 8190395, Japan
Tokinori Suzuki & Daisuke Ikeda
University of Maryland, College Park, MD, 20742, USA
Petra Galuščáková & Douglas Oard

Authors

Tokinori Suzuki
View author publications
You can also search for this author in PubMed Google Scholar
Daisuke Ikeda
View author publications
You can also search for this author in PubMed Google Scholar
Petra Galuščáková
View author publications
You can also search for this author in PubMed Google Scholar
Douglas Oard
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Tokinori Suzuki .

Editor information

Editors and Affiliations

Kyoto University, Kyoto, Japan
Adam Jatowt
Ritsumeikan University, Kusatsu, Japan
Akira Maeda
The Catholic University of America, Washington, DC, USA
Sue Yeon Syn

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Suzuki, T., Ikeda, D., Galuščáková, P., Oard, D. (2019). Towards Automatic Cataloging of Image and Textual Collections with Wikipedia. In: Jatowt, A., Maeda, A., Syn, S. (eds) Digital Libraries at the Crossroads of Digital Information for the Future. ICADL 2019. Lecture Notes in Computer Science(), vol 11853. Springer, Cham. https://doi.org/10.1007/978-3-030-34058-2_16

Download citation

DOI: https://doi.org/10.1007/978-3-030-34058-2_16
Published: 29 October 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-34057-5
Online ISBN: 978-3-030-34058-2
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics