
Benchmarking Deep Learning Models for Classification of Book Covers

  • Original Research
  • Published in SN Computer Science

Abstract

Book covers usually provide a good depiction of a book’s content and its central idea. Classifying books into their respective genres usually involves subjectivity and contextuality. Book retrieval systems would greatly benefit from an automated framework able to classify a book’s genre from an image alone, especially for archival documents, where digitizing a complete book for indexing purposes is expensive. While various modalities are available (e.g., cover, title, author, abstract), benchmarking image-based classification systems that rely on minimal information is a particularly exciting problem due to recent advances in image-based deep learning and its applicability. A natural question therefore arises: can the problem of book classification be solved using only an image of the cover together with current state-of-the-art deep learning models? To answer this question, this paper makes a three-fold contribution. First, the publicly available book cover dataset comprising 57k book covers belonging to 30 different categories is thoroughly analyzed and corrected. Second, the paper benchmarks the performance of a battery of state-of-the-art image classification models on the task of book cover classification. Third, it uses explicit attention mechanisms to identify the regions the network focused on in order to make its prediction. All evaluations were performed on a subset of the aforementioned public book cover dataset. Analysis of the results revealed the inefficacy of even the most powerful models at solving this classification task. The obtained results make it evident that significant effort must still be devoted to solving this image-based classification task to a satisfactory level.
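To make the benchmarking setup concrete, the following is a minimal sketch, assuming a PyTorch pipeline, of how such an image-only genre classifier can be fine-tuned: an ImageNet-pretrained ResNet-50 (one of the architecture families typically included in such benchmarks) has its final layer replaced by a 30-way genre head and is trained on cover images. The directory layout, hyperparameters, and choice of backbone below are illustrative assumptions, not the authors' exact configuration.

```python
# Minimal fine-tuning sketch: adapt an ImageNet-pretrained ResNet-50 to
# 30 book-cover genres. The data path, hyperparameters, and backbone are
# illustrative assumptions, not the paper's exact setup.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Resize covers to the ImageNet input size and normalize with ImageNet stats.
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# Hypothetical layout: covers sorted into data/train/<genre>/*.jpg, so
# ImageFolder derives the 30 genre labels from the sub-folder names.
train_set = datasets.ImageFolder("data/train", transform=preprocess)
train_loader = DataLoader(train_set, batch_size=64, shuffle=True, num_workers=4)

# Replace the 1000-way ImageNet head with a genre classifier.
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, len(train_set.classes))
model = model.to(device)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)

model.train()
for epoch in range(10):  # illustrative epoch count
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```

A saliency- or attention-based visualization over the trained network can then highlight which cover regions (e.g., title text versus imagery) drove each prediction, in the spirit of the attention analysis described above.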


Notes

  1. https://github.com/adriano-lucieri/book-dataset


Acknowledgements

This work was supported by the BMBF project DeFuseNN (Grant 01IW17002) and partially supported by JSPS KAKENHI (Grant JP17H06100). We thank all members of the Deep Learning Competence Center at the DFKI for their comments and support.

Author information


Corresponding author

Correspondence to Adriano Lucieri.

Ethics declarations

Conflict of Interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article is part of the topical collection “Document Analysis and Recognition” guest edited by Michael Blumenstein, Seiichi Uchida and Cheng-Lin Liu.

The source code and the models are available at https://github.com/adriano-lucieri/BookCoverClassification.


About this article


Cite this article

Lucieri, A., Sabir, H., Siddiqui, S.A. et al. Benchmarking Deep Learning Models for Classification of Book Covers. SN COMPUT. SCI. 1, 139 (2020). https://doi.org/10.1007/s42979-020-00132-z

