Image indexing and content analysis in children’s picture books using a large-scale database

Multimedia Tools and Applications

Abstract

In this paper, we introduce a visual database of children's picture books and present an intelligent robot trained on it. First, we build a large-scale image dataset of book-page samples, which can be used to validate image indexing and content recognition algorithms. Second, we study state-of-the-art algorithms for image matching and object recognition, and compare several approaches in terms of computational efficiency and recognition accuracy. To improve speed, we propose a novel hierarchical algorithm for fast search. Finally, using this large-scale database, we build a robot that can read children's picture books, and we present initial experimental results. The results suggest that both the training database and the algorithms are promising, although open challenges remain regarding cost and robustness.
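The abstract does not describe the proposed hierarchical algorithm itself. As a generic illustration only (not the authors' method), a coarse-to-fine page lookup can be sketched under these assumptions: each indexed page has a compact global descriptor that is used to shortlist candidates cheaply, plus a set of local keypoint descriptors (in the spirit of SIFT or SURF) that are matched only against the shortlist using Lowe's ratio test. The `Page` type, `hierarchical_search`, `shortlist`, and `min_matches` are hypothetical names, and the toy 2-D vectors stand in for real image descriptors.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional, Sequence

Vector = Sequence[float]


@dataclass
class Page:
    """One indexed book page (hypothetical structure, for illustration only)."""
    page_id: str
    global_desc: Vector  # compact whole-page descriptor used at the coarse level
    local_descs: List[Vector] = field(default_factory=list)  # keypoint descriptors for the fine level


def l2(a: Vector, b: Vector) -> float:
    """Euclidean distance between two equal-length descriptors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def ratio_matches(query: List[Vector], ref: List[Vector], ratio: float = 0.8) -> int:
    """Count query descriptors whose nearest neighbour in `ref` passes Lowe's ratio test."""
    if len(ref) < 2:
        return 0  # the ratio test needs both a first and a second nearest neighbour
    count = 0
    for q in query:
        d1, d2 = sorted(l2(q, r) for r in ref)[:2]
        if d1 < ratio * d2:
            count += 1
    return count


def hierarchical_search(query_global: Vector, query_local: List[Vector],
                        pages: List[Page], shortlist: int = 3,
                        min_matches: int = 1) -> Optional[str]:
    """Two-level lookup: cheap global ranking, then local matching on a shortlist."""
    # Level 1: rank every page by global-descriptor distance and keep the closest few.
    candidates = sorted(pages, key=lambda p: l2(query_global, p.global_desc))[:shortlist]
    # Level 2: run the costlier local-descriptor matching only on the shortlist.
    best_id: Optional[str] = None
    best_score = 0
    for page in candidates:
        score = ratio_matches(query_local, page.local_descs)
        if score > best_score:
            best_id, best_score = page.page_id, score
    # Reject weak matches so that an unknown page is reported as "not found".
    return best_id if best_score >= min_matches else None


if __name__ == "__main__":
    pages = [
        Page("p1", [0.0, 0.0], [[0, 0], [10, 10], [20, 0]]),
        Page("p2", [1.0, 1.0], [[5, 5], [15, 15], [25, 5]]),
        Page("p3", [9.0, 9.0], [[0, 0], [10, 10], [20, 0]]),
    ]
    print(hierarchical_search([0.1, 0.1], [[0.5, 0.2], [10.1, 9.8]], pages, shortlist=2))  # prints p1
```

The speed gain in this kind of scheme comes from running the expensive local matching on only `shortlist` pages instead of the whole database; the trade-off is that a correct page pruned at the first level can no longer be recovered at the second.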

Figs. 1–12

Acknowledgements

The authors would like to thank the anonymous reviewers for their valuable advice, which greatly improved this paper, and Dr. Jingjie Yan for his help with data collection, technical writing, and experimental set-up.

Author information

Corresponding author

Correspondence to Chengwei Huang.

About this article

Cite this article

Huang, C., Jiang, H. Image indexing and content analysis in children’s picture books using a large-scale database. Multimed Tools Appl 78, 20679–20695 (2019). https://doi.org/10.1007/s11042-019-7440-8

  • DOI: https://doi.org/10.1007/s11042-019-7440-8
