Abstract
In this paper we introduce a visual database for children’s picture books and present an intelligent robot trained on this database. First, we build a large-scale image dataset containing image samples of book pages, which can be used to evaluate image indexing and content recognition algorithms. Second, we study state-of-the-art algorithms in image matching and object recognition; several approaches are presented and compared in terms of computational efficiency and recognition accuracy. To improve speed, we propose a novel hierarchical algorithm for fast search. Finally, using this large-scale database, we build a robot that can read children’s picture books, and we present initial experimental results. The results suggest that both the training database and the algorithms are promising, yet open challenges remain concerning cost and robustness.
Acknowledgements
The authors would like to thank the anonymous reviewers for their valuable advice, which greatly improved this paper, and Dr. Jingjie Yan for his help with data collection, technical writing, and experimental set-up.
Cite this article
Huang, C., Jiang, H. Image indexing and content analysis in children’s picture books using a large-scale database. Multimed Tools Appl 78, 20679–20695 (2019). https://doi.org/10.1007/s11042-019-7440-8
DOI: https://doi.org/10.1007/s11042-019-7440-8