A Deep Learning Framework on Generation of Image Descriptions with Bidirectional Recurrent Neural Networks

  • J. Joshua Thomas
  • Naris Pillai
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 866)


The aim of this paper is to develop a deep learning framework for a model that generates natural-language descriptions of images and their regions in order to extract more insight from them. Image recognition is one of the most promising applications involving visual objects. In this study, a small-scale food image data set of 5115 images in fourteen classes was assembled, and an eight-layer convolutional neural network (CNN) was built to recognize these images. The CNN achieved an overall accuracy of 54%. The approach leverages data sets of images and their patterns: a bidirectional recurrent neural network (BRNN) learns the inter-modal correspondences between predictions and visual data for calorie estimation. Data augmentation techniques were applied to enlarge the set of training images, which raised accuracy considerably to 74% and mitigated the overfitting that had affected the CNN, without introducing additional misclassification.
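The data augmentation step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which transformations were used, so the flips and the 90-degree rotation below are assumptions, as are all function names; images are represented as plain nested lists of pixel values to keep the example self-contained.

```python
def flip_horizontal(image):
    """Mirror each row left to right."""
    return [row[::-1] for row in image]


def flip_vertical(image):
    """Reverse the order of the rows (top becomes bottom)."""
    return [row[:] for row in image[::-1]]


def rotate_90_clockwise(image):
    """Rotate the image 90 degrees clockwise."""
    return [list(column) for column in zip(*image[::-1])]


# Assumed set of label-preserving transforms; not taken from the paper.
TRANSFORMS = (flip_horizontal, flip_vertical, rotate_90_clockwise)


def augment(images, labels):
    """Return the original images plus one transformed copy per transform.

    Each transformed copy keeps the label of its source image, so the
    training set grows by a factor of len(TRANSFORMS) + 1.
    """
    out_images, out_labels = [], []
    for image, label in zip(images, labels):
        out_images.append(image)
        out_labels.append(label)
        for transform in TRANSFORMS:
            out_images.append(transform(image))
            out_labels.append(label)
    return out_images, out_labels


if __name__ == "__main__":
    tiny_image = [[1, 2], [3, 4]]  # hypothetical 2x2 grayscale image
    images, labels = augment([tiny_image], ["class_a"])
    print(len(images), labels)
```

With these three transforms, every source image yields four training samples. Enlarging the training set in this way is the general mechanism by which augmentation can reduce overfitting on a small data set.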


Deep learning, Convolutional Neural Network, Data augmentation, Malaysian food chain



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Department of Computing, School of Engineering, Computing, and Built Environment, KDU Penang University College, George Town, Malaysia
