Latent Gaussian-Multinomial Generative Model for Annotated Data

  • Shuoran Jiang
  • Yarui Chen (Email author)
  • Zhifei Qin
  • Jucheng Yang
  • Tingting Zhao
  • Chuanlei Zhang
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11439)

Abstract

Traditional generative models annotate images using multiple independently segmented instances, but these models have become prohibitively expensive and time-consuming as Internet data has grown. Focusing on annotated data, we propose a latent Gaussian-Multinomial generative model (LGMG), which generates image-annotation pairs using a multimodal probabilistic model. Specifically, we use a continuous latent variable with a Normal prior as the latent representation summarizing the high-level semantics of images, and a discrete latent variable with a Multinomial prior as the topics indicator for annotations. We compute the variational posteriors from a mapping structure among the latent representation, the topics indicator, and the image-annotation pair. The stochastic gradient variational Bayes estimator of the variational objective is realized by combining the reparameterization trick with a Monte Carlo estimator. Finally, we evaluate LGMG on LabelMe in terms of held-out likelihood and automatic image annotation, in comparison with state-of-the-art models.
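
As a rough illustration of how the reparameterization trick combines with a Monte Carlo estimator, the following minimal sketch covers only the continuous latent variable. It is not the authors' implementation: the function names, the diagonal Gaussian posterior, the standard Normal prior N(0, I), and the sample count are all assumptions made for this example, and the discrete topics indicator is left out entirely. The sketch estimates the KL term of a variational objective by sampling and compares it with the known closed form for two Gaussians.

```python
import numpy as np

def reparameterize(mu, log_var, rng, n_samples=1):
    """Draw z = mu + sigma * eps with eps ~ N(0, I) (hypothetical helper)."""
    eps = rng.standard_normal((n_samples,) + mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def log_normal(z, mu, log_var):
    """Log density of a diagonal Gaussian, summed over the last axis."""
    return -0.5 * np.sum(
        np.log(2.0 * np.pi) + log_var + (z - mu) ** 2 / np.exp(log_var),
        axis=-1,
    )

def kl_monte_carlo(mu, log_var, rng, n_samples=1000):
    """Monte Carlo estimate of KL(q(z) || N(0, I)) from reparameterized samples."""
    z = reparameterize(mu, log_var, rng, n_samples)
    log_q = log_normal(z, mu, log_var)
    # Assumed prior: standard Normal (zero mean, unit variance).
    log_p = log_normal(z, np.zeros_like(mu), np.zeros_like(log_var))
    return np.mean(log_q - log_p)

def kl_closed_form(mu, log_var):
    """Analytic KL(N(mu, diag(exp(log_var))) || N(0, I))."""
    return 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var)

rng = np.random.default_rng(0)
mu = np.array([0.5, -1.0])        # illustrative posterior mean
log_var = np.array([0.0, -0.5])   # illustrative posterior log-variance

kl_mc = kl_monte_carlo(mu, log_var, rng, n_samples=20000)
kl_exact = kl_closed_form(mu, log_var)
print(f"Monte Carlo KL: {kl_mc:.3f}, closed form: {kl_exact:.3f}")
```

Because the samples are written as a deterministic function of `mu`, `log_var`, and independent noise, an automatic-differentiation framework could propagate gradients of such an estimate back to the posterior parameters, which is the property a stochastic gradient variational Bayes estimator relies on.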

Keywords

Annotated data; Gaussian-Multinomial; Multimodal generative models; Latent representation; Topics indicator

Notes

Acknowledgement

This work has been partly supported by the National Natural Science Foundation of China (61402332, 61502339, 61502338, 61402331); the Tianjin Municipal Science and Technology Commission (17JCQNJC00400, 18JCZDJC32100); the Foundation of Tianjin University of Science and Technology (2017LG10); the Key Laboratory of Food Safety Intelligent Monitoring Technology, China Light Industry; and the Research Plan Project of Tianjin Municipal Education Commission (2017KJ034, 2017KJ035, 2018KJ106).

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Shuoran Jiang (1)
  • Yarui Chen (1) (Email author)
  • Zhifei Qin (1)
  • Jucheng Yang (1)
  • Tingting Zhao (1)
  • Chuanlei Zhang (1)
  1. Tianjin University of Science and Technology, Tianjin, China