Latent Gaussian-Multinomial Generative Model for Annotated Data

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11439)

Abstract

Traditional generative models annotate images using multiple independently segmented instances, but such models have become prohibitively expensive and time-consuming as Internet data has grown. Focusing on annotated data, we propose a latent Gaussian-Multinomial generative model (LGMG), which generates image-annotation pairs using a multimodal probabilistic model. Specifically, we use a continuous latent variable with a Normal prior as the latent representation summarizing the high-level semantics of images, and a discrete latent variable with a Multinomial prior as the topic indicator for annotations. We compute the variational posteriors from a mapping structure among the latent representation, the topic indicator, and the image-annotation pair. The stochastic gradient variational Bayes estimator of the variational objective is realized by combining the reparameterization trick with a Monte Carlo estimator. Finally, we demonstrate the performance of LGMG on LabelMe in terms of held-out likelihood and automatic image annotation, in comparison with state-of-the-art models.
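The reparameterization trick and Monte Carlo estimator mentioned in the abstract can be illustrated with a minimal sketch. This is not the LGMG objective, which the preview does not give; it assumes a one-dimensional Gaussian latent variable and a toy integrand f(z) = z², and the function name `reparam_estimates` and its parameters are hypothetical, chosen only for illustration.

```python
import random


def reparam_estimates(mu, sigma, n_samples=20000, seed=0):
    """Monte Carlo estimates of E[f(z)] and its gradients for z ~ N(mu, sigma^2).

    f(z) = z**2 is a toy integrand (an assumption for illustration),
    not the variational objective of the paper.
    """
    rng = random.Random(seed)
    value = grad_mu = grad_sigma = 0.0
    for _ in range(n_samples):
        eps = rng.gauss(0.0, 1.0)   # noise that does not depend on mu or sigma
        z = mu + sigma * eps        # reparameterized sample of z
        df_dz = 2.0 * z             # derivative of the toy integrand
        value += z * z
        grad_mu += df_dz            # chain rule with dz/dmu = 1
        grad_sigma += df_dz * eps   # chain rule with dz/dsigma = eps
    return value / n_samples, grad_mu / n_samples, grad_sigma / n_samples


if __name__ == "__main__":
    print(reparam_estimates(mu=1.0, sigma=0.5))
```

For this toy integrand the exact values are E[z²] = mu² + sigma², with gradients 2·mu and 2·sigma, so the estimates can be checked against a closed form. Writing z as a deterministic function of the parameters and independent noise is what lets the gradient pass through the sampling step.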

Acknowledgement

This work has been partly supported by the National Natural Science Foundation of China (61402332, 61502339, 61502338, 61402331); the Tianjin Municipal Science and Technology Commission (17JCQNJC00400, 18JCZDJC32100); the Foundation of Tianjin University of Science and Technology (2017LG10); the Key Laboratory of Food Safety Intelligent Monitoring Technology, China Light Industry; and the Research Plan Project of Tianjin Municipal Education Commission (2017KJ034, 2017KJ035, 2018KJ106).

Author information

Correspondence to Yarui Chen.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Jiang, S., Chen, Y., Qin, Z., Yang, J., Zhao, T., Zhang, C. (2019). Latent Gaussian-Multinomial Generative Model for Annotated Data. In: Yang, Q., Zhou, ZH., Gong, Z., Zhang, ML., Huang, SJ. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2019. Lecture Notes in Computer Science, vol 11439. Springer, Cham. https://doi.org/10.1007/978-3-030-16148-4_4

  • DOI: https://doi.org/10.1007/978-3-030-16148-4_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-16147-7

  • Online ISBN: 978-3-030-16148-4

  • eBook Packages: Computer Science, Computer Science (R0)
