Latent Gaussian-Multinomial Generative Model for Annotated Data

Jiang, Shuoran; Chen, Yarui; Qin, Zhifei; Yang, Jucheng; Zhao, Tingting; Zhang, Chuanlei

doi:10.1007/978-3-030-16148-4_4


Shuoran Jiang, Yarui Chen, Zhifei Qin, Jucheng Yang, Tingting Zhao & Chuanlei Zhang

Conference paper
First Online: 22 March 2019


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11439)

Abstract

Traditional generative models annotate images using multiple independently segmented instances, but these models have become prohibitively expensive and time-consuming as Internet data grows. Focusing on annotated data, we propose a latent Gaussian-Multinomial generative model (LGMG), which generates image-annotation pairs using a multimodal probabilistic model. Specifically, we use a continuous latent variable with a Normal prior as the latent representation summarizing the high-level semantics of images, and a discrete latent variable with a Multinomial prior as the topic indicator for annotations. We compute the variational posteriors from a mapping structure among the latent representation, the topic indicator, and the image-annotation pair. The stochastic gradient variational Bayes estimator of the variational objective is realized by combining the reparameterization trick with a Monte Carlo estimator. Finally, we evaluate LGMG on LabelMe in terms of held-out likelihood and automatic image annotation, comparing it with state-of-the-art models.
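As a rough illustration of the estimator named in the abstract, the sketch below forms a Monte Carlo estimate of a variational lower bound for a toy model with one Gaussian latent variable and one discrete topic indicator. It is not the LGMG architecture: the unit-variance Gaussian decoder, the single annotation word, the factorized posterior, and every function and parameter name here are assumptions made for illustration only.

```python
import math
import random

def sample_gaussian_reparam(mu, log_var, rng):
    """Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]

def kl_gaussian_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) )."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mu, log_var))

def kl_categorical(q, p):
    """KL(q || p) between two discrete distributions over the same topics."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0.0)

def toy_image_log_lik(x, z):
    """Hypothetical decoder: log N(x | z, I), standing in for log p(x | z)."""
    return sum(-0.5 * math.log(2.0 * math.pi) - 0.5 * (xi - zi) ** 2
               for xi, zi in zip(x, z))

def expected_word_log_lik(word, q_topics, topic_word_probs):
    """Exact expectation of log p(word | topic) under q(topic)."""
    return sum(q * math.log(topic_word_probs[k][word])
               for k, q in enumerate(q_topics))

def elbo_estimate(x, word, mu, log_var, q_topics, prior_topics,
                  topic_word_probs, num_samples, rng):
    """Monte Carlo lower-bound estimate for the toy model (assumed factorized q)."""
    recon = 0.0
    for _ in range(num_samples):
        z = sample_gaussian_reparam(mu, log_var, rng)
        recon += toy_image_log_lik(x, z)
    recon /= num_samples  # sample average over the Gaussian latent
    return (recon
            + expected_word_log_lik(word, q_topics, topic_word_probs)
            - kl_gaussian_to_standard_normal(mu, log_var)
            - kl_categorical(q_topics, prior_topics))

# Toy inputs: a 2-dimensional "image" feature and one annotation word index.
rng = random.Random(0)
topic_word_probs = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]
value = elbo_estimate(x=[0.5, -1.0], word=0,
                      mu=[0.4, -0.8], log_var=[-1.0, -1.0],
                      q_topics=[0.7, 0.2, 0.1],
                      prior_topics=[1 / 3, 1 / 3, 1 / 3],
                      topic_word_probs=topic_word_probs,
                      num_samples=100, rng=rng)
print(value)
```

Only the Gaussian latent is sampled here. The discrete topic term is summed exactly, which is feasible when the number of topics is small. In a trainable model the noise would be drawn the same way, but `mu` and `log_var` would come from an encoder network and gradients would flow through the sampled `z`.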



Acknowledgement

This work has been partly supported by the National Natural Science Foundation of China (61402332, 61502339, 61502338, 61402331); the Tianjin Municipal Science and Technology Commission (17JCQNJC00400, 18JCZDJC32100); the Foundation of Tianjin University of Science and Technology (2017LG10); the Key Laboratory of Food Safety Intelligent Monitoring Technology, China Light Industry; and the Research Plan Project of Tianjin Municipal Education Commission (2017KJ034, 2017KJ035, 2018KJ106).

Author information

Authors and Affiliations

Tianjin University of Science and Technology, Tianjin, 300457, China
Shuoran Jiang, Yarui Chen, Zhifei Qin, Jucheng Yang, Tingting Zhao & Chuanlei Zhang


Corresponding author

Correspondence to Yarui Chen.

Editor information

Editors and Affiliations

Hong Kong University of Science and Technology, Hong Kong, China
Qiang Yang
Nanjing University, Nanjing, China
Zhi-Hua Zhou
University of Macau, Taipa, Macau, China
Zhiguo Gong
Southeast University, Nanjing, China
Min-Ling Zhang
Nanjing University of Aeronautics and Astronautics, Nanjing, China
Sheng-Jun Huang


About this paper

Cite this paper

Jiang, S., Chen, Y., Qin, Z., Yang, J., Zhao, T., Zhang, C. (2019). Latent Gaussian-Multinomial Generative Model for Annotated Data. In: Yang, Q., Zhou, ZH., Gong, Z., Zhang, ML., Huang, SJ. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2019. Lecture Notes in Computer Science, vol 11439. Springer, Cham. https://doi.org/10.1007/978-3-030-16148-4_4


DOI: https://doi.org/10.1007/978-3-030-16148-4_4
Published: 22 March 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-16147-7
Online ISBN: 978-3-030-16148-4
eBook Packages: Computer Science, Computer Science (R0)
