Abstract
Traditional generative models annotate images using multiple independently segmented instances, but these models have become prohibitively expensive and time-consuming as Internet data has grown. Focusing on annotated data, we propose a latent Gaussian-Multinomial generative model (LGMG), which generates image-annotations using a multimodal probabilistic model. Specifically, we use a continuous latent variable with a Normal prior as the latent representation summarizing the high-level semantics of images, and a discrete latent variable with a Multinomial prior as the topic indicator for annotations. We compute the variational posteriors from a mapping structure among the latent representation, the topic indicator, and the image-annotation. The stochastic gradient variational Bayes estimator of the variational objective is realized by combining the reparameterization trick with a Monte Carlo estimator. Finally, we evaluate LGMG on LabelMe in terms of held-out likelihood and automatic image annotation, comparing it with state-of-the-art models.
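As a rough illustration of the estimator named in the abstract, the following minimal Python sketch shows the reparameterization trick combined with a Monte Carlo average for a diagonal Gaussian latent variable. It is not the authors' implementation: the function names, the toy integrand, and the parameter values are assumptions made for this example, and the discrete Multinomial topic indicator is not modeled here.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, I), so z depends
    deterministically on (mu, log_var) given the noise eps."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(sigma^2)) || N(0, I) )."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mu, log_var))


def monte_carlo_expectation(f, mu, log_var, num_samples, rng):
    """Estimate E_q[f(z)] by averaging f over reparameterized samples."""
    total = 0.0
    for _ in range(num_samples):
        total += f(reparameterize(mu, log_var, rng))
    return total / num_samples


# Hypothetical variational parameters (assumed for illustration only).
rng = random.Random(0)
mu = [0.5, -1.0]
log_var = [0.0, math.log(0.25)]


def toy_log_likelihood(z):
    # Toy stand-in for a reconstruction term: negative squared norm of z.
    return -sum(v * v for v in z)


estimate = monte_carlo_expectation(toy_log_likelihood, mu, log_var, 20000, rng)
# For this toy integrand the expectation is known analytically:
# E[-||z||^2] = -(sum of mu_i^2 + sum of sigma_i^2).
exact = -(sum(m * m for m in mu) + sum(math.exp(lv) for lv in log_var))
kl = kl_to_standard_normal(mu, log_var)
# A single-objective estimate in the usual ELBO form: expectation minus KL.
elbo_estimate = estimate - kl
```

Because the noise is drawn independently of the parameters, the same construction lets gradients flow through `mu` and `log_var` when it is written in an automatic-differentiation framework; the plain-Python version above only demonstrates the sampling and averaging steps.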
Acknowledgement
This work has been partly supported by the National Natural Science Foundation of China (61402332, 61502339, 61502338, 61402331); the Tianjin Municipal Science and Technology Commission (17JCQNJC00400, 18JCZDJC32100); the Foundation of Tianjin University of Science and Technology (2017LG10); the Key Laboratory of Food Safety Intelligent Monitoring Technology, China Light Industry; and the Research Plan Project of Tianjin Municipal Education Commission (2017KJ034, 2017KJ035, 2018KJ106).
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Jiang, S., Chen, Y., Qin, Z., Yang, J., Zhao, T., Zhang, C. (2019). Latent Gaussian-Multinomial Generative Model for Annotated Data. In: Yang, Q., Zhou, ZH., Gong, Z., Zhang, ML., Huang, SJ. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2019. Lecture Notes in Computer Science, vol 11439. Springer, Cham. https://doi.org/10.1007/978-3-030-16148-4_4
DOI: https://doi.org/10.1007/978-3-030-16148-4_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-16147-7
Online ISBN: 978-3-030-16148-4
eBook Packages: Computer Science, Computer Science (R0)