TopSpin: TOPic Discovery via Sparse Principal Component INterference

Takáč, Martin; Ahipaşaoğlu, Selin Damla; Cheung, Ngai-Man; Richtárik, Peter

doi:10.1007/978-3-030-12119-8_8

Part of the book series: Springer Proceedings in Mathematics & Statistics (PROMS, volume 279)

Included in the following conference series:

Modeling and Optimization: Theory and Applications

Abstract

We propose a novel topic discovery algorithm for unlabeled images based on the bag-of-words (BoW) framework. We first extract a dictionary of visual words and subsequently for each image compute a visual word occurrence histogram. We view these histograms as rows of a large matrix from which we extract sparse principal components (PCs). Each PC identifies a sparse combination of visual words which co-occur frequently in some images but seldom appear in others. Each sparse PC corresponds to a topic, and images whose interference with the PC is high belong to that topic, revealing the common parts possessed by the images. We propose to solve the associated sparse PCA problems using an Alternating Maximization (AM) method, which we modify for the purpose of efficiently extracting multiple PCs in a deflation scheme. Our approach attacks the maximization problem in SPCA directly and is scalable to high-dimensional data. Experiments on automatic topic discovery and category prediction demonstrate encouraging performance of our approach. Our SPCA solver is publicly available.
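
The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' solver: it assumes a cardinality-constrained formulation (at most `s` nonzero entries per PC), uses plain projection deflation between PCs, works on uncentered histograms, and reads "interference" as the magnitude of an image's projection onto a PC. The function names (`truncate`, `sparse_pc`, `extract_topics`), the parameters, and the synthetic data are all hypothetical.

```python
import numpy as np

def truncate(v, s):
    """Keep the s largest-magnitude entries of v and zero out the rest."""
    out = np.zeros_like(v)
    keep = np.argsort(np.abs(v))[-s:]
    out[keep] = v[keep]
    return out

def sparse_pc(A, s, iters=200, seed=0):
    """Alternating maximization sketch for
    max y^T A x  s.t. ||y||_2 <= 1, ||x||_2 <= 1, ||x||_0 <= s (assumed form)."""
    rng = np.random.default_rng(seed)
    x = truncate(rng.standard_normal(A.shape[1]), s)
    x /= np.linalg.norm(x)
    for _ in range(iters):
        y = A @ x                      # best y for fixed x is A x, normalized
        ny = np.linalg.norm(y)
        if ny == 0:
            break
        y /= ny
        z = truncate(A.T @ y, s)       # best s-sparse x for fixed y
        nz = np.linalg.norm(z)
        if nz == 0:
            break
        x = z / nz
    return x

def extract_topics(H, n_topics, s):
    """Extract n_topics sparse PCs from the histogram matrix H (images x words).
    Assumption: projection deflation A <- A (I - x x^T) after each PC."""
    A = np.asarray(H, dtype=float).copy()
    pcs = []
    for _ in range(n_topics):
        x = sparse_pc(A, s)
        pcs.append(x)
        A = A - np.outer(A @ x, x)     # remove the direction just found
    return np.array(pcs)

# Synthetic example: 40 "images", 30 "visual words", two planted word groups.
rng = np.random.default_rng(0)
H = rng.poisson(1.0, size=(40, 30)).astype(float)
H[:20, :5] += rng.poisson(8.0, size=(20, 5))
H[20:, 10:15] += rng.poisson(8.0, size=(20, 5))

pcs = extract_topics(H, n_topics=2, s=5)
scores = np.abs(H @ pcs.T)             # assumed reading of "interference"
labels = scores.argmax(axis=1)         # topic with the largest score per image
```

Each row of `pcs` has unit norm and at most `s` nonzero entries, so the nonzero positions can be read off as the visual words that define a topic. Hard truncation is only one way to enforce sparsity; a penalized formulation would replace `truncate` with a thresholding step.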

This work was partially supported by the U.S. National Science Foundation, under award numbers NSF:CCF:1618717, NSF:CMMI:1663256 and NSF:CCF:1740796.

Notes

1. Source code: https://code.google.com/p/24am/.
2. A simple scaling argument shows that the solution must satisfy \(\Vert x\Vert_2 = 1\).
3. For LDA, we use the implementation available at http://www.cs.princeton.edu/~blei/lda-c/.
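
The scaling argument in Note 2 can be spelled out under one assumption, since this page does not state the optimization problem: suppose the sparse PCA problem is the cardinality-constrained one shown below, with data matrix \(A\) and sparsity level \(s\) (both symbols are assumed here for illustration).

```latex
\[
  \max_{x}\ \Vert Ax\Vert_2
  \quad\text{subject to}\quad
  \Vert x\Vert_2 \le 1,\qquad \Vert x\Vert_0 \le s .
\]
Let $x$ be feasible with $Ax \neq 0$ and $0 < \Vert x\Vert_2 < 1$,
and set $\tilde{x} = x / \Vert x\Vert_2$. Then
\[
  \Vert \tilde{x}\Vert_2 = 1,\qquad
  \Vert \tilde{x}\Vert_0 = \Vert x\Vert_0 \le s,\qquad
  \Vert A\tilde{x}\Vert_2 = \frac{\Vert Ax\Vert_2}{\Vert x\Vert_2} > \Vert Ax\Vert_2 ,
\]
so $\tilde{x}$ is feasible and strictly better than $x$.
Hence, whenever $A \neq 0$, every maximizer satisfies $\Vert x\Vert_2 = 1$.
```

The argument relies only on the objective being positively homogeneous and the cardinality constraint being invariant to rescaling; it would need adjusting for a formulation with a scale-dependent sparsity term.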

Author information

Authors and Affiliations

Lehigh University, Bethlehem, PA, 18015, USA
Martin Takáč
Singapore University of Technology and Design, Singapore, 487372, Singapore
Selin Damla Ahipaşaoğlu & Ngai-Man Cheung
King Abdullah University of Science and Technology (KAUST), Thuwal, Kingdom of Saudi Arabia
Peter Richtárik
University of Edinburgh, Edinburgh, UK
Peter Richtárik
Moscow Institute of Physics and Technology, Dolgoprudny, Russia
Peter Richtárik

Corresponding author

Correspondence to Martin Takáč.

Editor information

Editors and Affiliations

Dept of Industrial & Systems Engineering, Lehigh University, Bethlehem, PA, USA
János D. Pintér
Dept of Industrial & Systems Engineering, Lehigh University, Bethlehem, PA, USA
Tamás Terlaky

About this paper

Cite this paper

Takáč, M., Ahipaşaoğlu, S.D., Cheung, NM., Richtárik, P. (2019). TopSpin: TOPic Discovery via Sparse Principal Component INterference. In: Pintér, J.D., Terlaky, T. (eds) Modeling and Optimization: Theory and Applications. MOPTA 2017. Springer Proceedings in Mathematics & Statistics, vol 279. Springer, Cham. https://doi.org/10.1007/978-3-030-12119-8_8

DOI: https://doi.org/10.1007/978-3-030-12119-8_8
Published: 15 February 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-12118-1
Online ISBN: 978-3-030-12119-8
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)
