Improving Self-supervised Dimensionality Reduction: Exploring Hyperparameters and Pseudo-Labeling Strategies

Oliveira, Artur André A. M.; Espadoto, Mateus; Hirata, Roberto; Hirata, Nina S. T.; Telea, Alexandru C.

doi:10.1007/978-3-031-25477-2_7

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1691)

Included in the following conference series:

International Joint Conference on Computer Vision, Imaging and Computer Graphics

Abstract

Dimensionality reduction (DR) is an essential tool for the visualization of high-dimensional data. The recently proposed Self-Supervised Network Projection (SSNP) method addresses DR with a number of attractive features, such as high computational scalability, genericity, stability and out-of-sample support, computation of an inverse mapping, and the ability to cluster data. Yet SSNP has an involved computational pipeline: it uses self-supervision based on labels produced by clustering methods, and two separate deep learning networks with multiple hyperparameters. In this paper, we explore the SSNP method in detail by studying its hyperparameter space and pseudo-labeling strategies. We show how these affect SSNP’s quality and how to set them to optimal values based on extensive evaluations involving multiple datasets, DR methods, and clustering algorithms.
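
To illustrate the pseudo-labeling idea mentioned in the abstract, the following is a minimal sketch of how a clustering algorithm can produce labels for unlabeled data, and how the number of clusters can be varied as one hyperparameter. It is not the authors' implementation: plain k-means is used as a stand-in clustering method, and the synthetic dataset, the function names (`kmeans_pseudo_labels`, `make_blobs`), and the values of k are all assumptions chosen for illustration. The network training that would consume these pseudo-labels is omitted.

```python
import random


def kmeans_pseudo_labels(points, k, iters=20, seed=0):
    """Assign each point a pseudo-label with Lloyd's k-means (illustrative only)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initialise centers from the data
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
            )
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def make_blobs(n_per_blob=30, seed=1):
    """Hypothetical 4-dimensional toy data: three Gaussian blobs."""
    rng = random.Random(seed)
    means = [(0.0,) * 4, (10.0,) * 4, (-10.0, 10.0, -10.0, 10.0)]
    return [
        tuple(rng.gauss(m, 0.5) for m in mean)
        for mean in means
        for _ in range(n_per_blob)
    ]


points = make_blobs()

# Sweep one hyperparameter (the number of clusters) and keep the
# pseudo-labels produced for each setting.
pseudo_labels_by_k = {k: kmeans_pseudo_labels(points, k) for k in (2, 3, 5)}

for k, labels in pseudo_labels_by_k.items():
    print(f"k={k}: {len(set(labels))} distinct pseudo-labels")
```

In a pipeline of the kind the abstract describes, each set of pseudo-labels would be used as the supervision signal for training, and projection quality would then be compared across the settings.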

Acknowledgments

This study was financed in part by FAPESP grants 2015/22308-2, 2017/25835-9 and 2020/13275-1, and the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001.

Author information

Authors and Affiliations

Institute of Mathematics and Statistics, University of São Paulo, São Paulo, Brazil
Artur André A. M. Oliveira, Mateus Espadoto, Roberto Hirata Jr. & Nina S. T. Hirata
Department of Information and Computing Sciences, Utrecht University, Utrecht, The Netherlands
Alexandru C. Telea

Corresponding author

Correspondence to Mateus Espadoto.

Editor information

Editors and Affiliations

University of Porto, Porto, Portugal
A. Augusto de Sousa
Czech Technical University in Prague, Prague, Czech Republic
Vlastimil Havran
Mines ParisTech, Paris, France
Alexis Paljic
Davidson College, Davidson, NC, USA
Tabitha Peck
French Civil Aviation University (ENAC), Toulouse, France
Christophe Hurter
Monash University, Melbourne, Australia
Helen Purchase
University of Catania, Catania, Italy
Giovanni Maria Farinella
University of Barcelona, Barcelona, Spain
Petia Radeva
IRISA, University of Rennes 1, Rennes, France
Kadi Bouatouch

About this paper

Cite this paper

Oliveira, A.A.A.M., Espadoto, M., Hirata, R., Hirata, N.S.T., Telea, A.C. (2023). Improving Self-supervised Dimensionality Reduction: Exploring Hyperparameters and Pseudo-Labeling Strategies. In: de Sousa, A.A., et al. Computer Vision, Imaging and Computer Graphics Theory and Applications. VISIGRAPP 2021. Communications in Computer and Information Science, vol 1691. Springer, Cham. https://doi.org/10.1007/978-3-031-25477-2_7

DOI: https://doi.org/10.1007/978-3-031-25477-2_7
Published: 02 February 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-25476-5
Online ISBN: 978-3-031-25477-2
eBook Packages: Computer Science, Computer Science (R0)
