Abstract
Dimensionality reduction (DR) is an essential tool for the visualization of high-dimensional data. The recently proposed Self-Supervised Network Projection (SSNP) method addresses DR with a number of attractive features, such as high computational scalability, genericity, stability and out-of-sample support, computation of an inverse mapping, and the ability to cluster data. Yet SSNP has an involved computational pipeline: it uses self-supervision based on labels produced by clustering methods, and two separate deep learning networks with multiple hyperparameters. In this paper, we explore the SSNP method in detail by studying its hyperparameter space and pseudo-labeling strategies. We show how these affect SSNP’s quality and how to set them to optimal values, based on extensive evaluations involving multiple datasets, DR methods, and clustering algorithms.
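The pseudo-labeling idea summarized above can be illustrated with a minimal sketch. This is not the SSNP architecture or the authors' code. As a simplifying assumption, it uses a single scikit-learn `MLPRegressor` with a 2-unit bottleneck in place of SSNP's two deep networks, K-means as the clustering method, and a squared-error loss on the concatenation of the input and the one-hot pseudo-label. The function name `pseudo_label_projection`, the layer sizes, and the synthetic data are all hypothetical choices made for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.neural_network import MLPRegressor


def pseudo_label_projection(X, n_clusters=5, seed=0):
    """Project X to 2D with a bottleneck network trained on pseudo-labels.

    Illustrative stand-in only: one sklearn regressor, not SSNP's networks.
    """
    # Step 1: pseudo-labels come from a clustering algorithm (K-means here).
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    labels = km.fit_predict(X)
    one_hot = np.eye(n_clusters)[labels]

    # Step 2: joint target = reconstruction of X plus the one-hot pseudo-label,
    # so the 2-unit bottleneck must encode both data and cluster structure.
    target = np.hstack([X, one_hot])
    net = MLPRegressor(hidden_layer_sizes=(64, 2, 64), activation="tanh",
                       max_iter=500, random_state=seed)
    net.fit(X, target)

    # Step 3: forward pass through the first two layers only, which yields
    # the activations of the 2-unit bottleneck, i.e. the 2D projection.
    h = X
    for W, b in zip(net.coefs_[:2], net.intercepts_[:2]):
        h = np.tanh(h @ W + b)
    return h, labels


# Synthetic data, assumed purely for demonstration.
X, _ = make_blobs(n_samples=300, n_features=10, centers=5, random_state=0)
X = (X - X.mean(axis=0)) / X.std(axis=0)
P, labels = pseudo_label_projection(X, n_clusters=5)
print(P.shape, np.unique(labels))
```

Changing `n_clusters` or swapping `KMeans` for another clustering algorithm changes the pseudo-labels and therefore the resulting projection, which is the kind of sensitivity the chapter sets out to study for SSNP itself.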
Acknowledgments
This study was financed in part by FAPESP grants 2015/22308-2, 2017/25835-9 and 2020/13275-1, and the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001.
Copyright information
© 2023 Springer Nature Switzerland AG
About this paper
Cite this paper
Oliveira, A.A.A.M., Espadoto, M., Hirata, R., Hirata, N.S.T., Telea, A.C. (2023). Improving Self-supervised Dimensionality Reduction: Exploring Hyperparameters and Pseudo-Labeling Strategies. In: de Sousa, A.A., et al. Computer Vision, Imaging and Computer Graphics Theory and Applications. VISIGRAPP 2021. Communications in Computer and Information Science, vol 1691. Springer, Cham. https://doi.org/10.1007/978-3-031-25477-2_7
DOI: https://doi.org/10.1007/978-3-031-25477-2_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-25476-5
Online ISBN: 978-3-031-25477-2
eBook Packages: Computer Science, Computer Science (R0)