Abstract
In this paper we address the problem of visualizing, within a bounded region, a set of individuals endowed with a dissimilarity measure and a statistical value, representing each individual as a convex object. This problem, which extends standard Multidimensional Scaling Analysis, is formulated as a global optimization problem whose objective is the difference of two convex functions (DC). Suitable DC decompositions allow us to apply the Difference of Convex Algorithm (DCA) very efficiently. Our algorithmic approach is used to visualize two real-world datasets.
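The approach rests on the Difference of Convex Algorithm. As a reminder of how a DCA iteration proceeds, the following minimal sketch applies it to a hypothetical one-dimensional DC function \(f=u-v\) with \(u(x)=x^4/4\) and \(v(x)=x^2\); this toy function is an illustrative assumption and not the visualization model of the paper. At each step DCA replaces \(v\) by its linearization at the current iterate \(x_k\) and minimizes the convex function \(u(x)-v'(x_k)\,x\); for this toy choice the subproblem has the closed-form solution \(x_{k+1}=(2x_k)^{1/3}\).

```python
import math


def u(x):
    """Convex part of the toy objective: u(x) = x^4 / 4."""
    return x ** 4 / 4.0


def v(x):
    """Convex part that is subtracted: v(x) = x^2."""
    return x ** 2


def f(x):
    """Toy DC objective f = u - v (hypothetical, not the paper's model)."""
    return u(x) - v(x)


def dca(x0, iters=60):
    """DCA iterates: y_k = v'(x_k) = 2 x_k, then x_{k+1} = argmin_x u(x) - y_k x,
    i.e. the real solution of x^3 = y_k."""
    xs = [x0]
    x = x0
    for _ in range(iters):
        y = 2.0 * x  # gradient of v at the current iterate
        x = math.copysign(abs(y) ** (1.0 / 3.0), y)  # minimizer of the convex subproblem
        xs.append(x)
    return xs


iterates = dca(0.5)
print(iterates[:3])  # 0.5, 1.0, then approximately 2 ** (1/3)
print(iterates[-1], math.sqrt(2.0))  # both approximately 1.41421356
```

Starting from \(x_0=0.5\) the iterates increase monotonically to \(\sqrt{2}\), a minimizer of \(f\), while \(f\) decreases at every step. Starting from \(x_0=0\) they stay at 0, a critical point that is a local maximum, which illustrates that DCA in general only guarantees convergence to critical points.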
Acknowledgements
We thank the reviewers for their helpful suggestions and comments, which have been very valuable in strengthening the paper and improving its quality.
Additional information
This research is funded in part by Project MTM2015-65915-R (Spain), P11-FQM-7603 and FQM-329 (Andalucía), all with EU ERD Funds, and VPPI-US from the University of Seville.
Appendix
1.1 Proof of Proposition 1
The convexity of the function \(g_{ij}\) was established in Sect. 3. Moreover, since \(g_{ij}\), \(\lambda \), \(\delta _{ij}\ge 0\), the functions \(g_{ij}^2(\varvec{c}_i,\varvec{c}_j,\tau )\), \(2\lambda \kappa ^2\delta ^2_{ij}\) (a constant) and \((g_{ij}(\varvec{c}_i,\varvec{c}_j,\tau )+\kappa \delta _{ij})^2\) are convex. Finally, \((3\lambda -1)g_{ij}^2(\varvec{c}_i,\varvec{c}_j,\tau )\) is convex if \(3\lambda -1\ge 0\) and concave otherwise. \(\square \)
1.2 Proof of Proposition 2
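For convex sets \(A_1\) and \(A_2\) with nonempty interior, the condition in the definition of penetration depth stated in Sect. 2.2 is equivalent to the existence of a hyperplane separating the sets \({\varvec{p}}+A_1\) and \(A_2,\) i.e., of some \({\varvec{\xi }}\ne 0\) such that
\[ {\varvec{\xi }}^\top ({\varvec{p}}+{\varvec{a}}_1) \le {\varvec{\xi }}^\top {\varvec{a}}_2 \qquad \forall {\varvec{a}}_1\in A_1,\ \forall {\varvec{a}}_2\in A_2 . \]
Without loss of generality, we can take \(\Vert {\varvec{\xi }}\Vert = 1\), and thus the penetration depth of \(A_1\) and \(A_2\) can be written as
\[ \min _{\begin{array}{c} {\varvec{\xi }}\in {\mathbb R}^n\\ \Vert {\varvec{\xi }}\Vert =1 \end{array}} \ \min _{{\varvec{p}}\in {\mathbb R}^n} \left\{ \Vert {\varvec{p}}\Vert \, : \, {\varvec{\xi }}^\top ({\varvec{p}}+{\varvec{a}}_1) \le {\varvec{\xi }}^\top {\varvec{a}}_2 \ \ \forall {\varvec{a}}_1\in A_1,\ {\varvec{a}}_2\in A_2 \right\} . \]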
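Thus, taking \(A_1={\varvec{c}}_i+\tau r_i\mathcal {B}\) and \(A_2={\varvec{c}}_j+\tau r_j\mathcal {B}\), \(h_{ij}\) can be written as follows:
\[ h_{ij}({\varvec{c}}_i,{\varvec{c}}_j,\tau ) = \min _{\begin{array}{c} {\varvec{\xi }}\in {\mathbb R}^n\\ \Vert {\varvec{\xi }}\Vert =1 \end{array}} \ \min _{{\varvec{p}}\in {\mathbb R}^n} \left\{ \Vert {\varvec{p}}\Vert \, : \, {\varvec{\xi }}^\top ({\varvec{p}}+{\varvec{c}}_i+\tau r_i{\varvec{b}}_i) \le {\varvec{\xi }}^\top ({\varvec{c}}_j+\tau r_j{\varvec{b}}_j) \ \ \forall {\varvec{b}}_i,{\varvec{b}}_j\in \mathcal {B} \right\} . \]
Equivalently, the constraint of the inner problem, i.e.,
\[ {\varvec{\xi }}^\top ({\varvec{p}}+{\varvec{c}}_i+\tau r_i{\varvec{b}}_i) \le {\varvec{\xi }}^\top ({\varvec{c}}_j+\tau r_j{\varvec{b}}_j) \qquad \forall {\varvec{b}}_i,{\varvec{b}}_j\in \mathcal {B}, \]
can be written as follows:
\[ {\varvec{\xi }}^\top {\varvec{p}}+{\varvec{\xi }}^\top {\varvec{c}}_i+\tau r_i\max _{{\varvec{b}}\in \mathcal {B}} {\varvec{\xi }}^\top {\varvec{b}} \le {\varvec{\xi }}^\top {\varvec{c}}_j+\tau r_j\min _{{\varvec{b}}\in \mathcal {B}} {\varvec{\xi }}^\top {\varvec{b}} . \]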
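Let \(\sigma _{\mathcal {B}}\) be the support function of \(\mathcal {B},\) i.e.,
\[ \sigma _{\mathcal {B}}({\varvec{\xi }}) = \max _{{\varvec{b}}\in \mathcal {B}} {\varvec{\xi }}^\top {\varvec{b}} . \]
Since \(\mathcal {B}\) is assumed to be symmetric with respect to the origin, we have
\[ \min _{{\varvec{b}}\in \mathcal {B}} {\varvec{\xi }}^\top {\varvec{b}} = -\max _{{\varvec{b}}\in \mathcal {B}} (-{\varvec{\xi }})^\top {\varvec{b}} = -\sigma _{\mathcal {B}}(-{\varvec{\xi }}) = -\sigma _{\mathcal {B}}({\varvec{\xi }}) . \]
Hence, by substituting the support function into the constraint above, one has
\[ {\varvec{\xi }}^\top {\varvec{p}} \le {\varvec{\xi }}^\top ({\varvec{c}}_j-{\varvec{c}}_i)-\tau (r_i+r_j)\sigma _{\mathcal {B}}({\varvec{\xi }}) . \]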
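For \({\varvec{\xi }}\) fixed with \(\Vert {\varvec{\xi }}\Vert =1,\) let \(\eta ({\varvec{\xi }}) = {\varvec{\xi }}^\top ( {\varvec{c}}_j-{\varvec{c}}_i)-\tau (r_i+r_j) \sigma _{\mathcal {B}}({\varvec{\xi }}).\) It follows that the inner minimum in \(h_{ij}({\varvec{c}}_i,{\varvec{c}}_j,\tau )\) is the distance from the origin to the halfspace \({\varvec{\xi }}^\top {\varvec{p}} \le \eta ({\varvec{\xi }})\). Since \(\Vert {\varvec{\xi }}\Vert =1\), this distance equals 0 if the origin belongs to the halfspace, i.e., if \(0 \le {\varvec{\xi }}^\top ( {\varvec{c}}_j-{\varvec{c}}_i)-\tau (r_i+r_j) \sigma _{\mathcal {B}}({\varvec{\xi }}),\) and \( - \eta ({\varvec{\xi }})\) otherwise. Hence, since \(t\longmapsto \max \{0,t\}\) is nondecreasing,
\[ h_{ij}({\varvec{c}}_i,{\varvec{c}}_j,\tau ) = \min _{\begin{array}{c} {\varvec{\xi }}\in {\mathbb R}^n\\ \Vert {\varvec{\xi }}\Vert =1 \end{array}} \max \{0,-\eta ({\varvec{\xi }})\} = \max \left\{ 0,\ \min _{\begin{array}{c} {\varvec{\xi }}\in {\mathbb R}^n\\ \Vert {\varvec{\xi }}\Vert =1 \end{array}} \left\{ - {\varvec{\xi }}^\top ( {\varvec{c}}_j-{\varvec{c}}_i)+\tau (r_i+r_j) \sigma _{\mathcal {B}}({\varvec{\xi }})\right\} \right\} . \]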
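But, for \({\varvec{\xi }}\) fixed, the function \(({\varvec{c}}_i,{\varvec{c}}_j,\tau ) \, \longmapsto - {\varvec{\xi }}^\top ( {\varvec{c}}_j-{\varvec{c}}_i)+\tau (r_i+r_j) \sigma _{\mathcal {B}}({\varvec{\xi }})\) is affine, and thus the function \(\varphi _{ij}: ({\varvec{c}}_i,{\varvec{c}}_j,\tau ) \, \longmapsto \displaystyle \min _{\begin{array}{c} {\varvec{\xi }}\in {\mathbb R}^n\\ \Vert {\varvec{\xi }}\Vert =1 \end{array}} \left\{ - {\varvec{\xi }}^\top ( {\varvec{c}}_j-{\varvec{c}}_i)+\tau (r_i+r_j) \sigma _{\mathcal {B}}({\varvec{\xi }})\right\} \) is a minimum of affine functions and is therefore concave. Hence, \(h_{ij}=\max \{0,\varphi _{ij}\}\) is the maximum of 0 and a concave function, which is DC: since \(\max \{0,t\}=\max \{0,-t\}+t\) for every \(t\in {\mathbb R}\), a decomposition is
\[ h_{ij} = \max \{0,-\varphi _{ij}\}-(-\varphi _{ij}), \qquad -\varphi _{ij}({\varvec{c}}_i,{\varvec{c}}_j,\tau ) = \max _{\begin{array}{c} {\varvec{\xi }}\in {\mathbb R}^n\\ \Vert {\varvec{\xi }}\Vert =1 \end{array}} \left\{ {\varvec{\xi }}^\top ( {\varvec{c}}_j-{\varvec{c}}_i)-\tau (r_i+r_j) \sigma _{\mathcal {B}}({\varvec{\xi }})\right\} , \]
where both \(-\varphi _{ij}\), a maximum of affine functions, and \(\max \{0,-\varphi _{ij}\}\) are convex.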
\(\square \)
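To make the support-function formula of Proposition 2 concrete, the sketch below evaluates \(h_{ij}=\max \{0,\min _{\Vert {\varvec{\xi }}\Vert =1}\{-{\varvec{\xi }}^\top ({\varvec{c}}_j-{\varvec{c}}_i)+\tau (r_i+r_j)\sigma _{\mathcal {B}}({\varvec{\xi }})\}\}\) numerically. It assumes the planar case \(n=2\) and two symmetric choices of \(\mathcal {B}\): the Euclidean unit disk (\(\sigma _{\mathcal {B}}({\varvec{\xi }})=\Vert {\varvec{\xi }}\Vert _2\)) and the unit square \([-1,1]^2\) (\(\sigma _{\mathcal {B}}({\varvec{\xi }})=\Vert {\varvec{\xi }}\Vert _1\)). Sampling \({\varvec{\xi }}\) on a grid of directions is only an illustration, not the authors' computational method, and all function names are hypothetical. The result is cross-checked against the elementary closed forms for overlapping disks and for axis-aligned squares.

```python
import math


def support_disk(xi):
    """Support function of the Euclidean unit disk: sigma_B(xi) = ||xi||_2."""
    return math.hypot(xi[0], xi[1])


def support_square(xi):
    """Support function of the unit square [-1, 1]^2 (the l_inf ball): ||xi||_1."""
    return abs(xi[0]) + abs(xi[1])


def penetration_depth(c_i, c_j, r_i, r_j, tau, support, samples=20000):
    """Approximate h_ij = max{0, min_{||xi||=1} [-xi^T (c_j - c_i) + tau (r_i + r_j) sigma_B(xi)]}
    by evaluating the bracket on `samples` equally spaced unit directions."""
    dx, dy = c_j[0] - c_i[0], c_j[1] - c_i[1]
    best = math.inf
    for k in range(samples):
        theta = 2.0 * math.pi * k / samples
        xi = (math.cos(theta), math.sin(theta))
        value = -(xi[0] * dx + xi[1] * dy) + tau * (r_i + r_j) * support(xi)
        best = min(best, value)
    return max(0.0, best)


# Closed forms for comparison: overlapping disks must be moved apart by
# tau (r_i + r_j) - ||c_i - c_j||; axis-aligned squares are separated most
# cheaply along the coordinate axis with the smaller overlap.
def depth_disks(c_i, c_j, r_i, r_j, tau):
    return max(0.0, tau * (r_i + r_j) - math.dist(c_i, c_j))


def depth_squares(c_i, c_j, r_i, r_j, tau):
    s = tau * (r_i + r_j)
    return max(0.0, min(s - abs(c_j[0] - c_i[0]), s - abs(c_j[1] - c_i[1])))


print(penetration_depth((0.0, 0.0), (1.0, 0.0), 1.0, 1.0, 1.0, support_disk))
print(depth_disks((0.0, 0.0), (1.0, 0.0), 1.0, 1.0, 1.0))
```

Both calls above describe two unit disks whose centres are one unit apart, so the two values should agree. For the square, the minimizing directions are the coordinate axes, which is why the support function \(\Vert {\varvec{\xi }}\Vert _1\) reproduces the axis-wise overlap.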
1.3 Proof of Proposition 3
Before proving Proposition 3, we need the following technical result.
Lemma 1
Let \(\beta _{ij}\in {\mathbb R}\) be such that \(\beta _{ij}\ge 2\Vert r_i\varvec{b}_i-r_j\varvec{b}_j\Vert ^2\), \(\forall \varvec{b}_i, \varvec{b}_j \in \mathcal {B}\). Then, \(g_{ij}^2\) can be expressed as a DC function, \(g_{ij}^2=u_{ij}-(u_{ij}-g_{ij}^2)\), where
\[ u_{ij}(\varvec{c}_i,\varvec{c}_j,\tau ) = 2\Vert \varvec{c}_i-\varvec{c}_j\Vert ^2+\beta _{ij}\tau ^2 . \]
Proof
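Observe that, taking \(\beta _{ij}\in {\mathbb R}\) such that
\[ \beta _{ij}\ge 2\Vert r_i\varvec{b}_i-r_j\varvec{b}_j\Vert ^2 \qquad \forall \varvec{b}_i, \varvec{b}_j \in \mathcal {B}, \]
the function
\[ (\varvec{c}_i,\varvec{c}_j,\tau ) \longmapsto 2\Vert \varvec{c}_i-\varvec{c}_j\Vert ^2+\beta _{ij}\tau ^2-\Vert \varvec{c}_i-\varvec{c}_j+\tau (r_i\varvec{b}_i-r_j\varvec{b}_j)\Vert ^2 = \Vert \varvec{c}_i-\varvec{c}_j-\tau (r_i\varvec{b}_i-r_j\varvec{b}_j)\Vert ^2+\left( \beta _{ij}-2\Vert r_i\varvec{b}_i-r_j\varvec{b}_j\Vert ^2\right) \tau ^2 \]
is convex for any fixed \(\varvec{b}_i, \varvec{b}_j \in \mathcal {B}\), being the sum of a convex quadratic function and a nonnegative multiple of \(\tau ^2\). Since \(u_{ij}-g_{ij}^2\) is the maximum of these functions over \(\varvec{b}_i, \varvec{b}_j \in \mathcal {B}\), and the maximum of convex functions is convex, taking \(u_{ij}= 2\Vert \varvec{c}_i-\varvec{c}_j\Vert ^2+\beta _{ij}\tau ^2\) yields a DC decomposition of \(g_{ij}^2\) as in the statement. \(\square \)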
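The role of the bound on \(\beta _{ij}\) in Lemma 1 can be probed numerically. The sketch below assumes the planar case with \(\mathcal {B}\) the Euclidean unit disk, so that \(\max \Vert r_i\varvec{b}_i-r_j\varvec{b}_j\Vert =r_i+r_j\) and the smallest admissible value is \(\beta _{ij}=2(r_i+r_j)^2\). It also assumes that \(g_{ij}\) is the minimum Euclidean distance between the objects \(\varvec{c}_i+\tau r_i\mathcal {B}\) and \(\varvec{c}_j+\tau r_j\mathcal {B}\) (for disks, \(\max \{0,\Vert \varvec{c}_i-\varvec{c}_j\Vert -\tau (r_i+r_j)\}\)), which is consistent with the proof but not restated here. Because \(u_{ij}-g_{ij}^2\) depends on the centres only through \(\varvec{a}=\varvec{c}_i-\varvec{c}_j\), convexity in \((\varvec{a},\tau )\) implies convexity in \((\varvec{c}_i,\varvec{c}_j,\tau )\). The randomized midpoint test is evidence, not a proof, and all names are hypothetical.

```python
import math
import random

# Hypothetical planar instance: B is the Euclidean unit disk, so
# max ||r_i b_i - r_j b_j|| over b_i, b_j in B equals r_i + r_j, and the
# smallest beta_ij allowed by Lemma 1 is 2 (r_i + r_j)^2.
R_I, R_J = 1.0, 0.5
R = R_I + R_J
BETA_MIN = 2.0 * R * R


def g_sq(a, tau):
    """Assumed g_ij^2: squared minimum distance between the disks
    c_i + tau r_i B and c_j + tau r_j B for tau >= 0, with a = c_i - c_j."""
    gap = max(0.0, math.hypot(a[0], a[1]) - tau * R)
    return gap * gap


def u(a, tau, beta):
    """u_ij = 2 ||c_i - c_j||^2 + beta_ij tau^2, as in Lemma 1."""
    return 2.0 * (a[0] ** 2 + a[1] ** 2) + beta * tau ** 2


def second_part(a, tau, beta):
    """u_ij - g_ij^2, convex when beta satisfies the bound of Lemma 1."""
    return u(a, tau, beta) - g_sq(a, tau)


def identity_gap(a, tau, d, beta):
    """Left minus right side of the identity in the proof, d = r_i b_i - r_j b_j."""
    lhs = u(a, tau, beta) - ((a[0] + tau * d[0]) ** 2 + (a[1] + tau * d[1]) ** 2)
    rhs = ((a[0] - tau * d[0]) ** 2 + (a[1] - tau * d[1]) ** 2
           + (beta - 2.0 * (d[0] ** 2 + d[1] ** 2)) * tau ** 2)
    return lhs - rhs


def passes_midpoint_test(beta, trials=20000, seed=0):
    """Randomized midpoint-convexity check of (a, tau) -> u - g^2 for tau in [0, 2].
    Passing is evidence of convexity, not a proof."""
    rng = random.Random(seed)

    def f(point):
        return second_part(point[0], point[1], beta)

    for _ in range(trials):
        x = ((rng.uniform(-3, 3), rng.uniform(-3, 3)), rng.uniform(0, 2))
        y = ((rng.uniform(-3, 3), rng.uniform(-3, 3)), rng.uniform(0, 2))
        mid = (((x[0][0] + y[0][0]) / 2, (x[0][1] + y[0][1]) / 2), (x[1] + y[1]) / 2)
        if f(mid) > (f(x) + f(y)) / 2 + 1e-9:
            return False
    return True


print(passes_midpoint_test(BETA_MIN))  # True, as Lemma 1 predicts
# With beta = 0 the inequality fails along a = (4, 0): tau = 1 versus tau = 0 and 2.
a = (4.0, 0.0)
print(second_part(a, 1.0, 0.0) > (second_part(a, 0.0, 0.0) + second_part(a, 2.0, 0.0)) / 2)  # True
```

The second check shows why the bound on \(\beta _{ij}\) is needed: without the \(\beta _{ij}\tau ^2\) term, \(u_{ij}-g_{ij}^2\) contains a \(-(r_i+r_j)^2\tau ^2\) contribution that makes it concave in \(\tau \) when the objects are far apart.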
We now prove Proposition 3.
If \(\lambda <\displaystyle \frac{1}{3}\), considering Proposition 1, one has
and thus \(u= \displaystyle \sum _{\begin{array}{c} i,j=1,\ldots ,N \\ i\ne j \end{array}} 2\lambda \kappa ^2\delta _{ij}^2\) holds.
If \(\lambda \ge \displaystyle \frac{1}{3}\), by using the DC decomposition for \(g_{ij}^2\) obtained in Lemma 1 and Proposition 1, one has
\(\square \)
About this article
Cite this article
Carrizosa, E., Guerrero, V. & Romero Morales, D. Visualizing data as objects by DC (difference of convex) optimization. Math. Program. 169, 119–140 (2018). https://doi.org/10.1007/s10107-017-1156-1
DOI: https://doi.org/10.1007/s10107-017-1156-1