Abstract
Scientists working with genomic data face the challenge of analyzing and understanding an ever-increasing amount of data. Multidimensional scaling (MDS) refers to the representation of high-dimensional data in a low-dimensional space that preserves the similarities between data points. Metric MDS algorithms aim to embed the points so that inter-point distances are as close as possible to the input dissimilarities. The computational complexity of most metric MDS methods exceeds O(n²), which restricts their application to large genomic data sets (n ≫ 10⁶). Non-metric MDS may be considered instead; it embeds inter-point distances using only the relative order of the input dissimilarities. A non-metric MDS method has lower complexity than a metric one, although it does not preserve the true relationships. However, if the input dissimilarities are unreliable, too difficult to measure, or simply unavailable, non-metric MDS is the appropriate algorithm. In this paper, we give an overview of both metric and non-metric MDS methods and their application to genomic data analyses.
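To make the metric versus non-metric distinction concrete, the following is a minimal sketch in Python with NumPy. It is not taken from the chapter. It assumes classical (Torgerson) scaling as a representative metric method, and the function names classical_mds, pairwise_distances and dissimilarity_ranks are hypothetical names introduced here for illustration. The full n × n dissimilarity matrix it operates on is what makes such methods costly for large n.

```python
import numpy as np


def pairwise_distances(X):
    """Euclidean distance matrix (n x n) for the rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def classical_mds(D, k=2):
    """Classical (Torgerson) metric MDS; an illustrative choice of metric method.

    D is a symmetric n x n dissimilarity matrix; returns an n x k embedding.
    """
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n       # centering matrix
    B = -0.5 * J @ (D ** 2) @ J               # double-centered squared dissimilarities
    eigvals, eigvecs = np.linalg.eigh(B)      # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:k]       # indices of the k largest eigenvalues
    lam = np.clip(eigvals[top], 0.0, None)    # guard against tiny negative values
    return eigvecs[:, top] * np.sqrt(lam)


def dissimilarity_ranks(D):
    """Rank order of the upper-triangular dissimilarities.

    This ordering is the only information a non-metric method uses.
    """
    values = D[np.triu_indices(D.shape[0], k=1)]
    return np.argsort(np.argsort(values))
```

For points that really lie in a plane, classical_mds(pairwise_distances(X), k=2) recovers the inter-point distances up to rotation and reflection. In contrast, dissimilarity_ranks returns the same ordering for D and for any strictly increasing transformation of it (for example D ** 3), which is why a non-metric embedding cannot be expected to reproduce the original magnitudes.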
Copyright information
© 2016 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Jakaitiene, A., Sangiovanni, M., Guarracino, M.R., Pardalos, P.M. (2016). Multidimensional Scaling for Genomic Data. In: Pardalos, P., Zhigljavsky, A., Žilinskas, J. (eds) Advances in Stochastic and Deterministic Global Optimization. Springer Optimization and Its Applications, vol 107. Springer, Cham. https://doi.org/10.1007/978-3-319-29975-4_7
DOI: https://doi.org/10.1007/978-3-319-29975-4_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-29973-0
Online ISBN: 978-3-319-29975-4
eBook Packages: Mathematics and Statistics (R0)