Multidimensional Scaling for Genomic Data

Advances in Stochastic and Deterministic Global Optimization

Part of the book series: Springer Optimization and Its Applications ((SOIA,volume 107))

Abstract

Scientists working with genomic data face the challenge of analyzing and understanding an ever-increasing amount of data. Multidimensional scaling (MDS) refers to the representation of high-dimensional data in a low-dimensional space that preserves the similarities between data points. Metric MDS algorithms aim to embed the data so that inter-point distances are as close as possible to the input dissimilarities. The computational complexity of most metric MDS methods exceeds O(n²), which restricts their application to large genomic data sets (n ≫ 10⁶). Non-metric MDS may be considered instead, in which inter-point distances are embedded considering only the relative order of the input dissimilarities. A non-metric MDS method has lower complexity than a metric one, although it does not preserve the true relationships. However, if the input dissimilarities are unreliable, too difficult to measure, or simply unavailable, non-metric MDS is the appropriate algorithm. In this paper, we give an overview of both metric and non-metric MDS methods and their application to genomic data analyses.
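
The distinction between the metric and non-metric criteria described in the abstract can be illustrated with a minimal sketch. It is not taken from the chapter: the four-point embedding, the function names, and the use of a raw sum-of-squares stress are assumptions chosen only for illustration. The sketch shows that a monotone distortion of the input dissimilarities (here, squaring) worsens the metric fit while leaving the rank order, which is all that non-metric MDS uses, unchanged.

```python
import math
from itertools import combinations


def pairwise_distances(points):
    """Euclidean distance for every pair (i, j) with i < j, in a fixed order."""
    return [math.dist(points[i], points[j])
            for i, j in combinations(range(len(points)), 2)]


def raw_stress(distances, dissimilarities):
    """Metric criterion (illustrative): sum of squared differences between
    embedded distances and input dissimilarities."""
    return sum((d - delta) ** 2 for d, delta in zip(distances, dissimilarities))


def rank_order(values):
    """Pair indices sorted from smallest to largest value; this ordering is
    the only information a non-metric criterion relies on."""
    return sorted(range(len(values)), key=lambda k: values[k])


# Hypothetical 2-D embedding of four items (all pairwise distances distinct).
embedding = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 3.0)]
distances = pairwise_distances(embedding)

# Two hypothetical inputs: exact dissimilarities, and a monotone distortion.
exact = list(distances)
squared = [d ** 2 for d in distances]

metric_stress_exact = raw_stress(distances, exact)      # perfect metric fit
metric_stress_squared = raw_stress(distances, squared)  # metric fit degrades
order_preserved = rank_order(distances) == rank_order(squared)

print(metric_stress_exact, metric_stress_squared > 0, order_preserved)
```

In this toy setting the embedding still fits the distorted input perfectly in the non-metric sense, which is why rank-based methods are attractive when the dissimilarity values themselves are unreliable.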

Author information

Corresponding author

Correspondence to Audrone Jakaitiene.

Copyright information

© 2016 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Jakaitiene, A., Sangiovanni, M., Guarracino, M.R., Pardalos, P.M. (2016). Multidimensional Scaling for Genomic Data. In: Pardalos, P., Zhigljavsky, A., Žilinskas, J. (eds) Advances in Stochastic and Deterministic Global Optimization. Springer Optimization and Its Applications, vol 107. Springer, Cham. https://doi.org/10.1007/978-3-319-29975-4_7
