Exploiting Non-linear Structure in Astronomical Data for Improved Statistical Inference

  • Conference paper
Statistical Challenges in Modern Astronomy V

Part of the book series: Lecture Notes in Statistics (LNSP, volume 902)

Abstract

Many estimation problems in astrophysics are highly complex, with high-dimensional, non-standard data objects (e.g., images, spectra, entire distributions) that are not amenable to formal statistical analysis. To utilize such data and make accurate inferences, it is crucial to transform the data into a simpler, reduced form. Spectral kernel methods are non-linear data transformation methods that efficiently reveal the underlying geometry of observable data. Here we focus on one particular technique: diffusion maps, or more generally, spectral connectivity analysis (SCA). We give examples of applications in astronomy, e.g., photometric redshift estimation, prototype selection for estimation of star formation history, and supernova light curve classification. We outline some computational and statistical challenges that remain, and we discuss some promising future directions for astronomy and data mining.
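
For readers unfamiliar with the technique named in the abstract, the following is a minimal sketch of a basic diffusion map in the sense of Coifman and Lafon, not the authors' implementation. The function name `diffusion_map`, the Gaussian kernel, the bandwidth `eps`, and the toy spiral data are all assumptions chosen for illustration; the chapter itself may use different kernels, normalizations, and tuning.

```python
import numpy as np

def diffusion_map(X, eps, n_coords=2, t=1):
    """Illustrative diffusion map (hypothetical helper, not from the chapter).

    X        : (n, p) array, one observation per row
    eps      : Gaussian kernel bandwidth (assumed tuning parameter)
    n_coords : number of non-trivial diffusion coordinates to return
    t        : diffusion time (number of random-walk steps)
    Returns (coords, evals): the (n, n_coords) embedding and all eigenvalues
    sorted in decreasing order.
    """
    # Pairwise squared Euclidean distances between observations.
    sq_norms = np.sum(X**2, axis=1)
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    d2 = np.maximum(d2, 0.0)  # clip tiny negatives from round-off

    # Gaussian kernel weights and node degrees of the similarity graph.
    W = np.exp(-d2 / eps)
    deg = W.sum(axis=1)

    # Symmetric matrix S = D^{-1/2} W D^{-1/2}; it has the same eigenvalues
    # as the random-walk matrix P = D^{-1} W but is numerically easier.
    S = W / np.sqrt(np.outer(deg, deg))
    evals, evecs = np.linalg.eigh(S)
    order = np.argsort(evals)[::-1]
    evals, evecs = evals[order], evecs[:, order]

    # Right eigenvectors of P are D^{-1/2} times the eigenvectors of S.
    psi = evecs / np.sqrt(deg)[:, None]

    # Skip the trivial first pair (eigenvalue 1, constant eigenvector).
    coords = (evals[1:n_coords + 1] ** t) * psi[:, 1:n_coords + 1]
    return coords, evals

# Toy data (assumed): noisy points along a one-dimensional spiral in 3-D.
rng = np.random.default_rng(0)
s = np.sort(rng.uniform(0.0, 3.0 * np.pi, size=200))
X = np.column_stack([s * np.cos(s), s * np.sin(s), 0.1 * rng.normal(size=200)])
coords, evals = diffusion_map(X, eps=2.0, n_coords=2)
print(coords.shape)
```

The returned coordinates can then be passed to a standard regression or classification method in place of the raw high-dimensional data; how many coordinates to keep and how to choose `eps` are tuning decisions this sketch does not address.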

Notes

  1. www.darkenergysurvey.org

  2. www.lsst.org [19].

  3. www.pan-starrs.ifa.hawaii.edu/public

  4. www.vista.ac.uk

References

  1. Bailer-Jones, C. A. L. (2010, March). The ILIUM forward modelling algorithm for multivariate parameter estimation and its application to derive stellar parameters from Gaia spectrophotometry. Monthly Notices of the Royal Astronomical Society 403, 96–116.

  2. Ball, N. M., R. J. Brunner, A. D. Myers, N. E. Strand, S. L. Alberts, and D. Tcheng (2008, August). Robust Machine Learning Applied to Astronomical Data Sets. III. Probabilistic Photometric Redshifts for Galaxies and Quasars in the SDSS and GALEX. Astrophysical Journal 683, 12–21.

  3. Belkin, M. and P. Niyogi (2003). Laplacian eigenmaps for dimensionality reduction and data representation. Neural Computation 15(6), 1373–1396.

  4. Belkin, M. and P. Niyogi (2005). Semi-supervised learning on Riemannian manifolds. Machine Learning 56, 209–239.

  5. Boroson, T. A. and T. R. Lauer (2010, August). Exploring the Spectral Space of Low Redshift QSOs. The Astronomical Journal 140, 390–402.

  6. Bruzual, G. and S. Charlot (2003). Stellar population synthesis at the resolution of 2003. Monthly Notices of the Royal Astronomical Society 344, 1000–1028.

  7. Budavári, T., V. Wild, A. S. Szalay, L. Dobos, and C.-W. Yip (2009, April). Reliable eigenspectra for new generation surveys. Monthly Notices of the Royal Astronomical Society 394, 1496–1502.

  8. Cid Fernandes, R., Q. Gu, J. Melnick, E. Terlevich, R. Terlevich, D. Kunth, R. Rodrigues Lacerda, and B. Joguet (2004). The star formation history of Seyfert 2 nuclei. Monthly Notices of the Royal Astronomical Society 355, 273–296.

  9. Cid Fernandes, R., L. Sodré, H. R. Schmitt, and J. R. S. Leão (2001, July). A probabilistic formulation for empirical population synthesis: sampling methods and tests. Monthly Notices of the Royal Astronomical Society 325, 60–76.

  10. Coifman, R. and S. Lafon (2006). Diffusion maps. Applied and Computational Harmonic Analysis 21, 5–30.

  11. Coifman, R., S. Lafon, A. Lee, M. Maggioni, B. Nadler, F. Warner, and S. Zucker (2005). Geometric diffusions as a tool for harmonic analysis and structure definition of data: Diffusion maps. Proc. of the National Academy of Sciences 102(21), 7426–7431.

  12. Collister, A. A. and O. Lahav (2004, April). ANNz: Estimating Photometric Redshifts Using Artificial Neural Networks. Publ. of the Astronomical Society of the Pacific 116, 345–351.

  13. Dahlen, T., B. Mobasher, M. Dickinson, H. C. Ferguson, M. Giavalisco, N. A. Grogin, Y. Guo, A. Koekemoer, K.-S. Lee, S.-K. Lee, M. Nonino, A. G. Riess, and S. Salimbeni (2010, November). A Detailed Study of Photometric Redshifts for GOODS-South Galaxies. Astrophysical Journal 724, 425–447.

  14. Donoho, D. and C. Grimes (2003, May). Hessian eigenmaps: new locally linear embedding techniques for high-dimensional data. Proc. of the National Academy of Sciences 100(10), 5591–5596.

  15. Efromovich, S. (1999). Nonparametric curve estimation: methods, theory and applications. Springer series in statistics. Springer.

  16. Fouss, F., A. Pirotte, and M. Saerens (2005). A novel way of computing similarities between nodes of a graph, with application to collaborative recommendation. In Proc. of the 2005 IEEE/WIC/ACM International Joint Conference on Web Intelligence, pp. 550–556.

  17. Freeman, P. E., J. Newman, A. B. Lee, J. W. Richards, and C. M. Schafer (2009). Photometric redshift estimation using SCA. Monthly Notices of the Royal Astronomical Society 398, 2012–2021.

  18. Hayden, B. T., P. M. Garnavich, et al. (2010, March). The Rise and Fall of Type Ia Supernova Light Curves in the SDSS-II Supernova Survey. Astrophysical Journal 712, 350–366.

  19. Ivezic, Z. and J. A. Tyson, for the LSST Collaboration (2008, May). LSST: from Science Drivers to Reference Design and Anticipated Data Products. ArXiv e-prints.

  20. Kessler, R., B. Bassett, et al. (2010). Results from the Supernova Photometric Classification Challenge. Publ. of the Astronomical Society of the Pacific 122, 1415–1431.

  21. Lafferty, J. and L. Wasserman (2007). Statistical analysis of semi-supervised regression. In Adv. in Neural Inf. Processing Systems.

  22. Lafon, S. and A. Lee (2006). Diffusion maps and coarse-graining: A unified framework for dimensionality reduction, graph partitioning, and data set parameterization. IEEE Trans. Pattern Anal. and Mach. Intel. 28, 1393–1403.

  23. Lee, A. B. and L. Wasserman (2010). Spectral connectivity analysis. Journal of the American Statistical Association 105(491), 1241–1255.

  24. Halko, N., P.-G. Martinsson, and J. A. Tropp (2011). Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions. SIAM Review 53(2).

  25. Ng, A. Y., M. I. Jordan, and Y. Weiss (2001). On spectral clustering: Analysis and an algorithm. In Adv. in Neural Inf. Processing Systems.

  26. Richards, J. W., P. E. Freeman, A. B. Lee, and C. M. Schafer (2009a). Accurate parameter estimation for star formation history in galaxies using SDSS spectra. Monthly Notices of the Royal Astronomical Society 399, 1044–1057.

  27. Richards, J. W., P. E. Freeman, A. B. Lee, and C. M. Schafer (2009b). Exploiting low-dimensional structure in astronomical spectra. Astrophysical Journal 691, 32–42.

  28. Richards, J. W., P. E. Freeman, A. B. Lee, and C. M. Schafer (2011a). Prototype selection for parameter estimation in complex models. Submitted; arXiv:1105.6344.

  29. Richards, J. W., D. Homrighausen, P. E. Freeman, C. M. Schafer, and D. Poznanski (2011b). Semi-supervised learning for photometric supernova classification. Submitted; arXiv:1103.6034.

  30. Roweis, S. and L. Saul (2000). Nonlinear dimensionality reduction by locally linear embedding. Science 290, 2323–2326.

  31. Sesar, B., Ž. Ivezić, et al. (2010, January). Light Curve Templates and Galactic Distribution of RR Lyrae Stars from Sloan Digital Sky Survey Stripe 82. Astrophysical Journal 708, 717–741.

  32. Settles, B. (2010). Active learning literature survey. Technical Report 1648, Dept. of Computer Science, University of Wisconsin-Madison.

  33. Singh, A., R. Nowak, and X. Zhu (2008). Unlabeled data: Now it helps, now it doesn’t. In Adv. in Neural Inf. Processing Systems.

  34. von Luxburg, U. (2007). A tutorial on spectral clustering. Statistics and Computing 17(4), 395–416.

  35. Zhang, Z. and H. Zha (2002). Principal manifolds and nonlinear dimension reduction via local tangent space alignment. Technical Report CSE-02-019, Department of Computer Science and Engineering, Pennsylvania State University.

Acknowledgements

Part of this work is joint with Joseph W. Richards, Chad M. Schafer, Jeffrey A. Newman, and Darren W. Homrighausen. We would also like to acknowledge ONR grant #00424143, NSF grant #0707059, and NASA AISR grant NNX09AK59G.

Author information

Corresponding author

Correspondence to Ann B. Lee.

Copyright information

© 2012 Springer Science+Business Media New York

About this paper

Cite this paper

Lee, A.B., Freeman, P.E. (2012). Exploiting Non-linear Structure in Astronomical Data for Improved Statistical Inference. In: Feigelson, E., Babu, G. (eds) Statistical Challenges in Modern Astronomy V. Lecture Notes in Statistics, vol 902. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-3520-4_24
