Classification and Anomaly Detection for Astronomical Survey Data

  • Marc Henrion
  • Daniel J. Mortlock
  • David J. Hand
  • Axel Gandy
Part of the Springer Series in Astrostatistics book series (SSIA, volume 1)


We present two statistical techniques for astronomical problems: a star-galaxy separator for the UKIRT Infrared Deep Sky Survey (UKIDSS) and a novel anomaly detection method for cross-matched astronomical datasets. The star-galaxy separator is a statistical classification method that outputs class membership probabilities rather than class labels and allows the use of prior knowledge about the source populations. Deep Sloan Digital Sky Survey (SDSS) data from the multiply imaged Stripe 82 region are used to check the results from our classifier, which compares favourably with the UKIDSS pipeline classification algorithm. The anomaly detection method addresses the problem that, in cross-matched datasets, different objects have different sets of recorded variables. These differences rule out methods that cannot handle missing values and make direct comparison between objects difficult. For each source, our method computes anomaly scores in subspaces of the observed feature space and combines them into an overall anomaly score. The proposed technique is very general and can readily be applied outside astronomy. The properties and performance of our method are investigated using both real and simulated datasets.
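The subspace idea in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the choice of all two-feature subspaces, the k-th-nearest-neighbour distance as the per-subspace score, the rank normalisation, and the mean as combination function are all assumptions made here for illustration, and the names `knn_score` and `subspace_anomaly_scores` are hypothetical. Missing values are encoded as NaN, and an object is scored only in subspaces where all of its features are observed.

```python
from itertools import combinations

import numpy as np


def knn_score(points, k):
    """Distance from each row of `points` to its k-th nearest other row."""
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    # After sorting, column 0 is the zero self-distance, so column k
    # holds the distance to the k-th nearest neighbour.
    return np.sort(dists, axis=1)[:, k]


def subspace_anomaly_scores(data, subspace_dim=2, k=3):
    """Overall anomaly score per object; NaN marks a missing value.

    Assumed (illustrative) choices: every feature subset of size
    `subspace_dim` is a subspace, and the combination function is the mean.
    """
    n_objects, n_features = data.shape
    subspaces = list(combinations(range(n_features), subspace_dim))
    scores = np.full((n_objects, len(subspaces)), np.nan)

    for j, features in enumerate(subspaces):
        sub = data[:, list(features)]
        # Objects with every feature of this subspace recorded.
        observed = ~np.isnan(sub).any(axis=1)
        n_observed = int(observed.sum())
        if n_observed > k:
            raw = knn_score(sub[observed], k)
            # Rank-normalise to (0, 1] so subspaces are comparable.
            ranks = raw.argsort().argsort() + 1
            scores[observed, j] = ranks / n_observed

    # Combine: mean over the subspaces in which each object was scored.
    n_scored = (~np.isnan(scores)).sum(axis=1)
    total = np.nansum(scores, axis=1)
    overall = np.full(n_objects, np.nan)
    np.divide(total, n_scored, out=overall, where=n_scored > 0)
    return overall


# Simulated catalogue: 200 objects, 4 features, about 20% missing values.
rng = np.random.default_rng(0)
data = rng.normal(size=(200, 4))
data[rng.random(data.shape) < 0.2] = np.nan
# Object 0 is an outlier that is observed in only two of the four features.
data[0] = [10.0, 10.0, np.nan, np.nan]

scores = subspace_anomaly_scores(data)
print("most anomalous object:", int(np.nanargmax(scores)))
```

Rank normalisation is used in this sketch so that scores from subspaces with different scales, or with different numbers of observed objects, can be averaged; an object that cannot be scored in any subspace keeps a NaN overall score.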


Keywords: Anomaly Detection; Morphology Statistic; Combination Function; Anomaly Score; Anomaly Detection Algorithm



The results presented here would not have been possible without the efforts of the many people involved in the SDSS and UKIDSS projects.

Marc Henrion was supported by an EPSRC research studentship, and David Hand was partially supported by a Royal Society Wolfson Research Merit Award.



Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

  • Marc Henrion (1)
  • Daniel J. Mortlock (2)
  • David J. Hand (3)
  • Axel Gandy (4)

  1. Department of Mathematics, Imperial College London, London, UK
  2. Astrophysics Group, Dept. of Physics, and Statistics Group, Dept. of Mathematics, Imperial College London, London, UK
  3. Dept. of Mathematics, Imperial College London, London, UK
  4. Statistics Group, Dept. of Mathematics, Imperial College London, London, UK
