Clustering Based One-Class Classification for Compliance Verification of the Comprehensive Nuclear-Test-Ban Treaty

  • Shiven Sharma
  • Colin Bellinger
  • Nathalie Japkowicz
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7310)

Abstract

Monitoring the levels of radioxenon isotopes in the atmosphere has been proposed as a means of verifying the Comprehensive Nuclear-Test-Ban Treaty (CTBT). This translates into a classification problem, whereby the measured concentrations belong either to an explosion class or to a background class. Instances drawn from the explosion class are extremely rare, if not non-existent. The resulting dataset is therefore extremely imbalanced and inherently suited to one-class classification. Further exacerbating the problem, the background distribution can be extremely complex, which makes it difficult to model with one-class learning. In order to improve upon the previous classification results, we investigate the augmentation of one-class learning methods with clustering. The purpose of clustering is to convert a complex distribution into simpler distributions, the clusters, over which more effective models can be built. The resulting model, built from one-class learners trained over the clusters, performs more effectively than a model built over the original distribution. This thesis is tested empirically on three different data domains: a number of artificial datasets, datasets from the UCI repository, and data modelled after the extremely challenging CTBT problem. The results lend credence to the claim that performance improves when clustering is used with one-class classification on complex distributions.
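
The idea of training one-class learners over clusters rather than over the original distribution can be illustrated with a minimal sketch. The abstract does not name the clustering algorithm or the one-class learners used, so everything below is an assumption made for illustration: a small k-means routine stands in for the clustering step, a distance-to-centre threshold stands in for the one-class learner, and the class names, the 0.95 quantile, and the two-mode synthetic background are all hypothetical.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def kmeans(points, k, iters=20):
    """Plain k-means (assumed stand-in for the clustering step).

    Farthest-point initialisation keeps the sketch deterministic.
    Returns the list of clusters, each a list of points.
    """
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(dist(p, c) for c in centroids))
        )
    groups = []
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            groups[nearest].append(p)
        centroids = [mean(g) if g else centroids[i] for i, g in enumerate(groups)]
    return groups


class DistanceOneClass:
    """Hypothetical one-class learner: accept points within a radius of the
    training mean, with the radius set at a quantile of training distances."""

    def __init__(self, points, quantile=0.95):
        self.centre = mean(points)
        d = sorted(dist(p, self.centre) for p in points)
        self.radius = d[min(len(d) - 1, int(quantile * len(d)))]

    def accepts(self, x):
        return dist(x, self.centre) <= self.radius


class ClusteredOneClass:
    """One one-class learner per cluster; a point belongs to the target
    class if any of the per-cluster learners accepts it."""

    def __init__(self, points, k, quantile=0.95):
        self.models = [DistanceOneClass(g, quantile) for g in kmeans(points, k) if g]

    def accepts(self, x):
        return any(m.accepts(x) for m in self.models)


# Synthetic two-mode "background" class (illustrative data only).
rng = random.Random(0)
background = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
background += [(rng.gauss(10, 1), rng.gauss(10, 1)) for _ in range(200)]

global_model = DistanceOneClass(background)
clustered_model = ClusteredOneClass(background, k=2)

# A point in the empty region between the two modes.
outlier = (5.0, 5.0)
print(global_model.accepts(outlier), clustered_model.accepts(outlier))
```

In this toy setting the single model, fitted over the whole multimodal distribution, places its boundary around both modes and therefore covers the empty region between them, whereas the per-cluster models cover only the modes themselves. This is a sketch of the general principle, not a reproduction of the experiments reported in the paper.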

Keywords

Target Class; Original Distribution; Complex Distribution; Multimodal Distribution; Imbalance Ratio
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Shiven Sharma
    • 1
  • Colin Bellinger
    • 1
  • Nathalie Japkowicz
    • 1
  1. SITE, University of Ottawa, Ottawa, Canada
