Mining Class Outliers: Concepts, Algorithms and Applications

  • Zengyou He
  • Joshua Zhexue Huang
  • Xiaofei Xu
  • Shengchun Deng
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3129)

Abstract

Detection of outliers is important in many applications and has recently attracted much attention in the data mining research community. However, most existing methods are designed for mining outliers from a single dataset without considering the class labels of data objects. In this paper, we consider the class outlier detection problem, i.e., "given a set of observations with class labels, find those that arouse suspicions, taking into account the class labels." By generalizing two pioneering contributions in this field, we propose the notion of class outliers and practical solutions that extend existing outlier detection algorithms to detect class outliers. Furthermore, potential applications of class outlier detection in CRM (customer relationship management) are discussed. Experiments on real datasets show that our method can find interesting outliers and can be used in practice.

Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • Zengyou He (1)
  • Joshua Zhexue Huang (2)
  • Xiaofei Xu (1)
  • Shengchun Deng (1)

  1. Department of Computer Science and Engineering, Harbin Institute of Technology, China
  2. E-Business Technology Institute, The University of Hong Kong, China
