Frequent Pattern Outlier Detection Without Exhaustive Mining

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9652)

Abstract

Outlier detection consists of detecting anomalous observations in data. Over the past decade, pattern-based outlier detection methods have mined all frequent patterns in order to compute the outlier factor of each transaction. This approach remains too expensive despite recent progress in the pattern mining field. In this paper, we provide exact and approximate methods for calculating the frequent pattern outlier factor (FPOF) either without extracting any pattern or by extracting only a small sample. We propose an algorithm that returns the exact FPOF without mining any pattern. Surprisingly, it runs in time polynomial in the size of the dataset. We also present an approximate method in which the end user controls the maximum error on the estimated FPOF. Experiments show the value of both methods on very large datasets where exhaustive mining fails to provide the exact solution. For the same budget in time or number of patterns, our approximate method is more accurate than the baseline approach.
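The abstract does not define the FPOF or spell out the exact algorithm, so the following is only a minimal sketch of one way a pattern-free computation can work, under stated assumptions. It assumes that the score of a transaction t is the summed support of every itemset contained in t (with no minimum support threshold), normalized by the largest such sum in the dataset. Under that assumption, a counting identity avoids enumeration: each transaction u contains exactly 2^|t ∩ u| of the itemsets included in t. The function names and the toy dataset are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def summed_support(t, dataset):
    """Sum, over every itemset X included in t, of the number of
    transactions containing X, computed without enumerating any X.

    Each transaction u contains exactly 2 ** len(t & u) subsets of t,
    so the cost is one set intersection per transaction.
    """
    return sum(2 ** len(t & u) for u in dataset)


def fpof_scores(dataset):
    """ASSUMED normalization: divide by the largest summed support, so
    scores lie in (0, 1] and low scores suggest outliers."""
    raw = [summed_support(t, dataset) for t in dataset]
    top = max(raw)
    return [r / top for r in raw]


def summed_support_bruteforce(t, dataset):
    """Reference version that enumerates all 2 ** len(t) itemsets.
    Only usable on tiny inputs; included to check the identity."""
    items = sorted(t)
    total = 0
    for k in range(len(items) + 1):
        for subset in combinations(items, k):
            itemset = set(subset)
            total += sum(1 for u in dataset if itemset <= u)
    return total


# Hypothetical toy dataset: three similar transactions and one odd one.
dataset = [frozenset("abc"), frozenset("ab"), frozenset("abd"), frozenset("e")]
scores = fpof_scores(dataset)

for t, s in zip(dataset, scores):
    print(sorted(t), round(s, 3))
```

On this toy input the transaction containing only `e` receives the lowest score. Scoring all transactions this way takes a number of set intersections quadratic in the number of transactions, which is consistent with the polynomial-time claim but is not necessarily the procedure the authors use.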

Notes

Acknowledgements

This work has been partially supported by the Prefute project, PEPS 2015, CNRS.

Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  1. Université François Rabelais Tours, LI EA 6300, Blois, France
