Mining Fault-Tolerant Item Sets Using Subset Size Occurrence Distributions

  • Christian Borgelt
  • Tobias Kötter
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7014)

Abstract

Mining fault-tolerant (or approximate or fuzzy) item sets means to allow for errors in the underlying transaction data in the sense that actually present items may not be recorded due to noise or measurement errors. In order to cope with such missing items, transactions that do not contain all items of a given set are still allowed to support it. However, either the number of missing items must be limited, or the transaction’s contribution to the item set’s support is reduced in proportion to the number of missing items, or both. In this paper we present an algorithm that efficiently computes the subset size occurrence distribution of item sets, evaluates this distribution to find fault-tolerant item sets, and exploits intermediate data to remove pseudo (or spurious) item sets. We demonstrate the usefulness of our algorithm by applying it to a concept detection task on the 2008/2009 Wikipedia Selection for schools.
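
The abstract's central notion, the subset size occurrence distribution, can be illustrated with a short sketch. For an item set I, the distribution records, for each k from 0 to |I|, how many transactions contain exactly k items of I; the standard support is the count for k = |I|. The Python sketch below is a naive illustration under stated assumptions, not the authors' algorithm. All function names are hypothetical. The linear weighting max(0, 1 - m * penalty) for a transaction missing m items is only one possible reading of "reduced in proportion to the number of missing items"; the paper's own weighting scheme may differ. The sketch also shows how per-transaction hit counts can be extended one item at a time, as a search that grows item sets would do. This incremental scheme is likewise an assumption for illustration, not necessarily the data structure used in the paper.

```python
from typing import Hashable, Iterable, List, Sequence, Set

# Hypothetical helper names; a naive illustration, not the authors' algorithm.


def subset_size_distribution(itemset: Iterable[Hashable],
                             transactions: Sequence[Set[Hashable]]) -> List[int]:
    """dist[k] = number of transactions containing exactly k items of itemset."""
    target = frozenset(itemset)
    dist = [0] * (len(target) + 1)
    for t in transactions:
        dist[len(target.intersection(t))] += 1
    return dist


def extend_hits(hits: List[int], transactions: Sequence[Set[Hashable]],
                item: Hashable) -> List[int]:
    """Per-transaction hit counts for I plus {item}, given the counts for I.
    Assumes item is not already in I."""
    return [h + (1 if item in t else 0) for h, t in zip(hits, transactions)]


def distribution_from_hits(hits: List[int], size: int) -> List[int]:
    """Turn per-transaction hit counts for an item set of the given size
    into a subset size occurrence distribution."""
    dist = [0] * (size + 1)
    for h in hits:
        dist[h] += 1
    return dist


def fault_tolerant_support(dist: List[int], max_missing: int,
                           penalty: float) -> float:
    """Assumed extended support: a transaction missing m <= max_missing items
    contributes max(0, 1 - m * penalty); transactions missing more items
    contribute nothing."""
    n = len(dist) - 1  # size of the item set
    support = 0.0
    for m in range(min(max_missing, n) + 1):
        weight = max(0.0, 1.0 - m * penalty)
        support += weight * dist[n - m]
    return support


if __name__ == "__main__":
    transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b"}, {"d"}]
    dist = subset_size_distribution({"a", "b", "c"}, transactions)
    print(dist)                                  # [1, 1, 2, 1]
    print(fault_tolerant_support(dist, 0, 0.5))  # 1.0 (standard support)
    print(fault_tolerant_support(dist, 1, 0.5))  # 2.0
    # Build the same distribution incrementally, one item at a time.
    hits = [0] * len(transactions)
    for item in ("a", "b", "c"):
        hits = extend_hits(hits, transactions, item)
    print(distribution_from_hits(hits, 3))       # [1, 1, 2, 1]
```

With max_missing = 0 the sketch reduces to standard support. Raising max_missing lets transactions that lack one or more items contribute a reduced weight. Extending hit counts by one item needs only a membership test per transaction, which avoids recomputing every intersection for each item set visited. The removal of pseudo (spurious) item sets mentioned in the abstract is not sketched here.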

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Christian Borgelt, European Centre for Soft Computing, Mieres, Spain
  • Tobias Kötter, Dept. of Computer Science, University of Konstanz, Konstanz, Germany
