Prefix-Suffix Trees: A Novel Scheme for Compact Representation of Large Datasets

  • Radhika M. Pai
  • V. S. Ananthanarayana
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4815)

Abstract

An important goal in data mining is to generate an abstraction of the data. Such an abstraction helps reduce the time and space requirements of the overall decision-making process. It is also important that the abstraction be generated from the data in a small number of scans. In this paper we propose a novel scheme, called Prefix-Suffix trees, for the compact storage of patterns in data mining; it forms an abstraction of the patterns and is generated from the data in a single scan. This abstraction occupies less space and hence forms a compact storage of patterns. Further, we propose a clustering algorithm based on this storage and show experimentally that this type of storage reduces both space and time requirements. This is established on large data sets of handwritten numerals, namely the OCR data, the MNIST data, and the USPS data. The proposed algorithm is compared with other similar algorithms, and the efficacy of our scheme is thus established.
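The abstract does not describe the internal layout of the Prefix-Suffix tree, so the following is only a minimal sketch of the general idea it builds on: patterns that share leading features can share tree nodes, and the tree can be built in one pass over the data. It uses a plain prefix tree with per-node counts, which is an assumption made for illustration and not the authors' structure. The names `Node`, `PrefixCountTree`, and `build_in_one_scan`, and the toy patterns, are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One feature of a stored pattern, plus how many patterns pass through it."""
    feature: int
    count: int = 0
    children: dict = field(default_factory=dict)


class PrefixCountTree:
    """Illustrative prefix tree with counts (an assumption, not the PS-tree itself)."""

    def __init__(self):
        self.root = Node(feature=-1)  # sentinel root, holds no feature
        self.num_nodes = 0            # nodes created, excluding the root

    def insert(self, pattern):
        """Insert one pattern given as an ordered sequence of feature indices."""
        node = self.root
        for feature in pattern:
            child = node.children.get(feature)
            if child is None:
                # No stored pattern shares this prefix yet: allocate a node.
                child = Node(feature)
                node.children[feature] = child
                self.num_nodes += 1
            child.count += 1
            node = child


def build_in_one_scan(patterns):
    """Build the tree by reading each pattern exactly once."""
    tree = PrefixCountTree()
    for pattern in patterns:
        tree.insert(pattern)
    return tree


# Toy data: each tuple lists the non-zero feature positions of one pattern.
patterns = [(1, 2, 3, 5), (1, 2, 3, 6), (1, 2, 4), (2, 4, 6)]
tree = build_in_one_scan(patterns)
total_features = sum(len(p) for p in patterns)
print(total_features, tree.num_nodes)
```

On this toy input the tree stores fewer nodes than the total number of features in the raw patterns, because the first three patterns share a prefix. How much is saved on real data depends on how much the patterns overlap.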

Keywords

Data mining; Incremental mining; Clustering; Pattern-Count (PC) tree; Abstraction; Prefix-Suffix Trees

Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Radhika M. Pai (1)
  • V. S. Ananthanarayana (2)

  1. Manipal Institute of Technology, Manipal
  2. National Institute of Technology Karnataka, Surathkal
