FAT-CAT: Frequent Attributes Tree Based Classification

  • Conference paper
Comparative Evaluation of XML Information Retrieval Systems (INEX 2006)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4518)

Abstract

The natural representation of XML data is the underlying tree structure of the data. Analyzing these trees directly ensures that no structural information is lost. These tree structures can be analyzed efficiently thanks to frequent pattern mining algorithms that work directly on tree-structured data. In this work we describe a classification method for XML data based on frequent attribute trees. From these frequent patterns we select so-called emerging patterns and use them as binary features in a decision tree algorithm. The experimental results show that combining emerging attribute tree patterns with standard classification methods is a promising approach to the classification of XML documents.
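
The feature-construction step described in the abstract can be illustrated with a minimal sketch. It assumes the frequent attribute trees have already been mined and that each document is reduced to the set of pattern identifiers it contains. The identifiers (`p1`, `p2`, `p3`), the function names, and the `min_growth` threshold are hypothetical and do not come from the paper. The growth-rate criterion follows the common definition of emerging patterns (support in the target class divided by support in the remaining classes); the selection rule actually used in FAT-CAT may differ.

```python
from typing import Dict, FrozenSet, List, Sequence


def support(pattern: str, docs: Sequence[FrozenSet[str]]) -> float:
    """Fraction of documents that contain the pattern."""
    if not docs:
        return 0.0
    return sum(pattern in d for d in docs) / len(docs)


def growth_rate(pattern: str,
                target_docs: Sequence[FrozenSet[str]],
                other_docs: Sequence[FrozenSet[str]]) -> float:
    """Support in the target class divided by support in the other classes."""
    s_target = support(pattern, target_docs)
    s_other = support(pattern, other_docs)
    if s_other == 0.0:
        # Pattern never occurs outside the target class.
        return float("inf") if s_target > 0.0 else 0.0
    return s_target / s_other


def select_emerging(docs_by_class: Dict[str, List[FrozenSet[str]]],
                    min_growth: float) -> List[str]:
    """Keep patterns whose growth rate reaches min_growth for some class."""
    selected = set()
    for label, target in docs_by_class.items():
        others = [d for other_label, ds in docs_by_class.items()
                  if other_label != label for d in ds]
        candidates = set().union(*target)
        for p in candidates:
            if growth_rate(p, target, others) >= min_growth:
                selected.add(p)
    return sorted(selected)


def to_binary_features(doc: FrozenSet[str], patterns: List[str]) -> List[int]:
    """One 0/1 feature per selected pattern: 1 if the document contains it."""
    return [1 if p in doc else 0 for p in patterns]


# Toy data (hypothetical): two classes, documents given as sets of pattern ids.
docs_by_class = {
    "A": [frozenset({"p1", "p2"}), frozenset({"p1"})],
    "B": [frozenset({"p2", "p3"}), frozenset({"p2"})],
}
patterns = select_emerging(docs_by_class, min_growth=3.0)
features = to_binary_features(frozenset({"p1", "p2"}), patterns)
```

The resulting 0/1 vectors, paired with the class labels, could then be passed to any standard decision tree learner. In this toy data `p2` occurs in both classes with a growth rate of at most 2, so it is not selected at the threshold of 3.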



Editor information

Norbert Fuhr, Mounia Lalmas, Andrew Trotman

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

De Knijf, J. (2007). FAT-CAT: Frequent Attributes Tree Based Classification. In: Fuhr, N., Lalmas, M., Trotman, A. (eds) Comparative Evaluation of XML Information Retrieval Systems. INEX 2006. Lecture Notes in Computer Science, vol 4518. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73888-6_45

  • DOI: https://doi.org/10.1007/978-3-540-73888-6_45

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-73887-9

  • Online ISBN: 978-3-540-73888-6

  • eBook Packages: Computer Science, Computer Science (R0)
