FAT-CAT: Frequent Attributes Tree Based Classification

De Knijf, Jeroen

doi:10.1007/978-3-540-73888-6_45

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4518)

Included in the following conference series:

International Workshop of the Initiative for the Evaluation of XML Retrieval

Abstract

The natural representation of XML data is its underlying tree structure. Analyzing these trees directly ensures that no structural information is lost. Such tree structures can be analyzed efficiently thanks to frequent pattern mining algorithms that work directly on tree-structured data. In this work we describe a classification method for XML data based on frequent attribute trees. From these frequent patterns we select so-called emerging patterns and use them as binary features in a decision tree algorithm. The experimental results show that combining emerging attribute tree patterns with standard classification methods is a promising approach to the classification of XML documents.
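
The feature-construction step described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes that a frequent tree miner has already reported, for each document, the set of pattern identifiers that occur in it, and it selects emerging patterns by the usual growth-rate criterion (support in the target class divided by support in the remaining documents). The function names, the `min_growth` threshold, and the toy data are hypothetical.

```python
def support(pattern, docs):
    """Fraction of documents (sets of pattern ids) that contain the pattern."""
    if not docs:
        return 0.0
    return sum(pattern in d for d in docs) / len(docs)


def growth_rate(supp_target, supp_rest):
    """Ratio of supports; infinite if the pattern occurs only in the target class."""
    if supp_rest == 0:
        return float("inf") if supp_target > 0 else 0.0
    return supp_target / supp_rest


def emerging_patterns(docs, labels, target, min_growth=2.0):
    """Patterns whose growth rate from the other classes to `target` is at least min_growth."""
    in_target = [d for d, y in zip(docs, labels) if y == target]
    in_rest = [d for d, y in zip(docs, labels) if y != target]
    candidates = set().union(*docs)
    return [
        p for p in sorted(candidates)
        if growth_rate(support(p, in_target), support(p, in_rest)) >= min_growth
    ]


def to_binary_features(docs, patterns):
    """One row per document, one 0/1 column per selected pattern."""
    return [[1 if p in d else 0 for p in patterns] for d in docs]


# Toy data: each document is the set of (hypothetical) pattern ids found in it.
docs = [{"a", "b"}, {"a", "c"}, {"b"}, {"b", "c"}]
labels = ["x", "x", "y", "y"]

selected = emerging_patterns(docs, labels, target="x")
features = to_binary_features(docs, selected)
```

The resulting 0/1 matrix, together with the class labels, could then be passed to any standard decision tree learner.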

Author information

Authors and Affiliations

Universiteit Utrecht, Department of Information and Computing Sciences, PO Box 80.089, 3508 TB Utrecht, The Netherlands
Jeroen De Knijf

Editor information

Norbert Fuhr, Mounia Lalmas, Andrew Trotman

About this paper

Cite this paper

De Knijf, J. (2007). FAT-CAT: Frequent Attributes Tree Based Classification. In: Fuhr, N., Lalmas, M., Trotman, A. (eds) Comparative Evaluation of XML Information Retrieval Systems. INEX 2006. Lecture Notes in Computer Science, vol 4518. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73888-6_45

DOI: https://doi.org/10.1007/978-3-540-73888-6_45
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-73887-9
Online ISBN: 978-3-540-73888-6
eBook Packages: Computer Science, Computer Science (R0)
