Evaluation of Position-Constrained Association-Rule-Based Classification for Tree-Structured Data

Bui, Dang Bach; Hadzic, Fedja; Hecker, Michael

doi:10.1007/978-3-642-40319-4_33


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7867)

Included in the following conference series:

Pacific-Asia Conference on Knowledge Discovery and Data Mining

Abstract

Tree-structured data is popular in many domains, making structural classification an important task. In this paper, a recently proposed structure-preserving flat representation is used to generate association rules using itemset mining techniques. The main difference from traditional techniques is that subtrees are constrained by their position in the original tree, and initial associations prior to subtree reconstruction can be based on disconnected subtrees. Imposing the positional constraint on subtrees typically reduces the number of rules generated, especially when structural variation among tree instances is greater. This outcome is desirable given the current state of frequent pattern mining, where excessive patterns hinder the practical use of results. However, the question remains whether this reduction comes at a high cost in accuracy and coverage rate. We explore this aspect and compare the approach with a structural classifier based on the same subtree type but not positionally constrained in any way. Experiments on publicly available real-world data reveal important differences between the methods, and the implications of requiring that the frequent candidate subtrees on which the association rules are based are not only equivalent in structure and node labels, but also occur at the same position across the tree instances in the database.
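The effect of the positional constraint can be illustrated with a small Python sketch. It is not the authors' implementation: the tree encoding `(label, children)`, the use of a child-index path as a node's "position", the helper names `flatten` and `frequent_itemsets`, and the brute-force counting are all assumptions made for illustration. The position-free baseline here uses bare node labels, which is a much cruder stand-in than the unconstrained subtree classifier the paper compares against.

```python
from collections import Counter
from itertools import combinations

# Hypothetical encoding: a tree is (label, [child trees]).
def flatten(tree, path=()):
    """Yield (position, label) items; position is the child-index path from the root."""
    label, children = tree
    yield (path, label)
    for i, child in enumerate(children):
        yield from flatten(child, path + (i,))

def frequent_itemsets(transactions, min_support, max_size=2):
    """Brute-force count of all itemsets up to max_size; keep those meeting min_support."""
    counts = Counter()
    for items in transactions:
        ordered = sorted(items)
        for k in range(1, max_size + 1):
            for combo in combinations(ordered, k):
                counts[combo] += 1
    return {combo: n for combo, n in counts.items() if n >= min_support}

# Three toy trees; the second swaps the two children of the root.
trees = [
    ("a", [("b", []), ("c", [])]),
    ("a", [("c", []), ("b", [])]),
    ("a", [("b", []), ("c", [])]),
]

# Position-constrained items: a label only matches if it sits at the same position.
positioned = [set(flatten(t)) for t in trees]
# Position-free items: labels alone, wherever they occur in the tree.
unpositioned = [{label for _, label in flatten(t)} for t in trees]

constrained = frequent_itemsets(positioned, min_support=3)
unconstrained = frequent_itemsets(unpositioned, min_support=3)

print(len(constrained), "position-constrained itemsets")
print(len(unconstrained), "position-free itemsets")
```

With these toy trees, only the root item is frequent under the positional constraint, while every label and label pair is frequent without it. Note also that a positioned itemset such as the two children without the root corresponds to a disconnected subtree, which is the kind of initial association the abstract mentions.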

Author information

Authors and Affiliations

Department of Computing, Curtin University, Perth, Australia
Dang Bach Bui, Fedja Hadzic & Michael Hecker

Editor information

Editors and Affiliations

School of Information Technology and Mathematical Sciences, University of South Australia, 1 Mawson Lakes Boulevard, 5095, Adelaide, SA, Australia
Jiuyong Li
Advanced Analytics Institute, University of Technology, 2-12 Blackfriars Street, Chippendale, Blackfriars Campus, 2008, Sydney, NSW, Australia
Longbing Cao & Can Wang
Department of Electrical and Computer Engineering, National University of Singapore, 4 Engineering Drive 3, 117576, Singapore, Singapore
Kay Chen Tan
School of Automation, Guangdong University of Technology, No. 100 Waihuan Xi Road, Panyu District, 510006, Guangzhou, China
Bo Liu
School of Computing Science, Simon Fraser University, 8888 University Drive, V5A 1S6, Burnaby, BC, Canada
Jian Pei
Department of Computer Science and Information Engineering, National Cheng Kung University, No.1, University Road, 701, Tainan, Taiwan
Vincent S. Tseng

About this paper

Cite this paper

Bui, D.B., Hadzic, F., Hecker, M. (2013). Evaluation of Position-Constrained Association-Rule-Based Classification for Tree-Structured Data. In: Li, J., et al. Trends and Applications in Knowledge Discovery and Data Mining. PAKDD 2013. Lecture Notes in Computer Science, vol 7867. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40319-4_33

DOI: https://doi.org/10.1007/978-3-642-40319-4_33
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-40318-7
Online ISBN: 978-3-642-40319-4
eBook Packages: Computer Science, Computer Science (R0)
