Partial Tree-Edit Distance: A Solution to the Default Class Problem in Pattern-Based Tree Classification

Piernik, Maciej; Morzy, Tadeusz

doi:10.1007/978-3-319-57529-2_17

Conference paper
First Online: 23 April 2017

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10235)

Abstract

Pattern-based tree classifiers can produce high-quality results; however, they are prone to overusing the default class. In this paper, we propose a measure designed to address this issue, called partial tree-edit distance (PTED), which assesses the degree of containment of one tree in another. Furthermore, we propose an algorithm that calculates the measure and perform an experiment involving pattern-based classification to illustrate its usefulness. The results show that incorporating PTED into the classification scheme significantly improved accuracy on the tested datasets.
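
The abstract does not define PTED or say how it is plugged into the classifier, so the following is only an illustrative sketch of the default class problem and of one plausible remedy. It assumes trees are `(label, children)` tuples and uses a hypothetical placeholder, `containment_cost`, that merely counts pattern labels missing from a tree. This placeholder ignores tree structure and is not the PTED measure from the paper; the names `classify`, `rules`, and `use_fallback` are likewise assumptions made for the example.

```python
from collections import Counter

# Assumed representation: a tree is (label, [children]).
def labels(tree):
    """Return a multiset (Counter) of all node labels in the tree."""
    label, children = tree
    counts = Counter([label])
    for child in children:
        counts += labels(child)
    return counts

def containment_cost(pattern, tree):
    """Hypothetical stand-in for a containment measure.

    Counts pattern labels that are missing from the tree.
    It ignores structure and is NOT the paper's PTED.
    """
    missing = labels(pattern) - labels(tree)  # keeps positive counts only
    return sum(missing.values())

def classify(tree, rules, default_class, use_fallback):
    """rules is a list of (pattern, class_label) pairs."""
    if not rules:
        return default_class
    costs = [(containment_cost(p, tree), c) for p, c in rules]
    best_cost, best_class = min(costs, key=lambda pair: pair[0])
    if best_cost == 0:
        return best_class      # a pattern is fully covered by the tree
    if use_fallback:
        return best_class      # the closest pattern decides
    return default_class       # baseline: no match, so use the default

rules = [
    (("a", [("b", []), ("c", [])]), "X"),
    (("d", [("e", []), ("f", [])]), "Y"),
]
doc = ("d", [("e", []), ("g", [])])

print(classify(doc, rules, "X", use_fallback=False))  # default class is used
print(classify(doc, rules, "X", use_fallback=True))   # closest pattern is used
```

In this toy setting no pattern matches `doc` exactly, so the baseline falls back to the default class, while the distance-based variant picks the class of the pattern that is closest to being contained in the document.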

Acknowledgments

This research is partly funded by the Polish National Science Center under Grant No. 2015/19/B/ST6/02637.

Author information

Authors and Affiliations

Institute of Computing Science, Poznan University of Technology, ul. Piotrowo 2, 60-965, Poznan, Poland
Maciej Piernik & Tadeusz Morzy

Corresponding author

Correspondence to Maciej Piernik.

Editor information

Editors and Affiliations

Kangwon National University, Chuncheon, Korea (Republic of)
Jinho Kim
Seoul National University, Seoul, Korea (Republic of)
Kyuseok Shim
University of Technology Sydney, Sydney, New South Wales, Australia
Longbing Cao
KAIST, Daejeon, Korea (Republic of)
Jae-Gil Lee
University of New South Wales, Sydney, New South Wales, Australia
Xuemin Lin
Kangwon National University, Chuncheon, Korea (Republic of)
Yang-Sae Moon

About this paper

Cite this paper

Piernik, M., Morzy, T. (2017). Partial Tree-Edit Distance: A Solution to the Default Class Problem in Pattern-Based Tree Classification. In: Kim, J., Shim, K., Cao, L., Lee, JG., Lin, X., Moon, YS. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2017. Lecture Notes in Computer Science, vol 10235. Springer, Cham. https://doi.org/10.1007/978-3-319-57529-2_17

DOI: https://doi.org/10.1007/978-3-319-57529-2_17
Published: 23 April 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-57528-5
Online ISBN: 978-3-319-57529-2
eBook Packages: Computer Science, Computer Science (R0)
