Abstract
Pattern-based tree classifiers can produce high-quality results; however, they are prone to overusing the default class. In this paper, we propose a measure designed to address this issue, called partial tree-edit distance (PTED), which assesses the degree to which one tree is contained in another. Furthermore, we propose an algorithm that calculates the measure and perform an experiment involving pattern-based classification to illustrate its usefulness. The results show that incorporating PTED into the classification scheme significantly improved accuracy on the tested datasets.
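To make the idea of a "degree of containment" concrete, the following is a minimal sketch of a containment-style cost between two ordered, labeled trees. It is not the PTED algorithm from the paper, whose definition is not given in this abstract. It is a simplified stand-in that assumes a top-down, order-preserving mapping in which extra nodes of the target tree are free and every unmatched pattern node costs 1. The tree encoding `(label, [children])` and the names `size`, `matched`, `subtrees`, and `containment_cost` are all hypothetical.

```python
# Illustrative only: a simplified containment cost, NOT the paper's PTED.
# Assumption: a tree is a tuple (label, [child_trees]) with ordered children.

def size(tree):
    """Number of nodes in the tree."""
    _, children = tree
    return 1 + sum(size(c) for c in children)

def matched(pattern, target):
    """Max number of pattern nodes matched when both roots are mapped together.

    Roots must share a label; children are aligned in order with an
    LCS-style dynamic program, so target children may be skipped for free.
    """
    if pattern[0] != target[0]:
        return 0
    pc, tc = pattern[1], target[1]
    best = [[0] * (len(tc) + 1) for _ in range(len(pc) + 1)]
    for i in range(1, len(pc) + 1):
        for j in range(1, len(tc) + 1):
            best[i][j] = max(
                best[i - 1][j],                      # leave pattern child i unmatched
                best[i][j - 1],                      # skip target child j
                best[i - 1][j - 1] + matched(pc[i - 1], tc[j - 1]),
            )
    return 1 + best[len(pc)][len(tc)]

def subtrees(tree):
    """Yield the tree and every subtree rooted at one of its descendants."""
    yield tree
    for child in tree[1]:
        yield from subtrees(child)

def containment_cost(pattern, target):
    """Unmatched pattern nodes under the best anchoring anywhere in target."""
    return size(pattern) - max(matched(pattern, s) for s in subtrees(target))

pattern = ("a", [("b", []), ("c", [])])
target = ("r", [("a", [("b", []), ("x", []), ("c", [])]), ("d", [])])
cost = containment_cost(pattern, target)  # 0: the pattern fits inside the target
```

Under these assumptions, a cost of 0 means the pattern is fully contained in the target, while larger values indicate partial containment. A classifier could use such a graded score instead of a strict yes/no pattern match, which is the kind of role the abstract describes for PTED.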
Acknowledgments
This research is partly funded by the Polish National Science Center under Grant No. 2015/19/B/ST6/02637.
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Piernik, M., Morzy, T. (2017). Partial Tree-Edit Distance: A Solution to the Default Class Problem in Pattern-Based Tree Classification. In: Kim, J., Shim, K., Cao, L., Lee, JG., Lin, X., Moon, YS. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2017. Lecture Notes in Computer Science, vol 10235. Springer, Cham. https://doi.org/10.1007/978-3-319-57529-2_17
DOI: https://doi.org/10.1007/978-3-319-57529-2_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-57528-5
Online ISBN: 978-3-319-57529-2
eBook Packages: Computer Science, Computer Science (R0)