Partial Tree-Edit Distance: A Solution to the Default Class Problem in Pattern-Based Tree Classification

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10235)

Abstract

Pattern-based tree classifiers can produce high-quality results; however, they are prone to overusing the default class. In this paper, we propose a measure designed to address this issue, called partial tree-edit distance (PTED), which assesses the degree to which one tree is contained in another. Furthermore, we propose an algorithm that calculates the measure, and we perform an experiment involving pattern-based classification to illustrate its usefulness. The results show that incorporating PTED into the classification scheme significantly improved accuracy on the tested datasets.
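
To make the default-class issue concrete, the following is a minimal sketch and not the paper's PTED algorithm, which the abstract does not define. The function `containment_gap` is a hypothetical stand-in that only counts the pattern's parent-child label pairs missing from a tree; the tree encoding, all function names, and the toy patterns are assumptions made for illustration.

```python
from typing import List, Set, Tuple

# Assumed encoding: a tree is (label, [child trees]).
Tree = Tuple[str, list]
Edge = Tuple[str, str]


def edges(tree: Tree) -> Set[Edge]:
    """Collect the (parent label, child label) pairs of a tree."""
    label, children = tree
    found: Set[Edge] = set()
    for child in children:
        found.add((label, child[0]))
        found |= edges(child)
    return found


def containment_gap(pattern: Tree, tree: Tree) -> float:
    """Hypothetical stand-in for a containment measure (NOT PTED):
    the fraction of the pattern's edges that are absent from the tree.
    0.0 means every pattern edge also occurs in the tree."""
    pattern_edges = edges(pattern)
    if not pattern_edges:
        return 0.0
    return len(pattern_edges - edges(tree)) / len(pattern_edges)


def classify_strict(tree: Tree, patterns: List[Tuple[Tree, str]], default: str) -> str:
    """Return the class of the first fully contained pattern, else the default class."""
    for pattern, cls in patterns:
        if containment_gap(pattern, tree) == 0.0:
            return cls
    return default


def classify_soft(tree: Tree, patterns: List[Tuple[Tree, str]]) -> str:
    """Return the class of the pattern with the smallest containment gap."""
    return min(patterns, key=lambda pc: containment_gap(pc[0], tree))[1]


# Toy patterns and a document that matches neither pattern exactly.
patterns = [
    (("book", [("title", []), ("isbn", [])]), "book"),
    (("article", [("title", []), ("journal", [])]), "article"),
]
doc = ("article", [("title", []), ("author", [])])

print(classify_strict(doc, patterns, default="book"))
print(classify_soft(doc, patterns))
```

In this toy setting, exact matching falls through to the default class because no pattern is fully contained in the document, whereas the graded measure selects the closest pattern. A measure of partial containment plays that role in the classification scheme the abstract describes.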

Acknowledgments

This research is partly funded by the Polish National Science Center under Grant No. 2015/19/B/ST6/02637.

Author information

Corresponding author

Correspondence to Maciej Piernik.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Piernik, M., Morzy, T. (2017). Partial Tree-Edit Distance: A Solution to the Default Class Problem in Pattern-Based Tree Classification. In: Kim, J., Shim, K., Cao, L., Lee, JG., Lin, X., Moon, YS. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2017. Lecture Notes in Computer Science, vol 10235. Springer, Cham. https://doi.org/10.1007/978-3-319-57529-2_17

  • DOI: https://doi.org/10.1007/978-3-319-57529-2_17

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-57528-5

  • Online ISBN: 978-3-319-57529-2

  • eBook Packages: Computer Science (R0)
