Abstract
This paper proposes a new algorithm for extracting patterns from Stratified Ordered Trees (SOT). It first describes the SOT data structure, which makes it possible to represent structured sequential data. It then shows how clusters of similar recurrent patterns can be extracted from any SOT. The similarity measure underlying our clustering algorithm is a generalized edit distance, also described in the paper. The algorithms have been tested on text mining, where the aim was to detect recurrent syntactic motifs in texts drawn from classical literature. The algorithm could also be applied to many other fields where data are naturally sequential (e.g. financial data, molecular biology, traces of computation).
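The abstract does not specify the generalized edit distance, which in the paper is defined on stratified ordered trees. As a point of orientation only, the following minimal sketch shows the classic weighted edit distance on flat label sequences, the simpler case that such distances generalize. The function name `weighted_edit_distance`, its parameters, and the part-of-speech tags in the usage example are illustrative assumptions, not taken from the paper.

```python
from typing import Callable, Sequence


def unit_cost(x: str, y: str) -> float:
    """Default substitution cost: 0 for identical labels, 1 otherwise."""
    return 0.0 if x == y else 1.0


def weighted_edit_distance(
    a: Sequence[str],
    b: Sequence[str],
    sub_cost: Callable[[str, str], float] = unit_cost,
    indel_cost: float = 1.0,
) -> float:
    """Classic dynamic-programming edit distance on label sequences.

    This is NOT the paper's tree distance; it is the flat-sequence
    case, with a pluggable substitution cost (an assumption made here
    for illustration).
    """
    # Row 0: turning the empty prefix of a into b[:j] costs j insertions.
    prev = [j * indel_cost for j in range(len(b) + 1)]
    for i, x in enumerate(a, start=1):
        # Column 0: turning a[:i] into the empty prefix costs i deletions.
        curr = [i * indel_cost]
        for j, y in enumerate(b, start=1):
            curr.append(
                min(
                    prev[j] + indel_cost,          # delete x
                    curr[j - 1] + indel_cost,      # insert y
                    prev[j - 1] + sub_cost(x, y),  # substitute x by y
                )
            )
        prev = curr
    return prev[-1]


# Hypothetical usage on part-of-speech tag sequences.
d = weighted_edit_distance(["DET", "NOUN", "VERB"], ["DET", "ADJ", "NOUN", "VERB"])
```

Passing a custom `sub_cost` lets related labels be treated as closer than unrelated ones, which is one common way an edit distance is generalized; whether the paper does this in the same way cannot be determined from the abstract.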
Copyright information
© 2001 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ganascia, JG. (2001). Extraction of Recurrent Patterns from Stratified Ordered Trees. In: De Raedt, L., Flach, P. (eds) Machine Learning: ECML 2001. ECML 2001. Lecture Notes in Computer Science, vol 2167. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44795-4_15
DOI: https://doi.org/10.1007/3-540-44795-4_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42536-6
Online ISBN: 978-3-540-44795-5
eBook Packages: Springer Book Archive