Abstract
With the rapid progress of network and storage technologies, a huge amount of electronic data, such as Web pages and XML documents, has become available on the Internet. In this paper, we study the data-mining problem of discovering frequent ordered subtrees in a large collection of XML data, where both the patterns and the data are modeled as labeled ordered trees. We present an efficient algorithm, the Ordered Subtree Miner (OSTMiner), based on two-layer neural networks with the Hebb rule. Using a special structure called the EM-tree, it computes all ordered subtrees that appear in a collection of XML trees with a frequency above a user-specified threshold. In this algorithm, the EM-tree serves as an extended merging tree that supplies scheme information for efficient pruning and mining of frequent subtrees. Experimental results show that OSTMiner has good response time and scales well.
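The abstract does not describe how OSTMiner's neural network or EM-tree work, so the following is not the paper's method. It is a minimal Python sketch of the underlying problem only: finding every induced ordered subtree whose support reaches a threshold. Candidates are enumerated by rightmost extension, the scheme used by miners such as FREQT. Several choices here are assumptions. Trees are `(label, children)` tuples. Support counts the data trees that contain the pattern at least once. The helpers `decode`, `matches`, `support` and `mine` are hypothetical names.

```python
from typing import Dict, List, Tuple

# Assumed representation: a labeled ordered tree is (label, [children in document order]).
Tree = Tuple[str, list]
# A pattern is encoded as its preorder list of (depth, label); the root has depth 0.
Pattern = Tuple[Tuple[int, str], ...]


def decode(pattern: Pattern) -> Tree:
    """Rebuild a nested (label, children) tree from its preorder depth encoding."""
    root: Tree = (pattern[0][1], [])
    path = [root]  # path[d] is the node at depth d on the current rightmost path
    for depth, label in pattern[1:]:
        node: Tree = (label, [])
        path[depth - 1][1].append(node)  # attach as last child of the depth-1 node
        del path[depth:]
        path.append(node)
    return root


def matches(p: Tree, d: Tree) -> bool:
    """True if pattern node p maps onto data node d, preserving labels,
    parent-child edges and left-to-right sibling order (induced ordered subtree).
    Greedy leftmost matching of p's children is sufficient for ordered subsequences."""
    if p[0] != d[0]:
        return False
    i = 0
    for child in d[1]:
        if i < len(p[1]) and matches(p[1][i], child):
            i += 1
    return i == len(p[1])


def nodes(t: Tree):
    """Yield every node of t in preorder."""
    yield t
    for c in t[1]:
        yield from nodes(c)


def support(pattern: Pattern, forest: List[Tree]) -> int:
    """Assumed support measure: number of data trees containing the pattern."""
    p = decode(pattern)
    return sum(any(matches(p, n) for n in nodes(t)) for t in forest)


def mine(forest: List[Tree], minsup: int) -> Dict[Pattern, int]:
    """Level-wise search over rightmost extensions, pruning infrequent patterns."""
    labels = sorted({n[0] for t in forest for n in nodes(t)})
    frequent: Dict[Pattern, int] = {}
    level: List[Pattern] = [((0, lab),) for lab in labels]
    while level:
        next_level: List[Pattern] = []
        for pat in level:
            s = support(pat, forest)
            if s < minsup:
                continue  # support is anti-monotone, so no extension can be frequent
            frequent[pat] = s
            # Rightmost extension: a new last node may hang off any rightmost-path node.
            for depth in range(1, pat[-1][0] + 2):
                for lab in labels:
                    next_level.append(pat + ((depth, lab),))
        level = next_level
    return frequent


# Toy forest of three small XML-like trees (illustrative data only).
forest: List[Tree] = [
    ("a", [("b", []), ("c", [])]),
    ("a", [("b", [("c", [])]), ("c", [])]),
    ("a", [("c", []), ("b", [])]),
]

if __name__ == "__main__":
    # a(b, c) is frequent with support 2; b(c) occurs only in the second tree.
    for pat, s in sorted(mine(forest, 2).items()):
        print(pat, s)
```

The pruning step relies only on support being anti-monotone: a pattern cannot occur in more trees than any of its subtrees. The abstract says OSTMiner instead draws pruning information from the EM-tree, which this sketch does not attempt to reproduce.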
Additional information
Foundation item: Supported by the Key Science-Technology Project of Heilongjiang Province (GA010401-3)
Biography: SUN Wei (1978-), male, Ph.D. candidate; research interests: XML databases, data mining and information management.
About this article
Cite this article
Wei, S., Da-xin, L. & Tong, W. Discovering frequent subtrees from XML data using neural networks. Wuhan Univ. J. Nat. Sci. 11, 117–121 (2006). https://doi.org/10.1007/BF02831715
DOI: https://doi.org/10.1007/BF02831715