Discovering frequent subtrees from XML data using neural networks

  • Web Data Management Information Integration
Wuhan University Journal of Natural Sciences

Abstract

With the rapid progress of network and storage technologies, a huge amount of electronic data such as Web pages and XML documents has become available on the Internet. In this paper, we study the data-mining problem of discovering frequent ordered subtrees in a large collection of XML data, where both the patterns and the data are modeled as labeled ordered trees. We present an efficient algorithm, Ordered Subtree Miner (OSTMiner), based on two-layer neural networks with the Hebb rule. It computes all ordered subtrees that appear in a collection of XML trees with frequency above a user-specified threshold, using a special structure called the EM-tree. In this algorithm, the EM-tree serves as an extended merging tree that supplies scheme information for efficient pruning and mining of frequent subtrees. Experimental results show that OSTMiner has good response time and scales well.
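
To make the problem statement concrete, the following is a minimal brute-force sketch of frequent ordered subtree discovery. It is not OSTMiner: it uses neither the EM-tree nor the neural network with the Hebb rule described above. The tuple encoding of trees, the function names, the restriction to induced subtrees (parent-child edges and sibling order preserved), and the choice to count support as the number of trees containing a pattern are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product

# Assumed encoding: a labeled ordered tree is (label, children),
# where children is a tuple of trees in document order.

def rooted_patterns(tree):
    """All induced ordered subtrees that share the root of `tree`."""
    label, children = tree
    # For each child, either drop it (None) or keep one of its rooted patterns.
    options = [[None] + list(rooted_patterns(c)) for c in children]
    result = set()
    for choice in product(*options):
        kept = tuple(p for p in choice if p is not None)  # sibling order is preserved
        result.add((label, kept))
    return result

def all_patterns(tree):
    """All induced ordered subtrees rooted at any node of `tree`."""
    _, children = tree
    found = rooted_patterns(tree)
    for c in children:
        found |= all_patterns(c)
    return found

def frequent_subtrees(forest, min_support):
    """Patterns contained in at least `min_support` trees (hypothetical threshold form)."""
    counts = Counter()
    for tree in forest:
        counts.update(all_patterns(tree))  # a set, so each pattern counts once per tree
    return {p: n for p, n in counts.items() if n >= min_support}

def leaf(label):
    return (label, ())

# Toy forest standing in for three parsed XML documents.
forest = [
    ("book", (leaf("title"), leaf("author"), leaf("year"))),
    ("book", (leaf("title"), leaf("year"))),
    ("book", (leaf("author"), leaf("title"))),
]

if __name__ == "__main__":
    for pattern, support in frequent_subtrees(forest, 3).items():
        print(support, pattern)
```

With a threshold of 3, only `book`, `title`, and `book` with a single `title` child survive. The pattern `book(title, author)` occurs in the first tree only, because the third tree lists the same children in the opposite order. The enumeration is exponential in the number of children, which is the kind of cost that pruning structures such as the EM-tree are meant to avoid.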

Author information

Corresponding author

Correspondence to Liu Da-xin.

Additional information

Foundation item: Supported by the Key Science-Technology Project of Heilongjiang Province (GA010401-3)

Biography: SUN Wei (1978-), male, Ph.D. candidate, research direction: XML databases, data mining and information management.

About this article

Cite this article

Wei, S., Da-xin, L. & Tong, W. Discovering frequent subtrees from XML data using neural networks. Wuhan Univ. J. Nat. Sci. 11, 117–121 (2006). https://doi.org/10.1007/BF02831715

  • DOI: https://doi.org/10.1007/BF02831715
