Enhancing site style tree construction algorithm

Zhu, Fuxi; Gong, Changsheng; Yao, Haitao; Dong, Wenyong

doi:10.1007/s11859-009-0207-8

Published: 08 March 2009

Volume 14, pages 129–133, (2009)

Wuhan University Journal of Natural Sciences

Abstract

The World Wide Web has become a global information service center, offering a vast amount of news, advertisements, product and service information, and other content from diverse sources. However, only a small portion of this information is truly relevant and useful to users seeking information on specific topics. In this paper, common relations among nodes are taken into consideration when constructing the site style tree, and a new node type is introduced. Experimental results show that the proposed algorithm achieves higher precision and recall.

Author information

Authors and Affiliations

School of Computer, Wuhan University, Wuhan, 430072, Hubei, China
Fuxi Zhu, Changsheng Gong, Haitao Yao & Wenyong Dong

Corresponding author

Correspondence to Fuxi Zhu.

About this article

Cite this article

Zhu, F., Gong, C., Yao, H. et al. Enhancing site style tree construction algorithm. Wuhan Univ. J. Nat. Sci. 14, 129–133 (2009). https://doi.org/10.1007/s11859-009-0207-8

Received: 17 March 2008
Published: 08 March 2009
Issue Date: April 2009
DOI: https://doi.org/10.1007/s11859-009-0207-8

CLC number

TP 393.092
