Recognition of Data Records in Semi-structured Web-Pages Using Ontology and χ² Statistical Distribution

Keshavarzi, Amin; Rahmani, Amir Masoud; Mohsenzadeh, Mehran; Keshavarzi, Reza

doi:10.1007/978-3-540-88192-6_71

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5139)

Included in the following conference series:

International Conference on Advanced Data Mining and Applications

Abstract

Information extraction (IE) has emerged as a novel discipline in computer science. In IE, intelligent algorithms are employed to extract the required data and structure them so that they are suitable for querying. Most IE systems use the structure of a web page, e.g. its HTML tags, to recognize the sought information. In this article, an algorithm is developed that recognizes the main region of a web page containing the sought information by means of an ontology, the web-page structure, and the χ² goodness-of-fit test. After the main region is recognized, the records within it are identified, and each record is written to a text file.
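The abstract does not say how the χ² goodness-of-fit test is applied to candidate regions, so the following is only a minimal sketch of the test itself under stated assumptions. It supposes, hypothetically, that each candidate region of a page is summarized by counts of matches against three ontology concepts, and that the ontology supplies the expected proportions of those concepts. The function names, the concept counts, and the expected proportions are all illustrative and are not taken from the paper.

```python
from typing import Sequence

# Critical value of the chi-square distribution for alpha = 0.05 and
# 2 degrees of freedom (3 categories - 1), from standard statistical tables.
CRITICAL_VALUE_DF2_ALPHA_05 = 5.991


def chi_square_statistic(observed: Sequence[int],
                         expected_probs: Sequence[float]) -> float:
    """Pearson goodness-of-fit statistic: sum((O_i - E_i)^2 / E_i).

    Assumes every expected probability is strictly positive.
    """
    if len(observed) != len(expected_probs):
        raise ValueError("observed and expected_probs must have equal length")
    total = sum(observed)
    statistic = 0.0
    for count, prob in zip(observed, expected_probs):
        expected = total * prob  # expected count under the hypothesized profile
        statistic += (count - expected) ** 2 / expected
    return statistic


def fits_expected_profile(observed: Sequence[int],
                          expected_probs: Sequence[float],
                          critical_value: float) -> bool:
    """True when the hypothesized profile is not rejected at the chosen level."""
    return chi_square_statistic(observed, expected_probs) <= critical_value


# Hypothetical data: matches per ontology concept in two candidate regions.
expected = [1 / 3, 1 / 3, 1 / 3]   # assumed proportions, not from the paper
region_a = [10, 9, 11]             # counts close to the expected profile
region_b = [28, 1, 1]              # counts dominated by a single concept

for name, counts in (("region_a", region_a), ("region_b", region_b)):
    stat = chi_square_statistic(counts, expected)
    ok = fits_expected_profile(counts, expected, CRITICAL_VALUE_DF2_ALPHA_05)
    print(f"{name}: chi2 = {stat:.2f}, fits profile = {ok}")
```

In this toy setting, a region whose statistic stays below the critical value would be a plausible candidate for the main region, while a region with a large statistic would be rejected. Whether the paper uses the test in this direction, and with which categories, cannot be determined from the abstract.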

Author information

Authors and Affiliations

Islamic Azad University, Marvdasht Branch, Iran
Amin Keshavarzi
Islamic Azad University, Science and Research Branch, Tehran, Iran
Amir Masoud Rahmani & Mehran Mohsenzadeh
University of Isfahan, Iran
Reza Keshavarzi

Editor information

Editors and Affiliations

School of Computer Science, Sichuan University, 610065, Chengdu, China
Changjie Tang
Department of Computer Science, The University of Western Ontario, Canada
Charles X. Ling
School of ITEE, The University of Queensland, Australia
Xiaofang Zhou
Faculty of Science & Engineering, York University, 355 Lumbers Building, M3J 1P3, Toronto, Ontario, Canada
Nick J. Cercone
School of Information Technology and Electrical Engineering, The University of Queensland, Brisbane, 4072, Queensland, Australia
Xue Li

Cite this paper

Keshavarzi, A., Rahmani, A.M., Mohsenzadeh, M., Keshavarzi, R. (2008). Recognition of Data Records in Semi-structured Web-Pages Using Ontology and χ² Statistical Distribution. In: Tang, C., Ling, C.X., Zhou, X., Cercone, N.J., Li, X. (eds) Advanced Data Mining and Applications. ADMA 2008. Lecture Notes in Computer Science, vol 5139. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-88192-6_71

DOI: https://doi.org/10.1007/978-3-540-88192-6_71
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-88191-9
Online ISBN: 978-3-540-88192-6
eBook Packages: Computer Science, Computer Science (R0)
