Extracting Information from XML Documents by Reverse Generating a DTD

Jung, Jong-Seok; Oh, Dong-Ik; Kong, Yong-Hae; Ahn, Jong-Keun

doi:10.1007/3-540-36087-5_37

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2510)

Included in the following conference series:

Eurasian Conference on Information and Communication Technology

Abstract

Information contained in XML documents cannot be properly interpreted without an appropriate DTD. However, XML documents collected from the web are not always accompanied by a corresponding DTD, which makes extracting information from such sources difficult. In this study, we reverse-construct a DTD from XML sources whose DTD is unknown and use it to extract information from XML inputs. The DTD construction module scans input XML files in a single pass, whereas most other implementations use a two-pass approach. The modules also provide clean Java programming interfaces, so they can be integrated seamlessly with other web applications.
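
To make the single-pass idea concrete, here is a minimal sketch in Java, assuming a SAX-style streaming parser from the standard library. It is not the authors' module: the class name `DtdSketch` and the methods `infer` and `toDtd` are hypothetical, and the inference rule (each element's content model is a repeated choice over the children observed) is a deliberately loose stand-in for whatever rules the paper actually uses.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration (not the paper's implementation):
// collects element facts during one streaming pass, then prints a loose DTD.
class DtdSketch extends DefaultHandler {
    // Facts gathered per element name; insertion order is preserved for stable output.
    private final Map<String, Set<String>> children = new LinkedHashMap<>();
    private final Map<String, Set<String>> attributes = new LinkedHashMap<>();
    private final Set<String> hasText = new LinkedHashSet<>();
    private final Deque<String> open = new ArrayDeque<>(); // stack of currently open elements

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        children.computeIfAbsent(qName, k -> new LinkedHashSet<>());
        Set<String> seen = attributes.computeIfAbsent(qName, k -> new LinkedHashSet<>());
        for (int i = 0; i < atts.getLength(); i++) {
            seen.add(atts.getQName(i));
        }
        if (!open.isEmpty()) {
            children.get(open.peek()).add(qName); // record this element as a child of its parent
        }
        open.push(qName);
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        open.pop();
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Only non-whitespace text counts as character data for the enclosing element.
        if (!open.isEmpty() && !new String(ch, start, length).trim().isEmpty()) {
            hasText.add(open.peek());
        }
    }

    // Emits one ELEMENT declaration per element name, plus ATTLIST lines for observed attributes.
    String toDtd() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Set<String>> e : children.entrySet()) {
            String name = e.getKey();
            Set<String> kids = e.getValue();
            String model;
            if (hasText.contains(name)) {
                model = kids.isEmpty() ? "(#PCDATA)" : "(#PCDATA|" + String.join("|", kids) + ")*";
            } else {
                model = kids.isEmpty() ? "EMPTY" : "(" + String.join("|", kids) + ")*";
            }
            sb.append("<!ELEMENT ").append(name).append(' ').append(model).append(">\n");
            for (String a : attributes.get(name)) {
                // Assumption: every attribute is treated as optional character data.
                sb.append("<!ATTLIST ").append(name).append(' ').append(a).append(" CDATA #IMPLIED>\n");
            }
        }
        return sb.toString();
    }

    // Parses the XML text once and returns the inferred declarations.
    static String infer(String xml) {
        try {
            DtdSketch handler = new DtdSketch();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.toDtd();
        } catch (Exception ex) {
            throw new IllegalArgumentException("could not parse XML input", ex);
        }
    }
}
```

Calling `DtdSketch.infer(xmlText)` on a small catalog document would yield declarations such as `<!ELEMENT catalog (book)*>`. The sketch trades precision for simplicity: it ignores child order and cardinality, and an element that happens to be empty in the sample is declared `EMPTY`, so a practical tool would need stricter generalization rules.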

This work is supported in part by the Ministry of Information & Communication of Korea.

Author information

Authors and Affiliations

Division of Information Technology Engineering, SoonChunHyang University, Shinchangmyun, Asan, Korea
Jong-Seok Jung, Dong-Ik Oh, Yong-Hae Kong & Jong-Keun Ahn

Editor information

Editors and Affiliations

Faculty of Mathematics and Computer Science Computer Science Department, Shahid Bahonar University, 22 Bahman Bulvard, Kerman, Iran
Hassan Shafazand
Fraunhofer IPSI, Dolivostr. 15, 64293, Darmstadt, Germany
Hassan Shafazand
Institute of Software Technology, Vienna University of Technology, Favoritenstr. 9/188, 1040, Vienna, Austria
A. Min Tjoa

About this paper

Cite this paper

Jung, JS., Oh, DI., Kong, YH., Ahn, JK. (2002). Extracting Information from XML Documents by Reverse Generating a DTD. In: Shafazand, H., Tjoa, A.M. (eds) EurAsia-ICT 2002: Information and Communication Technology. EurAsia-ICT 2002. Lecture Notes in Computer Science, vol 2510. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36087-5_37

DOI: https://doi.org/10.1007/3-540-36087-5_37
Published: 10 October 2002
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-00028-0
Online ISBN: 978-3-540-36087-2
eBook Packages: Springer Book Archive
